* [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-05 16:58 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 02/15] xen/riscv: implement arch_vcpu_{create,destroy}() Oleksii Kurochko
` (13 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce structure with VCPU's registers which describes its state.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/domain.h | 58 ++++++++++++++++++++++++++++-
1 file changed, 56 insertions(+), 2 deletions(-)
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 316e7c6c8448..639cafdade99 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -22,9 +22,63 @@ struct hvm_domain
struct arch_vcpu_io {
};
-struct arch_vcpu {
+struct arch_vcpu
+{
struct vcpu_vmid vmid;
-};
+
+ /* Xen's state: Callee-saved registers and tp, gp, ra */
+ struct
+ {
+ register_t s0;
+ register_t s1;
+ register_t s2;
+ register_t s3;
+ register_t s4;
+ register_t s5;
+ register_t s6;
+ register_t s7;
+ register_t s8;
+ register_t s9;
+ register_t s10;
+ register_t s11;
+
+ register_t sp;
+ register_t gp;
+
+ /* ra is used to jump to guest when creating new vcpu */
+ register_t ra;
+ } xen_saved_context;
+
+ /* CSRs */
+ register_t hstatus;
+ register_t hedeleg;
+ register_t hideleg;
+ register_t hvip;
+ register_t hip;
+ register_t hie;
+ register_t hgeie;
+ register_t henvcfg;
+ register_t hcounteren;
+ register_t htimedelta;
+ register_t htval;
+ register_t htinst;
+ register_t hstateen0;
+#ifdef CONFIG_RISCV_32
+ register_t henvcfgh;
+ register_t htimedeltah;
+#endif
+
+ /* VCSRs */
+ register_t vsstatus;
+ register_t vsip;
+ register_t vsie;
+ register_t vstvec;
+ register_t vsscratch;
+ register_t vscause;
+ register_t vstval;
+ register_t vsatp;
+ register_t vsepc;
+} __cacheline_aligned;
struct paging_domain {
spinlock_t lock;
--
2.52.0
* Re: [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu
2025-12-24 17:03 ` [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu Oleksii Kurochko
@ 2026-01-05 16:58 ` Jan Beulich
2026-01-06 14:19 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-05 16:58 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Introduce structure with VCPU's registers which describes its state.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Since none of this is being used for the time being, I think the description
wants to be a little less terse. Coming from the x86 (rather than the Arm)
side, I find the arrangements irritating. And even when comparing to Arm, ...
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -22,9 +22,63 @@ struct hvm_domain
> struct arch_vcpu_io {
> };
>
> -struct arch_vcpu {
> +struct arch_vcpu
> +{
> struct vcpu_vmid vmid;
> -};
> +
> + /* Xen's state: Callee-saved registers and tp, gp, ra */
... I don't think the following structure describes "Xen's state". On Arm
it's guest controlled register values which are being saved afaict. I
would then expect the same to become the case for RISC-V.
> + struct
> + {
> + register_t s0;
> + register_t s1;
> + register_t s2;
> + register_t s3;
> + register_t s4;
> + register_t s5;
> + register_t s6;
> + register_t s7;
> + register_t s8;
> + register_t s9;
> + register_t s10;
> + register_t s11;
> +
> + register_t sp;
> + register_t gp;
> +
> + /* ra is used to jump to guest when creating new vcpu */
> + register_t ra;
> + } xen_saved_context;
The xen_ prefix here also doesn't exist in Arm code. Nor is there a
similar, partly potentially misleading comment on "pc" there
comparable to the one that you added for "ra". ("Potentially
misleading" because what is being described is, aiui, not the only
and not even the main purpose of the field.)
> + /* CSRs */
> + register_t hstatus;
> + register_t hedeleg;
> + register_t hideleg;
> + register_t hvip;
> + register_t hip;
> + register_t hie;
> + register_t hgeie;
> + register_t henvcfg;
> + register_t hcounteren;
> + register_t htimedelta;
> + register_t htval;
> + register_t htinst;
> + register_t hstateen0;
> +#ifdef CONFIG_RISCV_32
> + register_t henvcfgh;
> + register_t htimedeltah;
> +#endif
> +
> + /* VCSRs */
> + register_t vsstatus;
> + register_t vsip;
> + register_t vsie;
> + register_t vstvec;
> + register_t vsscratch;
> + register_t vscause;
> + register_t vstval;
> + register_t vsatp;
> + register_t vsepc;
> +} __cacheline_aligned;
Why this attribute?
Jan
* Re: [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu
2026-01-05 16:58 ` Jan Beulich
@ 2026-01-06 14:19 ` Oleksii Kurochko
2026-01-06 14:26 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-06 14:19 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/5/26 5:58 PM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> Introduce structure with VCPU's registers which describes its state.
>>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> Since none of this is being used for the time being, I think the description
> wants to be a little less terse. Coming from the x86 (rather than the Arm)
> side, I find the arrangements irritating. And even when comparing to Arm, ...
>
>> --- a/xen/arch/riscv/include/asm/domain.h
>> +++ b/xen/arch/riscv/include/asm/domain.h
>> @@ -22,9 +22,63 @@ struct hvm_domain
>> struct arch_vcpu_io {
>> };
>>
>> -struct arch_vcpu {
>> +struct arch_vcpu
>> +{
>> struct vcpu_vmid vmid;
>> -};
>> +
>> + /* Xen's state: Callee-saved registers and tp, gp, ra */
> ... I don't think the following structure describes "Xen's state". On Arm
> it's guest controlled register values which are being saved afaict. I
> would then expect the same to become the case for RISC-V.
I think this is not fully correct, because guest-controlled registers on
Arm are allocated on the stack [1][2].
Regarding xen_saved_context (or saved_context on Arm, which I used as a base),
I think xen_saved_context is a slightly better name. Looking at how the
saved_context structure is used on Arm [3], it can be concluded that
__context_switch() switches only Xen’s internal context. What actually happens is
that __context_switch() is called while running on the previous vCPU’s stack
and returns on the next vCPU’s stack. Therefore, it is necessary to have
the correct register values stored in the saved_context structure in order
to continue Xen’s execution when it later returns to the previous stack.
Probably I need to introduce __context_switch() in this patch series for RISC-V
now; I hope this will clarify things better. At the moment, it looks like [4].
[1] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/include/asm/arm64/processor.h#L14
[2] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/domain.c#L547
[3] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/arm64/entry.S#L650
[4] https://gitlab.com/xen-project/people/olkur/xen/-/blob/riscv-next-upstreaming/xen/arch/riscv/entry.S?ref_type=heads#L153
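To make the save/restore idea described above concrete, here is a minimal user-space sketch in C. It is only a toy model under stated assumptions, not the Xen code: reg_t, toy_hart, toy_vcpu and toy_context_switch() are hypothetical names, and the real __context_switch() is assembly that operates on the hart's actual registers rather than on a struct.

```c
#include <stdint.h>

typedef uintptr_t reg_t;            /* hypothetical stand-in for register_t */

/* Same shape as the saved-context block in the patch: s0..s11, sp, gp, ra. */
struct toy_saved_context {
    reg_t s[12];
    reg_t sp;
    reg_t gp;
    reg_t ra;
};

/* Hypothetical vCPU that only carries the hypervisor's saved context. */
struct toy_vcpu {
    struct toy_saved_context xen_saved_context;
};

/* Toy "hart": the registers the hypervisor's own C code is currently using. */
struct toy_hart {
    struct toy_saved_context regs;
};

/*
 * Store the live hypervisor registers into prev, then load next's.
 * After this, execution would continue on next's stack (sp) at next's
 * return address (ra).
 */
static void toy_context_switch(struct toy_hart *hart,
                               struct toy_vcpu *prev,
                               struct toy_vcpu *next)
{
    prev->xen_saved_context = hart->regs;
    hart->regs = next->xen_saved_context;
}
```

In this model, switching from vCPU A to vCPU B and later back to A leaves the toy hart with A's sp and ra again, which is the property that lets the hypervisor resume on the previous vCPU's stack.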
>
>> + struct
>> + {
>> + register_t s0;
>> + register_t s1;
>> + register_t s2;
>> + register_t s3;
>> + register_t s4;
>> + register_t s5;
>> + register_t s6;
>> + register_t s7;
>> + register_t s8;
>> + register_t s9;
>> + register_t s10;
>> + register_t s11;
>> +
>> + register_t sp;
>> + register_t gp;
>> +
>> + /* ra is used to jump to guest when creating new vcpu */
>> + register_t ra;
>> + } xen_saved_context;
> The xen_ prefix here also doesn't exist in Arm code.
I think it should be added for Arm too. I can send a patch.
> Nor is there a
> similar, partly potentially misleading comment on "pc" there
> comparable to the one that you added for "ra". ("Potentially
> misleading" because what is being described is, aiui, not the only
> and not even the main purpose of the field.)
Yes, the purpose of ra here is not just to jump to the new vCPU code
(continue_new_vcpu()). It is used that way only the first time;
afterwards, ra will simply point to the next instruction after the
call to __context_switch() in context_switch() [5].
[5] https://gitlab.com/xen-project/people/olkur/xen/-/blob/riscv-next-upstreaming/xen/arch/riscv/domain.c?ref_type=heads#L463
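The first-use case of ra described above can be sketched as follows. This is a hedged illustration only: reg_t, toy_saved_context, toy_continue_new_vcpu() and toy_vcpu_prepare() are hypothetical names, and the sketch assumes a platform where a function address fits in uintptr_t (true for the usual GCC/Clang targets).

```c
#include <stdint.h>

typedef uintptr_t reg_t;            /* hypothetical stand-in for register_t */

/* Only the two fields that matter for the first switch to a new vCPU. */
struct toy_saved_context {
    reg_t sp;
    reg_t ra;
};

/* Hypothetical stub standing in for continue_new_vcpu(). */
static void toy_continue_new_vcpu(void)
{
}

/*
 * Seed a fresh vCPU's saved context: the first switch to it will "return"
 * to toy_continue_new_vcpu on the vCPU's own stack. On every later switch,
 * ra is whatever was live when the vCPU was switched out, i.e. the
 * instruction following the call to the switch routine.
 */
static void toy_vcpu_prepare(struct toy_saved_context *ctx, void *stack_top)
{
    ctx->sp = (reg_t)stack_top;
    ctx->ra = (reg_t)&toy_continue_new_vcpu;
}
```

The point of the sketch is that the seeded value is only an initial condition; nothing in the structure itself ties ra to the new-vCPU path.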
>
>> + /* CSRs */
>> + register_t hstatus;
>> + register_t hedeleg;
>> + register_t hideleg;
>> + register_t hvip;
>> + register_t hip;
>> + register_t hie;
>> + register_t hgeie;
>> + register_t henvcfg;
>> + register_t hcounteren;
>> + register_t htimedelta;
>> + register_t htval;
>> + register_t htinst;
>> + register_t hstateen0;
>> +#ifdef CONFIG_RISCV_32
>> + register_t henvcfgh;
>> + register_t htimedeltah;
>> +#endif
>> +
>> + /* VCSRs */
>> + register_t vsstatus;
>> + register_t vsip;
>> + register_t vsie;
>> + register_t vstvec;
>> + register_t vsscratch;
>> + register_t vscause;
>> + register_t vstval;
>> + register_t vsatp;
>> + register_t vsepc;
>> +} __cacheline_aligned;
> Why this attribute?
As arch_vcpu structure is accessed pretty often I thought it would
be nice to have it cache-aligned so some accesses would be faster
and something like false sharing won't happen.
~ Oleksii
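For readers unfamiliar with what a cache-line alignment attribute changes, here is a small C sketch. It assumes a 64-byte cache line and GCC/Clang attribute syntax; TOY_CACHELINE_BYTES, toy_plain, toy_aligned and toy_shares_line() are hypothetical names, and the sketch says nothing about whether the alignment is measurably beneficial for struct arch_vcpu.

```c
#include <stdint.h>

#define TOY_CACHELINE_BYTES 64      /* assumed line size, platform dependent */

/* Natural alignment: neighbouring objects may share one cache line. */
struct toy_plain {
    unsigned long regs[15];
};

/* Aligned variant: starts on a line boundary, size is padded to a multiple. */
struct toy_aligned {
    unsigned long regs[15];
} __attribute__((aligned(TOY_CACHELINE_BYTES)));

/* Return 1 if both addresses fall into the same (assumed) cache line. */
static int toy_shares_line(const void *a, const void *b)
{
    return (uintptr_t)a / TOY_CACHELINE_BYTES ==
           (uintptr_t)b / TOY_CACHELINE_BYTES;
}
```

With the attribute, two adjacent toy_aligned objects never start in the same line, which is the false-sharing argument; the cost is padding, since sizeof is rounded up to a multiple of the alignment.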
* Re: [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu
2026-01-06 14:19 ` Oleksii Kurochko
@ 2026-01-06 14:26 ` Jan Beulich
2026-01-06 14:59 ` Andrew Cooper
2026-01-06 15:05 ` Oleksii Kurochko
0 siblings, 2 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-06 14:26 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 06.01.2026 15:19, Oleksii Kurochko wrote:
> On 1/5/26 5:58 PM, Jan Beulich wrote:
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> Introduce structure with VCPU's registers which describes its state.
>>>
>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>> Since none of this is being used for the time being, I think the description
>> wants to be a little less terse. Coming from the x86 (rather than the Arm)
>> side, I find the arrangements irritating. And even when comparing to Arm, ...
>>
>>> --- a/xen/arch/riscv/include/asm/domain.h
>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>> @@ -22,9 +22,63 @@ struct hvm_domain
>>> struct arch_vcpu_io {
>>> };
>>>
>>> -struct arch_vcpu {
>>> +struct arch_vcpu
>>> +{
>>> struct vcpu_vmid vmid;
>>> -};
>>> +
>>> + /* Xen's state: Callee-saved registers and tp, gp, ra */
>> ... I don't think the following structure describes "Xen's state". On Arm
>> it's guest controlled register values which are being saved afaict. I
>> would then expect the same to become the case for RISC-V.
>
> I think this is not fully correct, because guest-controlled registers on
> Arm are allocated on the stack [1][2].
I'll admit that I should have said "possibly guest-controlled". Callee-
saved registers may or may not be used in functions, and if one isn't
used throughout the call-stack reaching __context_switch(), it would
still hold whatever the guest had put there.
> Regarding xen_saved_context (or saved_context on Arm, which I used as a base),
> I think xen_saved_context is a slightly better name. Looking at how the
> saved_context structure is used on Arm [3], it can be concluded that
> __context_switch() switches only Xen’s internal context. What actually happens is
> that __context_switch() is called while running on the previous vCPU’s stack
> and returns on the next vCPU’s stack. Therefore, it is necessary to have
> the correct register values stored in the saved_context structure in order
> to continue Xen’s execution when it later returns to the previous stack.
For this and ...
> Probably I need to introduce __context_switch() in this patch series for RISC-V
> now; I hope this will clarify things better. At the moment, it looks like [4].
>
> [1] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/include/asm/arm64/processor.h#L14
> [2] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/domain.c#L547
>
> [3] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/arm64/entry.S#L650
>
> [4] https://gitlab.com/xen-project/people/olkur/xen/-/blob/riscv-next-upstreaming/xen/arch/riscv/entry.S?ref_type=heads#L153
>
>>
>>> + struct
>>> + {
>>> + register_t s0;
>>> + register_t s1;
>>> + register_t s2;
>>> + register_t s3;
>>> + register_t s4;
>>> + register_t s5;
>>> + register_t s6;
>>> + register_t s7;
>>> + register_t s8;
>>> + register_t s9;
>>> + register_t s10;
>>> + register_t s11;
>>> +
>>> + register_t sp;
>>> + register_t gp;
>>> +
>>> + /* ra is used to jump to guest when creating new vcpu */
>>> + register_t ra;
>>> + } xen_saved_context;
>> The xen_ prefix here also doesn't exist in Arm code.
>
> I think it should be added for Arm too. I can send a patch.
... this, to reword my comment: What value does the xen_ prefix add?
>> Nor is there a
>> similar, partly potentially misleading comment on "pc" there
>> comparable to the one that you added for "ra". ("Potentially
>> misleading" because what is being described is, aiui, not the only
>> and not even the main purpose of the field.)
>
> Yes, the purpose of ra here is not just to jump to the new vCPU code
> (continue_new_vcpu()). It is used that way only the first time;
> afterwards, ra will simply point to the next instruction after the
> call to __context_switch() in context_switch() [5].
>
> [5] https://gitlab.com/xen-project/people/olkur/xen/-/blob/riscv-next-upstreaming/xen/arch/riscv/domain.c?ref_type=heads#L463
>
>>
>>> + /* CSRs */
>>> + register_t hstatus;
>>> + register_t hedeleg;
>>> + register_t hideleg;
>>> + register_t hvip;
>>> + register_t hip;
>>> + register_t hie;
>>> + register_t hgeie;
>>> + register_t henvcfg;
>>> + register_t hcounteren;
>>> + register_t htimedelta;
>>> + register_t htval;
>>> + register_t htinst;
>>> + register_t hstateen0;
>>> +#ifdef CONFIG_RISCV_32
>>> + register_t henvcfgh;
>>> + register_t htimedeltah;
>>> +#endif
>>> +
>>> + /* VCSRs */
>>> + register_t vsstatus;
>>> + register_t vsip;
>>> + register_t vsie;
>>> + register_t vstvec;
>>> + register_t vsscratch;
>>> + register_t vscause;
>>> + register_t vstval;
>>> + register_t vsatp;
>>> + register_t vsepc;
>>> +} __cacheline_aligned;
>> Why this attribute?
>
> As arch_vcpu structure is accessed pretty often I thought it would
> be nice to have it cache-aligned so some accesses would be faster
> and something like false sharing won't happen.
I think you would want to prove that this actually makes a difference.
I notice Arm has such an attribute (and maybe indeed you merely copied
it), but x86 doesn't.
Jan
* Re: [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu
2026-01-06 14:26 ` Jan Beulich
@ 2026-01-06 14:59 ` Andrew Cooper
2026-01-06 15:05 ` Oleksii Kurochko
1 sibling, 0 replies; 93+ messages in thread
From: Andrew Cooper @ 2026-01-06 14:59 UTC (permalink / raw)
To: Jan Beulich, Oleksii Kurochko
Cc: Andrew Cooper, Alistair Francis, Bob Eshleman, Connor Davis,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 06/01/2026 2:26 pm, Jan Beulich wrote:
> On 06.01.2026 15:19, Oleksii Kurochko wrote:
>>>> + struct
>>>> + {
>>>> + register_t s0;
>>>> + register_t s1;
>>>> + register_t s2;
>>>> + register_t s3;
>>>> + register_t s4;
>>>> + register_t s5;
>>>> + register_t s6;
>>>> + register_t s7;
>>>> + register_t s8;
>>>> + register_t s9;
>>>> + register_t s10;
>>>> + register_t s11;
>>>> +
>>>> + register_t sp;
>>>> + register_t gp;
>>>> +
>>>> + /* ra is used to jump to guest when creating new vcpu */
>>>> + register_t ra;
>>>> + } xen_saved_context;
>>> The xen_ prefix here also doesn't exist in Arm code.
>> I think it should be added for Arm too. I can send a patch.
> ... this, to reword my comment: What value does the xen_ prefix add?
This was my recommendation after reverse engineering how ARM worked to
explain it to Oleksii. But I also thought I said to write a real
comment too.
This is arbitrary *Xen* state, not guest state like you'd expect to find
in struct vcpu. The guest GPR state is at the base of the vCPU stack.
I suggested that this property be made clearer for the benefit of anyone
trying to decipher the context switching logic.
~Andrew
* Re: [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu
2026-01-06 14:26 ` Jan Beulich
2026-01-06 14:59 ` Andrew Cooper
@ 2026-01-06 15:05 ` Oleksii Kurochko
2026-01-06 15:33 ` Jan Beulich
1 sibling, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-06 15:05 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/6/26 3:26 PM, Jan Beulich wrote:
> On 06.01.2026 15:19, Oleksii Kurochko wrote:
>> On 1/5/26 5:58 PM, Jan Beulich wrote:
>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>> Introduce structure with VCPU's registers which describes its state.
>>>>
>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>> Since none of this is being used for the time being, I think the description
>>> wants to be a little less terse. Coming from the x86 (rather than the Arm)
>>> side, I find the arrangements irritating. And even when comparing to Arm, ...
>>>
>>>> --- a/xen/arch/riscv/include/asm/domain.h
>>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>>> @@ -22,9 +22,63 @@ struct hvm_domain
>>>> struct arch_vcpu_io {
>>>> };
>>>>
>>>> -struct arch_vcpu {
>>>> +struct arch_vcpu
>>>> +{
>>>> struct vcpu_vmid vmid;
>>>> -};
>>>> +
>>>> + /* Xen's state: Callee-saved registers and tp, gp, ra */
>>> ... I don't think the following structure describes "Xen's state". On Arm
>>> it's guest controlled register values which are being saved afaict. I
>>> would then expect the same to become the case for RISC-V.
>> I think this is not fully correct, because guest-controlled registers on
>> Arm are allocated on the stack [1][2].
> I'll admit that I should have said "possibly guest-controlled". Callee-
> saved registers may or may not be used in functions, and if one isn't
> used throughout the call-stack reaching __context_switch(), it would
> still hold whatever the guest had put there.
But the guest doesn't put anything there; only Xen does, and that is the reason
why I am trying to call it Xen state. The guest works only with what is stored in
struct cpu_info->guest_cpu_user_regs.* ...
>
>> Regarding xen_saved_context (or saved_context on Arm, which I used as a base),
>> I think xen_saved_context is a slightly better name. Looking at how the
>> saved_context structure is used on Arm [3], it can be concluded that
>> __context_switch() switches only Xen’s internal context. What actually happens is
>> that __context_switch() is called while running on the previous vCPU’s stack
>> and returns on the next vCPU’s stack. Therefore, it is necessary to have
>> the correct register values stored in the saved_context structure in order
>> to continue Xen’s execution when it later returns to the previous stack.
> For this and ...
>
>> Probably I need to introduce __context_switch() in this patch series for RISC-V
>> now; I hope this will clarify things better. At the moment, it looks like [4].
>>
>> [1] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/include/asm/arm64/processor.h#L14
>> [2] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/domain.c#L547
>>
>> [3] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/arm64/entry.S#L650
>>
>> [4] https://gitlab.com/xen-project/people/olkur/xen/-/blob/riscv-next-upstreaming/xen/arch/riscv/entry.S?ref_type=heads#L153
>>
>>>> + struct
>>>> + {
>>>> + register_t s0;
>>>> + register_t s1;
>>>> + register_t s2;
>>>> + register_t s3;
>>>> + register_t s4;
>>>> + register_t s5;
>>>> + register_t s6;
>>>> + register_t s7;
>>>> + register_t s8;
>>>> + register_t s9;
>>>> + register_t s10;
>>>> + register_t s11;
>>>> +
>>>> + register_t sp;
>>>> + register_t gp;
>>>> +
>>>> + /* ra is used to jump to guest when creating new vcpu */
>>>> + register_t ra;
>>>> + } xen_saved_context;
>>> The xen_ prefix here also doesn't exist in Arm code.
>> I think it should be added for Arm too. I can send a patch.
> ... this, to reword my comment: What value does the xen_ prefix add?
... because the guest doesn't access saved_context and, as I mentioned above,
the guest has "access" only to struct cpu_info->guest_cpu_user_regs.*.
>
>>> Nor is there a
>>> similar, partly potentially misleading comment on "pc" there
>>> comparable to the one that you added for "ra". ("Potentially
>>> misleading" because what is being described is, aiui, not the only
>>> and not even the main purpose of the field.)
>> Yes, the purpose of ra here is not just to jump to the new vCPU code
>> (continue_new_vcpu()). It is used that way only the first time;
>> afterwards, ra will simply point to the next instruction after the
>> call to __context_switch() in context_switch() [5].
>>
>> [5] https://gitlab.com/xen-project/people/olkur/xen/-/blob/riscv-next-upstreaming/xen/arch/riscv/domain.c?ref_type=heads#L463
>>
>>>> + /* CSRs */
>>>> + register_t hstatus;
>>>> + register_t hedeleg;
>>>> + register_t hideleg;
>>>> + register_t hvip;
>>>> + register_t hip;
>>>> + register_t hie;
>>>> + register_t hgeie;
>>>> + register_t henvcfg;
>>>> + register_t hcounteren;
>>>> + register_t htimedelta;
>>>> + register_t htval;
>>>> + register_t htinst;
>>>> + register_t hstateen0;
>>>> +#ifdef CONFIG_RISCV_32
>>>> + register_t henvcfgh;
>>>> + register_t htimedeltah;
>>>> +#endif
>>>> +
>>>> + /* VCSRs */
>>>> + register_t vsstatus;
>>>> + register_t vsip;
>>>> + register_t vsie;
>>>> + register_t vstvec;
>>>> + register_t vsscratch;
>>>> + register_t vscause;
>>>> + register_t vstval;
>>>> + register_t vsatp;
>>>> + register_t vsepc;
>>>> +} __cacheline_aligned;
>>> Why this attribute?
>> As arch_vcpu structure is accessed pretty often I thought it would
>> be nice to have it cache-aligned so some accesses would be faster
>> and something like false sharing won't happen.
> I think you would want to prove that this actually makes a difference.
> I notice Arm has such an attribute (and maybe indeed you merely copied
> it), but x86 doesn't.
I haven't measured it, but I saw that Arm has the attribute, and that was my
rationale for adding it for RISC-V too.
~ Oleksii
* Re: [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu
2026-01-06 15:05 ` Oleksii Kurochko
@ 2026-01-06 15:33 ` Jan Beulich
2026-01-06 16:00 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-06 15:33 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 06.01.2026 16:05, Oleksii Kurochko wrote:
> On 1/6/26 3:26 PM, Jan Beulich wrote:
>> On 06.01.2026 15:19, Oleksii Kurochko wrote:
>>> On 1/5/26 5:58 PM, Jan Beulich wrote:
>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>> Introduce structure with VCPU's registers which describes its state.
>>>>>
>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>> Since none of this is being used for the time being, I think the description
>>>> wants to be a little less terse. Coming from the x86 (rather than the Arm)
>>>> side, I find the arrangements irritating. And even when comparing to Arm, ...
>>>>
>>>>> --- a/xen/arch/riscv/include/asm/domain.h
>>>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>>>> @@ -22,9 +22,63 @@ struct hvm_domain
>>>>> struct arch_vcpu_io {
>>>>> };
>>>>>
>>>>> -struct arch_vcpu {
>>>>> +struct arch_vcpu
>>>>> +{
>>>>> struct vcpu_vmid vmid;
>>>>> -};
>>>>> +
>>>>> + /* Xen's state: Callee-saved registers and tp, gp, ra */
>>>> ... I don't think the following structure describes "Xen's state". On Arm
>>>> it's guest controlled register values which are being saved afaict. I
>>>> would then expect the same to become the case for RISC-V.
>>> I think this is not fully correct, because guest-controlled registers on
>>> Arm are allocated on the stack [1][2].
>> I'll admit that I should have said "possibly guest-controlled". Callee-
>> saved registers may or may not be used in functions, and if one isn't
>> used throughout the call-stack reaching __context_switch(), it would
>> still hold whatever the guest had put there.
>
> But the guest doesn't put anything there; only Xen does, and that is the reason
> why I am trying to call it Xen state. The guest works only with what is stored in
> struct cpu_info->guest_cpu_user_regs.* ...
>
>>> Regarding xen_saved_context (or saved_context on Arm, which I used as a base),
>>> I think xen_saved_context is a slightly better name. Looking at how the
>>> saved_context structure is used on Arm [3], it can be concluded that
>>> __context_switch() switches only Xen’s internal context. What actually happens is
>>> that __context_switch() is called while running on the previous vCPU’s stack
>>> and returns on the next vCPU’s stack. Therefore, it is necessary to have
>>> the correct register values stored in the saved_context structure in order
>>> to continue Xen’s execution when it later returns to the previous stack.
>> For this and ...
>>
>>> Probably I need to introduce __context_switch() in this patch series for RISC-V
>>> now; I hope this will clarify things better. At the moment, it looks like [4].
>>>
>>> [1] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/include/asm/arm64/processor.h#L14
>>> [2] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/domain.c#L547
>>>
>>> [3] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/arm64/entry.S#L650
>>>
>>> [4] https://gitlab.com/xen-project/people/olkur/xen/-/blob/riscv-next-upstreaming/xen/arch/riscv/entry.S?ref_type=heads#L153
>>>
>>>>> + struct
>>>>> + {
>>>>> + register_t s0;
>>>>> + register_t s1;
>>>>> + register_t s2;
>>>>> + register_t s3;
>>>>> + register_t s4;
>>>>> + register_t s5;
>>>>> + register_t s6;
>>>>> + register_t s7;
>>>>> + register_t s8;
>>>>> + register_t s9;
>>>>> + register_t s10;
>>>>> + register_t s11;
>>>>> +
>>>>> + register_t sp;
>>>>> + register_t gp;
>>>>> +
>>>>> + /* ra is used to jump to guest when creating new vcpu */
>>>>> + register_t ra;
>>>>> + } xen_saved_context;
>>>> The xen_ prefix here also doesn't exist in Arm code.
>>> I think it should be added for Arm too. I can send a patch.
>> ... this, to reword my comment: What value does the xen_ prefix add?
>
> ... because guest doesn't access saved_context and as I mentioned above
> guest has "access" only to struct cpu_info->guest_cpu_user_regs.*.
The guest has no access to anything in the hypervisor. That said, seeing
that Andrew had asked for this, so be it then (albeit I remain unconvinced).
Jan
* Re: [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu
2026-01-06 15:33 ` Jan Beulich
@ 2026-01-06 16:00 ` Oleksii Kurochko
0 siblings, 0 replies; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-06 16:00 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/6/26 4:33 PM, Jan Beulich wrote:
> On 06.01.2026 16:05, Oleksii Kurochko wrote:
>> On 1/6/26 3:26 PM, Jan Beulich wrote:
>>> On 06.01.2026 15:19, Oleksii Kurochko wrote:
>>>> On 1/5/26 5:58 PM, Jan Beulich wrote:
>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>> Introduce structure with VCPU's registers which describes its state.
>>>>>>
>>>>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>>> Since none of this is being used for the time being, I think the description
>>>>> wants to be a little less terse. Coming from the x86 (rather than the Arm)
>>>>> side, I find the arrangements irritating. And even when comparing to Arm, ...
>>>>>
>>>>>> --- a/xen/arch/riscv/include/asm/domain.h
>>>>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>>>>> @@ -22,9 +22,63 @@ struct hvm_domain
>>>>>> struct arch_vcpu_io {
>>>>>> };
>>>>>>
>>>>>> -struct arch_vcpu {
>>>>>> +struct arch_vcpu
>>>>>> +{
>>>>>> struct vcpu_vmid vmid;
>>>>>> -};
>>>>>> +
>>>>>> + /* Xen's state: Callee-saved registers and tp, gp, ra */
>>>>> ... I don't think the following structure describes "Xen's state". On Arm
>>>>> it's guest controlled register values which are being saved afaict. I
>>>>> would then expect the same to become the case for RISC-V.
>>>> I think this is not fully correct, because guest-controlled registers on
>>>> Arm are allocated on the stack [1][2].
>>> I'll admit that I should have said "possibly guest-controlled". Callee-
>>> saved registers may or may not be used in functions, and if one isn't
>>> used throughout the call-stack reaching __context_switch(), it would
>>> still hold whatever the guest had put there.
>> But the guest doesn't put anything there; only Xen does, and that is the reason
>> why I am trying to call it Xen state. The guest works only with what is stored in
>> struct cpu_info->guest_cpu_user_regs.* ...
>>
>>>> Regarding xen_saved_context (or saved_context on Arm, which I used as a base),
>>>> I think xen_saved_context is a slightly better name. Looking at how the
>>>> saved_context structure is used on Arm [3], it can be concluded that
>>>> __context_switch() switches only Xen’s internal context. What actually happens is
>>>> that __context_switch() is called while running on the previous vCPU’s stack
>>>> and returns on the next vCPU’s stack. Therefore, it is necessary to have
>>>> the correct register values stored in the saved_context structure in order
>>>> to continue Xen’s execution when it later returns to the previous stack.
>>> For this and ...
>>>
>>>> Probably I need to introduce __context_switch() in this patch series for RISC-V
>>>> now; I hope this will clarify things better. At the moment, it looks like [4].
>>>>
>>>> [1] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/include/asm/arm64/processor.h#L14
>>>> [2] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/domain.c#L547
>>>>
>>>> [3] https://elixir.bootlin.com/xen/v4.21.0/source/xen/arch/arm/arm64/entry.S#L650
>>>>
>>>> [4] https://gitlab.com/xen-project/people/olkur/xen/-/blob/riscv-next-upstreaming/xen/arch/riscv/entry.S?ref_type=heads#L153
>>>>
>>>>>> + struct
>>>>>> + {
>>>>>> + register_t s0;
>>>>>> + register_t s1;
>>>>>> + register_t s2;
>>>>>> + register_t s3;
>>>>>> + register_t s4;
>>>>>> + register_t s5;
>>>>>> + register_t s6;
>>>>>> + register_t s7;
>>>>>> + register_t s8;
>>>>>> + register_t s9;
>>>>>> + register_t s10;
>>>>>> + register_t s11;
>>>>>> +
>>>>>> + register_t sp;
>>>>>> + register_t gp;
>>>>>> +
>>>>>> + /* ra is used to jump to guest when creating new vcpu */
>>>>>> + register_t ra;
>>>>>> + } xen_saved_context;
>>>>> The xen_ prefix here also doesn't exist in Arm code.
>>>> I think it should be added for Arm too. I can send a patch.
>>> ... this, to reword my comment: What value does the xen_ prefix add?
>> ... because guest doesn't access saved_context and as I mentioned above
>> guest has "access" only to struct cpu_info->guest_cpu_user_regs.*.
> The guest has no access to anything in the hypervisor.
Of course, the guest doesn't have access. By "access" I meant guest context
stored in cpu_info->guest_cpu_user_regs.*.
> That said, seeing
> that Andrew had asked for this, so be it then (albeit I remain unconvinced).
I will add some extra comments about xen_saved_context to make things
clearer.
~ Oleksii
* [PATCH v1 02/15] xen/riscv: implement arch_vcpu_{create,destroy}()
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
2025-12-24 17:03 ` [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-06 15:56 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init() Oleksii Kurochko
` (12 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce architecture-specific functions to create and destroy VCPUs.
Note that arch_vcpu_create() currently returns -EOPNOTSUPP, as the virtual
timer and interrupt controller are not yet implemented.
As part of this change, add continue_new_vcpu(), which will be used after
the first context_switch() of a new vCPU. Since this functionality is not
yet implemented, continue_new_vcpu() is currently provided as a stub.
Update the STACK_SIZE definition and introduce STACK_ORDER (to align with
other architectures) for allocating the vCPU stack.
Introduce struct cpu_info, which will be allocated in arch_vcpu_create()
and declare cpu_info inside arch_vcpu structure.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/domain.c | 56 ++++++++++++++++++++++++++++
xen/arch/riscv/include/asm/config.h | 3 +-
xen/arch/riscv/include/asm/current.h | 6 +++
xen/arch/riscv/include/asm/domain.h | 3 ++
xen/arch/riscv/stubs.c | 10 -----
6 files changed, 68 insertions(+), 11 deletions(-)
create mode 100644 xen/arch/riscv/domain.c
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 87c1148b0010..8863d4b15605 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -1,5 +1,6 @@
obj-y += aplic.o
obj-y += cpufeature.o
+obj-y += domain.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-y += entry.o
obj-y += imsic.o
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
new file mode 100644
index 000000000000..e5fda1af4ee9
--- /dev/null
+++ b/xen/arch/riscv/domain.c
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/mm.h>
+#include <xen/sched.h>
+
+static void continue_new_vcpu(struct vcpu *prev)
+{
+ BUG_ON("unimplemented\n");
+}
+
+int arch_vcpu_create(struct vcpu *v)
+{
+ int rc = 0;
+
+ BUILD_BUG_ON(sizeof(struct cpu_info) > STACK_SIZE);
+
+ v->arch.stack = alloc_xenheap_pages(STACK_ORDER, MEMF_node(vcpu_to_node(v)));
+ if ( !v->arch.stack )
+ return -ENOMEM;
+
+ v->arch.cpu_info = (struct cpu_info *)(v->arch.stack
+ + STACK_SIZE
+ - sizeof(struct cpu_info));
+ memset(v->arch.cpu_info, 0, sizeof(*v->arch.cpu_info));
+
+ v->arch.xen_saved_context.sp = (register_t)v->arch.cpu_info;
+ v->arch.xen_saved_context.ra = (register_t)continue_new_vcpu;
+
+ printk("Create vCPU with sp=%#lx, pc=%#lx, cpu_info(%#lx)\n",
+ v->arch.xen_saved_context.sp, v->arch.xen_saved_context.ra,
+ (unsigned long)v->arch.cpu_info);
+
+ /* Idle VCPUs don't need the rest of this setup */
+ if ( is_idle_vcpu(v) )
+ return rc;
+
+ /*
+ * As the vtimer and interrupt controller (IC) are not yet implemented,
+ * return an error.
+ *
+ * TODO: Drop this once the vtimer and IC are implemented.
+ */
+ rc = -EOPNOTSUPP;
+ goto fail;
+
+ return rc;
+
+ fail:
+ arch_vcpu_destroy(v);
+ return rc;
+}
+
+void arch_vcpu_destroy(struct vcpu *v)
+{
+ free_xenheap_pages(v->arch.stack, STACK_ORDER);
+}
diff --git a/xen/arch/riscv/include/asm/config.h b/xen/arch/riscv/include/asm/config.h
index 1e08d3bf78be..86a95df018b5 100644
--- a/xen/arch/riscv/include/asm/config.h
+++ b/xen/arch/riscv/include/asm/config.h
@@ -143,7 +143,8 @@
#define SMP_CACHE_BYTES (1 << 6)
-#define STACK_SIZE PAGE_SIZE
+#define STACK_ORDER 3
+#define STACK_SIZE (PAGE_SIZE << STACK_ORDER)
#define IDENT_AREA_SIZE 64
diff --git a/xen/arch/riscv/include/asm/current.h b/xen/arch/riscv/include/asm/current.h
index 0c3ea70c2ec8..58c9f1506b7c 100644
--- a/xen/arch/riscv/include/asm/current.h
+++ b/xen/arch/riscv/include/asm/current.h
@@ -21,6 +21,12 @@ struct pcpu_info {
/* tp points to one of these */
extern struct pcpu_info pcpu_info[NR_CPUS];
+/* Per-VCPU state that lives at the top of the stack */
+struct cpu_info {
+ /* This should be the first member. */
+ struct cpu_user_regs guest_cpu_user_regs;
+};
+
#define set_processor_id(id) do { \
tp->processor_id = (id); \
} while (0)
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 639cafdade99..a0ffbbc09c6f 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -49,6 +49,9 @@ struct arch_vcpu
register_t ra;
} xen_saved_context;
+ struct cpu_info *cpu_info;
+ void *stack;
+
/* CSRs */
register_t hstatus;
register_t hedeleg;
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index 164fc091b28a..eab826e8c3ae 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -126,16 +126,6 @@ void free_vcpu_struct(struct vcpu *v)
BUG_ON("unimplemented");
}
-int arch_vcpu_create(struct vcpu *v)
-{
- BUG_ON("unimplemented");
-}
-
-void arch_vcpu_destroy(struct vcpu *v)
-{
- BUG_ON("unimplemented");
-}
-
void vcpu_switch_to_aarch64_mode(struct vcpu *v)
{
BUG_ON("unimplemented");
--
2.52.0
* Re: [PATCH v1 02/15] xen/riscv: implement arch_vcpu_{create,destroy}()
2025-12-24 17:03 ` [PATCH v1 02/15] xen/riscv: implement arch_vcpu_{create,destroy}() Oleksii Kurochko
@ 2026-01-06 15:56 ` Jan Beulich
2026-01-12 10:19 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-06 15:56 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
(some or even all of the comments may also apply to present Arm code)
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/domain.c
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <xen/mm.h>
> +#include <xen/sched.h>
> +
> +static void continue_new_vcpu(struct vcpu *prev)
> +{
> + BUG_ON("unimplemented\n");
> +}
> +
> +int arch_vcpu_create(struct vcpu *v)
> +{
> + int rc = 0;
> +
> + BUILD_BUG_ON(sizeof(struct cpu_info) > STACK_SIZE);
I fear you're in trouble also when == or when only a few bytes are left on
the stack. IOW I'm unconvinced that this is a useful check to have.
> + v->arch.stack = alloc_xenheap_pages(STACK_ORDER, MEMF_node(vcpu_to_node(v)));
> + if ( !v->arch.stack )
> + return -ENOMEM;
You don't really need contiguous memory, do you? In which case why not
vmalloc()? This would then also use the larger domheap.
> + v->arch.cpu_info = (struct cpu_info *)(v->arch.stack
> + + STACK_SIZE
> + - sizeof(struct cpu_info));
Why the cast?
> + memset(v->arch.cpu_info, 0, sizeof(*v->arch.cpu_info));
> +
> + v->arch.xen_saved_context.sp = (register_t)v->arch.cpu_info;
> + v->arch.xen_saved_context.ra = (register_t)continue_new_vcpu;
> +
> + printk("Create vCPU with sp=%#lx, pc=%#lx, cpu_info(%#lx)\n",
> + v->arch.xen_saved_context.sp, v->arch.xen_saved_context.ra,
> + (unsigned long)v->arch.cpu_info);
Please don't, as this is going to get pretty noisy. (And if this wanted
keeping, use %p for pointers rather than casting to unsigned long.)
> + /* Idle VCPUs don't need the rest of this setup */
> + if ( is_idle_vcpu(v) )
> + return rc;
> +
> + /*
> + * As the vtimer and interrupt controller (IC) are not yet implemented,
> + * return an error.
> + *
> + * TODO: Drop this once the vtimer and IC are implemented.
> + */
> + rc = -EOPNOTSUPP;
> + goto fail;
> +
> + return rc;
> +
> + fail:
> + arch_vcpu_destroy(v);
> + return rc;
> +}
> +
> +void arch_vcpu_destroy(struct vcpu *v)
> +{
> + free_xenheap_pages(v->arch.stack, STACK_ORDER);
> +}
Better to use FREE_XENHEAP_PAGES() here, I think, to make the function
idempotent.
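As an illustration of the idempotency point (a user-space sketch, not Xen code): free() stands in for free_xenheap_pages(), and FREE_AND_NULL(), struct demo_vcpu and demo_vcpu_destroy() are hypothetical names chosen for this example only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-vCPU state that owns a stack. */
struct demo_vcpu {
    void *stack;
};

/*
 * Free the allocation and clear the pointer, so that a second call
 * sees NULL and becomes a no-op (free(NULL) is defined to do nothing).
 */
#define FREE_AND_NULL(p) do { \
    void *ptr_ = (p);         \
    (p) = NULL;               \
    free(ptr_);               \
} while ( 0 )

/* Safe to call more than once, e.g. from an error path and from teardown. */
static void demo_vcpu_destroy(struct demo_vcpu *v)
{
    assert(v != NULL);
    FREE_AND_NULL(v->stack);
}
```

With a plain free of v->stack, a destroy call from the fail: path followed by a later regular teardown would free the same pointer twice; clearing the field avoids that.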
> --- a/xen/arch/riscv/include/asm/current.h
> +++ b/xen/arch/riscv/include/asm/current.h
> @@ -21,6 +21,12 @@ struct pcpu_info {
> /* tp points to one of these */
> extern struct pcpu_info pcpu_info[NR_CPUS];
>
> +/* Per-VCPU state that lives at the top of the stack */
> +struct cpu_info {
> + /* This should be the first member. */
> + struct cpu_user_regs guest_cpu_user_regs;
> +};
You may want to enforce what the comment says by way of a BUILD_BUG_ON().
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -49,6 +49,9 @@ struct arch_vcpu
> register_t ra;
> } xen_saved_context;
>
> + struct cpu_info *cpu_info;
> + void *stack;
Do you really need both fields, when one is derived from the other?
Jan
* Re: [PATCH v1 02/15] xen/riscv: implement arch_vcpu_{create,destroy}()
2026-01-06 15:56 ` Jan Beulich
@ 2026-01-12 10:19 ` Oleksii Kurochko
2026-01-12 10:42 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-12 10:19 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/6/26 4:56 PM, Jan Beulich wrote:
> (some or even all of the comments may also apply to present Arm code)
>
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> --- /dev/null
>> +++ b/xen/arch/riscv/domain.c
>> @@ -0,0 +1,56 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +
>> +#include <xen/mm.h>
>> +#include <xen/sched.h>
>> +
>> +static void continue_new_vcpu(struct vcpu *prev)
>> +{
>> + BUG_ON("unimplemented\n");
>> +}
>> +
>> +int arch_vcpu_create(struct vcpu *v)
>> +{
>> + int rc = 0;
>> +
>> + BUILD_BUG_ON(sizeof(struct cpu_info) > STACK_SIZE);
> I fear you're in trouble also when == or when only a few bytes are left on
> the stack. IOW I'm unconvinced that this is a useful check to have.
>
>> + v->arch.stack = alloc_xenheap_pages(STACK_ORDER, MEMF_node(vcpu_to_node(v)));
>> + if ( !v->arch.stack )
>> + return -ENOMEM;
> You don't really need contiguous memory, do you? In which case why not
> vmalloc()? This would then also use the larger domheap.
There is really no need for contiguous memory, and vmalloc() could be used.
I expect that vmalloc() is more expensive and may make hardware prefetching less
effective, with more TLB pressure since it allocates 4 KB pages.
However, the latter two points do not really matter in this case, as only a
single 4 KB page is allocated, so we are unlikely to see any performance issues.
>
>> + v->arch.cpu_info = (struct cpu_info *)(v->arch.stack
>> + + STACK_SIZE
>> + - sizeof(struct cpu_info));
> Why the cast?
Just for readability; from the compiler's point of view it could simply be dropped.
>
>> + memset(v->arch.cpu_info, 0, sizeof(*v->arch.cpu_info));
>> +
>> + v->arch.xen_saved_context.sp = (register_t)v->arch.cpu_info;
>> + v->arch.xen_saved_context.ra = (register_t)continue_new_vcpu;
>> +
>> + printk("Create vCPU with sp=%#lx, pc=%#lx, cpu_info(%#lx)\n",
>> + v->arch.xen_saved_context.sp, v->arch.xen_saved_context.ra,
>> + (unsigned long)v->arch.cpu_info);
> Please don't, as this is going to get pretty noisy. (And if this wanted
> keeping, use %p for pointers rather than casting to unsigned long.)
I didn’t consider the case where a large number of vCPUs are created, as
I have only tested with 2 vCPUs. However, if the number of vCPUs is large,
this could indeed get quite noisy.
I will keep these lines of code downstream for debugging purposes and
drop them from the upstream version of this patch.
>> --- a/xen/arch/riscv/include/asm/current.h
>> +++ b/xen/arch/riscv/include/asm/current.h
>> @@ -21,6 +21,12 @@ struct pcpu_info {
>> /* tp points to one of these */
>> extern struct pcpu_info pcpu_info[NR_CPUS];
>>
>> +/* Per-VCPU state that lives at the top of the stack */
>> +struct cpu_info {
>> + /* This should be the first member. */
>> + struct cpu_user_regs guest_cpu_user_regs;
>> +};
> You may want to enforce what the comment says by way of a BUILD_BUG_ON().
Makes sense, I will add:
BUILD_BUG_ON(offsetof(struct cpu_info, guest_cpu_user_regs) != 0);
in arch_vcpu_create(), somewhere around the initialization of
v->arch.cpu_info = ... .
I noticed that there is no BUILD_BUG_ON() variant that can be used outside
of a function. Or does such a variant exist and I’m just missing it? Or is
there no point in having such a variant at all?
>
>> --- a/xen/arch/riscv/include/asm/domain.h
>> +++ b/xen/arch/riscv/include/asm/domain.h
>> @@ -49,6 +49,9 @@ struct arch_vcpu
>> register_t ra;
>> } xen_saved_context;
>>
>> + struct cpu_info *cpu_info;
>> + void *stack;
> Do you really need both fields, when one is derived from the other?
No, I don't need both. I think we can just keep cpu_info and that would be
enough.
Thanks.
~ Oleksii
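To illustrate why one field is enough (a sketch under assumptions, not the patch itself): since cpu_info sits at the very top of the stack allocation, each pointer can be computed from the other. The structure contents below are reduced placeholders, and stack_to_cpu_info() and cpu_info_to_stack() are hypothetical helper names; PAGE_SIZE is assumed to be 4 KB, with STACK_ORDER and STACK_SIZE as in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE   4096UL
#define STACK_ORDER 3
#define STACK_SIZE  (PAGE_SIZE << STACK_ORDER)

/* Reduced placeholders for the real structures. */
struct cpu_user_regs { unsigned long ra, sp, gp; };
struct cpu_info { struct cpu_user_regs guest_cpu_user_regs; };

/* cpu_info occupies the last bytes of the stack allocation. */
static struct cpu_info *stack_to_cpu_info(unsigned char *stack)
{
    assert(stack != NULL);
    /* Going through void * needs no cast: it converts implicitly. */
    void *p = stack + STACK_SIZE - sizeof(struct cpu_info);
    return p;
}

/* One past the end of cpu_info is the end of the stack allocation. */
static unsigned char *cpu_info_to_stack(struct cpu_info *ci)
{
    assert(ci != NULL);
    void *end = ci + 1;
    unsigned char *top = end;
    return top - STACK_SIZE;
}
```

The sketch uses unsigned char * for byte arithmetic so it stays within standard C; the patch itself does the arithmetic directly on a void * stack pointer.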
* Re: [PATCH v1 02/15] xen/riscv: implement arch_vcpu_{create,destroy}()
2026-01-12 10:19 ` Oleksii Kurochko
@ 2026-01-12 10:42 ` Jan Beulich
0 siblings, 0 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 10:42 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 12.01.2026 11:19, Oleksii Kurochko wrote:
> On 1/6/26 4:56 PM, Jan Beulich wrote:
>> (some or even all of the comments may also apply to present Arm code)
>>
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> + v->arch.cpu_info = (struct cpu_info *)(v->arch.stack
>>> + + STACK_SIZE
>>> + - sizeof(struct cpu_info));
>> Why the cast?
>
> Just for readability, from compiler point of view it could be just dropped.
Sorry, for me readability suffers from the cast and the then necessary
parentheses. Plus I keep telling you that casts can be dangerous, and
hence they are better used only when really unavoidable.
>>> --- a/xen/arch/riscv/include/asm/current.h
>>> +++ b/xen/arch/riscv/include/asm/current.h
>>> @@ -21,6 +21,12 @@ struct pcpu_info {
>>> /* tp points to one of these */
>>> extern struct pcpu_info pcpu_info[NR_CPUS];
>>>
>>> +/* Per-VCPU state that lives at the top of the stack */
>>> +struct cpu_info {
>>> + /* This should be the first member. */
>>> + struct cpu_user_regs guest_cpu_user_regs;
>>> +};
>> You may want to enforce what the comment says by way of a BUILD_BUG_ON().
>
> Makes sense, I will add:
> BUILD_BUG_ON(offsetof(struct cpu_info, guest_cpu_user_regs) != 0);
> in arch_vcpu_create(), somewhere around the initialization of
> v->arch.cpu_info = ... .
> I noticed that there is no BUILD_BUG_ON() variant that can be used outside
> of a function, or does such a variant exist and I’m just missing it? Or there
> is no such sense at all for such variant?
There's none, correct. Hence why in a few places we have build_assertions()
functions.
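A minimal sketch of that pattern, for illustration only: the BUILD_BUG_ON() below is a simplified stand-in (a negative array size breaks the build) rather than Xen's actual definition, and the structure contents are reduced placeholders.

```c
#include <stddef.h>

/* Reduced placeholders for the real structures. */
struct cpu_user_regs { unsigned long ra, sp, gp; };
struct cpu_info {
    /* This should be the first member. */
    struct cpu_user_regs guest_cpu_user_regs;
};

/* Simplified stand-in: the array size becomes -1 when cond is true. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/*
 * Never needs to be called at run time; it only gives the
 * statement-level checks a function body to live in.
 */
static inline void build_assertions(void)
{
    BUILD_BUG_ON(offsetof(struct cpu_info, guest_cpu_user_regs) != 0);
}
```

Moving guest_cpu_user_regs away from offset 0 would make this translation unit fail to compile, which is the enforcement asked for in the review.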
Jan
* [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
2025-12-24 17:03 ` [PATCH v1 01/15] xen/riscv: introduce struct arch_vcpu Oleksii Kurochko
2025-12-24 17:03 ` [PATCH v1 02/15] xen/riscv: implement arch_vcpu_{create,destroy}() Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-07 8:46 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 04/15] xen/riscv: introduce vtimer Oleksii Kurochko
` (11 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Implement function to initialize VCPU's CSR registers to delegate handling
of some traps to VS-mode (guest), enable vstimecmp for VS-mode, and
allow some AIA-related registers (their vs* copies) for VS-mode.
Add detection of Smstateen extension to properly initialize hstateen0 to
allow guest to access AIA-added state.
Add call of vcpu_csr_init() in arch_vcpu_create().
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/cpufeature.c | 1 +
xen/arch/riscv/domain.c | 63 +++++++++++++++++++++
xen/arch/riscv/include/asm/cpufeature.h | 1 +
xen/arch/riscv/include/asm/riscv_encoding.h | 2 +
4 files changed, 67 insertions(+)
diff --git a/xen/arch/riscv/cpufeature.c b/xen/arch/riscv/cpufeature.c
index 02b68aeaa49f..03e27b037be0 100644
--- a/xen/arch/riscv/cpufeature.c
+++ b/xen/arch/riscv/cpufeature.c
@@ -137,6 +137,7 @@ const struct riscv_isa_ext_data __initconst riscv_isa_ext[] = {
RISCV_ISA_EXT_DATA(zbb),
RISCV_ISA_EXT_DATA(zbs),
RISCV_ISA_EXT_DATA(smaia),
+ RISCV_ISA_EXT_DATA(smstateen),
RISCV_ISA_EXT_DATA(ssaia),
RISCV_ISA_EXT_DATA(svade),
RISCV_ISA_EXT_DATA(svpbmt),
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index e5fda1af4ee9..44387d056546 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -3,6 +3,67 @@
#include <xen/mm.h>
#include <xen/sched.h>
+#include <asm/cpufeature.h>
+#include <asm/csr.h>
+#include <asm/riscv_encoding.h>
+
+static void vcpu_csr_init(struct vcpu *v)
+{
+ unsigned long hedeleg, hideleg, hstatus;
+
+ hedeleg = 0;
+ hedeleg |= (1U << CAUSE_MISALIGNED_FETCH);
+ hedeleg |= (1U << CAUSE_FETCH_ACCESS);
+ hedeleg |= (1U << CAUSE_ILLEGAL_INSTRUCTION);
+ hedeleg |= (1U << CAUSE_MISALIGNED_LOAD);
+ hedeleg |= (1U << CAUSE_LOAD_ACCESS);
+ hedeleg |= (1U << CAUSE_MISALIGNED_STORE);
+ hedeleg |= (1U << CAUSE_STORE_ACCESS);
+ hedeleg |= (1U << CAUSE_BREAKPOINT);
+ hedeleg |= (1U << CAUSE_USER_ECALL);
+ hedeleg |= (1U << CAUSE_FETCH_PAGE_FAULT);
+ hedeleg |= (1U << CAUSE_LOAD_PAGE_FAULT);
+ hedeleg |= (1U << CAUSE_STORE_PAGE_FAULT);
+ v->arch.hedeleg = hedeleg;
+
+ hstatus = HSTATUS_SPV | HSTATUS_SPVP;
+ v->arch.hstatus = hstatus;
+
+ hideleg = MIP_VSTIP | MIP_VSEIP | MIP_VSSIP;
+ v->arch.hideleg = hideleg;
+
+ /*
+ * VS should access only the time counter directly.
+ * Everything else should trap.
+ */
+ v->arch.hcounteren |= HCOUNTEREN_TM;
+
+ if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_svpbmt) )
+ v->arch.henvcfg |= ENVCFG_PBMTE;
+
+ if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_smstateen) )
+ {
+ /*
+ * If the hypervisor extension is implemented, the same three bits are
+ * defined also in hypervisor CSR hstateen0 but concern only the state
+ * potentially accessible to a virtual machine executing in privilege
+ * modes VS and VU:
+ * bit 60 CSRs siselect and sireg (really vsiselect and vsireg)
+ * bit 59 CSRs siph and sieh (RV32 only) and stopi (really vsiph,
+ * vsieh, and vstopi)
+ * bit 58 all state of IMSIC guest interrupt files, including CSR
+ * stopei (really vstopei)
+ * If one of these bits is zero in hstateen0, and the same bit is one
+ * in mstateen0, then an attempt to access the corresponding state from
+ * VS or VU-mode raises a virtual instruction exception.
+ */
+ v->arch.hstateen0 = SMSTATEEN0_AIA | SMSTATEEN0_IMSIC | SMSTATEEN0_SVSLCT;
+
+ /* Allow guest to access CSR_ENVCFG */
+ v->arch.hstateen0 |= SMSTATEEN0_HSENVCFG;
+ }
+}
+
static void continue_new_vcpu(struct vcpu *prev)
{
BUG_ON("unimplemented\n");
@@ -30,6 +91,8 @@ int arch_vcpu_create(struct vcpu *v)
v->arch.xen_saved_context.sp, v->arch.xen_saved_context.ra,
(unsigned long)v->arch.cpu_info);
+ vcpu_csr_init(v);
+
/* Idle VCPUs don't need the rest of this setup */
if ( is_idle_vcpu(v) )
return rc;
diff --git a/xen/arch/riscv/include/asm/cpufeature.h b/xen/arch/riscv/include/asm/cpufeature.h
index b69616038888..ef02a3e26d2c 100644
--- a/xen/arch/riscv/include/asm/cpufeature.h
+++ b/xen/arch/riscv/include/asm/cpufeature.h
@@ -36,6 +36,7 @@ enum riscv_isa_ext_id {
RISCV_ISA_EXT_zbb,
RISCV_ISA_EXT_zbs,
RISCV_ISA_EXT_smaia,
+ RISCV_ISA_EXT_smstateen,
RISCV_ISA_EXT_ssaia,
RISCV_ISA_EXT_svade,
RISCV_ISA_EXT_svpbmt,
diff --git a/xen/arch/riscv/include/asm/riscv_encoding.h b/xen/arch/riscv/include/asm/riscv_encoding.h
index 1f7e612366f8..dd15731a86fa 100644
--- a/xen/arch/riscv/include/asm/riscv_encoding.h
+++ b/xen/arch/riscv/include/asm/riscv_encoding.h
@@ -228,6 +228,8 @@
#define ENVCFG_CBIE_INV _UL(0x3)
#define ENVCFG_FIOM _UL(0x1)
+#define HCOUNTEREN_TM BIT(1, U)
+
/* ===== User-level CSRs ===== */
/* User Trap Setup (N-extension) */
--
2.52.0
* Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
2025-12-24 17:03 ` [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init() Oleksii Kurochko
@ 2026-01-07 8:46 ` Jan Beulich
2026-01-12 12:59 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-07 8:46 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Implement function to initialize VCPU's CSR registers to delegate handling
> of some traps to VS-mode (guest), enable vstimecmp for VS-mode, and
> allow some AIA-related registers (their vs* copies) for VS-mode.
The henvcfg setting isn't covered here at all, unless I'm failing to make the
respective association. Nor is the setting of SMSTATEEN0_HSENVCFG in hstateen0.
Overall it feels like the description here is too terse anyway, as the bits
set (or not) are a pretty crucial thing for running guests. Then again maybe
this is just me, for not being a RISC-V person ...
> --- a/xen/arch/riscv/domain.c
> +++ b/xen/arch/riscv/domain.c
> @@ -3,6 +3,67 @@
> #include <xen/mm.h>
> #include <xen/sched.h>
>
> +#include <asm/cpufeature.h>
> +#include <asm/csr.h>
> +#include <asm/riscv_encoding.h>
> +
> +static void vcpu_csr_init(struct vcpu *v)
> +{
> + unsigned long hedeleg, hideleg, hstatus;
> +
> + hedeleg = 0;
> + hedeleg |= (1U << CAUSE_MISALIGNED_FETCH);
> + hedeleg |= (1U << CAUSE_FETCH_ACCESS);
> + hedeleg |= (1U << CAUSE_ILLEGAL_INSTRUCTION);
> + hedeleg |= (1U << CAUSE_MISALIGNED_LOAD);
> + hedeleg |= (1U << CAUSE_LOAD_ACCESS);
> + hedeleg |= (1U << CAUSE_MISALIGNED_STORE);
> + hedeleg |= (1U << CAUSE_STORE_ACCESS);
> + hedeleg |= (1U << CAUSE_BREAKPOINT);
> + hedeleg |= (1U << CAUSE_USER_ECALL);
> + hedeleg |= (1U << CAUSE_FETCH_PAGE_FAULT);
> + hedeleg |= (1U << CAUSE_LOAD_PAGE_FAULT);
> + hedeleg |= (1U << CAUSE_STORE_PAGE_FAULT);
> + v->arch.hedeleg = hedeleg;
Wouldn't you better start from setting all of the non-reserved bits, to then
clear the few that you mean to not delegate? Then again I'm not quite sure
whether the set of CAUSE_* in the header file is actually complete: MCAUSE
also can hold the values 16, 18, and 19. (Otoh you have CAUSE_MACHINE_ECALL,
which I don't think can ever be observed outside of M-mode.)
Also, while it may seem to not matter much, sorting the above by their numeric
values would ease comparison against the full set.
> + hstatus = HSTATUS_SPV | HSTATUS_SPVP;
> + v->arch.hstatus = hstatus;
Why would these (or in fact any) bits need setting here? Isn't hstatus written
upon exit from guest context?
> + hideleg = MIP_VSTIP | MIP_VSEIP | MIP_VSSIP;
> + v->arch.hideleg = hideleg;
Again I think having MIP_VSTIP in the middle (to establish numeric sorting)
would be slightly better.
Also there's a stray blank after the first |.
> + /*
> + * VS should access only the time counter directly.
> + * Everything else should trap.
> + */
> + v->arch.hcounteren |= HCOUNTEREN_TM;
Why are this and ...
> + if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_svpbmt) )
> + v->arch.henvcfg |= ENVCFG_PBMTE;
... this using |= but the earlier ones simply = ? Unless there is a specific
reason, consistency is likely preferable.
> + if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_smstateen) )
> + {
> + /*
> + * If the hypervisor extension is implemented, the same three bits are
> + * defined also in hypervisor CSR hstateen0 but concern only the state
> + * potentially accessible to a virtual machine executing in privilege
> + * modes VS and VU:
> + * bit 60 CSRs siselect and sireg (really vsiselect and vsireg)
> + * bit 59 CSRs siph and sieh (RV32 only) and stopi (really vsiph,
> + * vsieh, and vstopi)
> + * bit 58 all state of IMSIC guest interrupt files, including CSR
> + * stopei (really vstopei)
> + * If one of these bits is zero in hstateen0, and the same bit is one
> + * in mstateen0, then an attempt to access the corresponding state from
> + * VS or VU-mode raises a virtual instruction exception.
> + */
> + v->arch.hstateen0 = SMSTATEEN0_AIA | SMSTATEEN0_IMSIC | SMSTATEEN0_SVSLCT;
What is SVSLCT? Bit 60 is named CSRIND in the spec I'm looking at, and the
commentary above looks to confirm this.
Also, wouldn't you better keep internal state in line with what hardware
actually supports? CSRIND may be read-only-zero in the real register, in
which case having the bit set in the "cached" copy can be misleading.
(This may similarly apply to at least hedeleg and hideleg, btw.)
As to consistency: Further up you use local helper variables (for imo no real
reason), when here you don't. Instead this line ends up being too long.
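The "keep the cached copy in line with hardware" idea can be sketched as write-then-read-back. This is a toy model, not Xen code: struct warl_csr and the *_model() helpers simulate a register whose unimplemented bits read as zero, and the STATEEN0_* names are hypothetical labels for bits 58 to 60 as listed in the quoted comment.

```c
#include <assert.h>
#include <stddef.h>

#define STATEEN0_IMSIC   (1ULL << 58)
#define STATEEN0_AIA     (1ULL << 59)
#define STATEEN0_CSRIND  (1ULL << 60)

/* Toy model of a CSR where only some bits are implemented. */
struct warl_csr {
    unsigned long long value;
    unsigned long long writable;   /* bits the simulated hardware implements */
};

static void csr_write_model(struct warl_csr *csr, unsigned long long v)
{
    csr->value = v & csr->writable;   /* unimplemented bits stay zero */
}

static unsigned long long csr_read_model(const struct warl_csr *csr)
{
    return csr->value;
}

/* Write the wanted bits, then cache only what actually stuck. */
static unsigned long long probe_supported(struct warl_csr *csr,
                                          unsigned long long wanted)
{
    assert(csr != NULL);
    csr_write_model(csr, wanted);
    return csr_read_model(csr);
}
```

A cached hstateen0 built this way would never show a bit as set when the real register holds it at read-only zero.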
Jan
* Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
2026-01-07 8:46 ` Jan Beulich
@ 2026-01-12 12:59 ` Oleksii Kurochko
2026-01-12 14:28 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-12 12:59 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/7/26 9:46 AM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> Implement function to initialize VCPU's CSR registers to delegate handling
>> of some traps to VS-mode (guest), enable vstimecmp for VS-mode, and
>> allow some AIA-related registers (their vs* copies) for VS-mode.
> The henvcfg setting isn't covered here at all, unless I'm failing to make the
> respective association. Nor is the setting of SMSTATEEN0_HSENVCFG in hstateen0.
>
> Overall it feels like the description here is too terse anyway, as the bits
> set (or not) are a pretty crucial thing for running guests. Then again maybe
> this is just me, for not being a RISC-V person ...
I will add more details to the commit message then.
>
>> --- a/xen/arch/riscv/domain.c
>> +++ b/xen/arch/riscv/domain.c
>> @@ -3,6 +3,67 @@
>> #include <xen/mm.h>
>> #include <xen/sched.h>
>>
>> +#include <asm/cpufeature.h>
>> +#include <asm/csr.h>
>> +#include <asm/riscv_encoding.h>
>> +
>> +static void vcpu_csr_init(struct vcpu *v)
>> +{
>> + unsigned long hedeleg, hideleg, hstatus;
>> +
>> + hedeleg = 0;
>> + hedeleg |= (1U << CAUSE_MISALIGNED_FETCH);
>> + hedeleg |= (1U << CAUSE_FETCH_ACCESS);
>> + hedeleg |= (1U << CAUSE_ILLEGAL_INSTRUCTION);
>> + hedeleg |= (1U << CAUSE_MISALIGNED_LOAD);
>> + hedeleg |= (1U << CAUSE_LOAD_ACCESS);
>> + hedeleg |= (1U << CAUSE_MISALIGNED_STORE);
>> + hedeleg |= (1U << CAUSE_STORE_ACCESS);
>> + hedeleg |= (1U << CAUSE_BREAKPOINT);
>> + hedeleg |= (1U << CAUSE_USER_ECALL);
>> + hedeleg |= (1U << CAUSE_FETCH_PAGE_FAULT);
>> + hedeleg |= (1U << CAUSE_LOAD_PAGE_FAULT);
>> + hedeleg |= (1U << CAUSE_STORE_PAGE_FAULT);
>> + v->arch.hedeleg = hedeleg;
> Wouldn't you better start from setting all of the non-reserved bits, to then
> clear the few that you mean to not delegate?
Maybe that would be better, but I don’t see much difference, especially if we
use the following define:
#define HEDELEG_DEFAULT ( BIT(CAUSE_MISALIGNED_FETCH, U) | ... )
It would still be just one instruction to write the value to hedeleg.
(I think the compiler will likely produce the same optimization with the
current implementation.)
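Spelled out, such a define might look like the sketch below. The CAUSE_* names are the ones used in the patch; their numeric values here are taken from the exception-code table of the RISC-V privileged specification rather than from this thread, and BIT_UL() and hedeleg_default() are simplified, hypothetical helpers. The entries are listed in numeric order, as suggested in the review.

```c
#include <assert.h>

#define BIT_UL(pos) (1UL << (pos))

#define CAUSE_MISALIGNED_FETCH     0
#define CAUSE_FETCH_ACCESS         1
#define CAUSE_ILLEGAL_INSTRUCTION  2
#define CAUSE_BREAKPOINT           3
#define CAUSE_MISALIGNED_LOAD      4
#define CAUSE_LOAD_ACCESS          5
#define CAUSE_MISALIGNED_STORE     6
#define CAUSE_STORE_ACCESS         7
#define CAUSE_USER_ECALL           8
#define CAUSE_FETCH_PAGE_FAULT     12
#define CAUSE_LOAD_PAGE_FAULT      13
#define CAUSE_STORE_PAGE_FAULT     15

/* Same set of delegated exceptions as in the patch, as one constant. */
#define HEDELEG_DEFAULT ( BIT_UL(CAUSE_MISALIGNED_FETCH)    | \
                          BIT_UL(CAUSE_FETCH_ACCESS)        | \
                          BIT_UL(CAUSE_ILLEGAL_INSTRUCTION) | \
                          BIT_UL(CAUSE_BREAKPOINT)          | \
                          BIT_UL(CAUSE_MISALIGNED_LOAD)     | \
                          BIT_UL(CAUSE_LOAD_ACCESS)         | \
                          BIT_UL(CAUSE_MISALIGNED_STORE)    | \
                          BIT_UL(CAUSE_STORE_ACCESS)        | \
                          BIT_UL(CAUSE_USER_ECALL)          | \
                          BIT_UL(CAUSE_FETCH_PAGE_FAULT)    | \
                          BIT_UL(CAUSE_LOAD_PAGE_FAULT)     | \
                          BIT_UL(CAUSE_STORE_PAGE_FAULT) )

/* The whole expression folds to a compile-time constant. */
static unsigned long hedeleg_default(void)
{
    return HEDELEG_DEFAULT;
}
```

Because every operand is a constant, the compiler folds the expression, so initialization is a single store either way; the define mainly makes the delegated set easier to compare against the full list.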
> Then again I'm not quite sure
> whether the set of CAUSE_* in the header file is actually complete: MCAUSE
> also can hold the values 16, 18, and 19.
Then 14 and 17 could be added as well. I see the sense in adding 18 and 19,
since they are defined as "software check" and "hardware error." However,
I don’t see much value in providing CAUSE_* for 14 and 16–17, as they are
just reserved and have no specific meaning.
I could add something like CAUSE_RES_14, CAUSE_RES_16, CAUSE_RES_17, but
since we aren’t actually handling them, I think it’s fine to update CAUSE_*
only when there is a real use for them, like with 18 and 19.
> (Otoh you have CAUSE_MACHINE_ECALL,
> which I don't think can ever be observed outside of M-mode.)
Good point. It seems you are right: an M-mode ecall can't be observed outside of
M-mode, and moreover the bit is marked as read-only zero, so it is not expected to
be delegated to a lower privilege mode. But then I don't know why it was added to
"Table 29 Bits of hedeleg that must be writable or must be read-only zero.".
>
> Also, while it may seem to not matter much, sorting the above by their numeric
> values would ease comparison against the full set.
I will move "hedeleg |= (1U << CAUSE_BREAKPOINT);" up; all the others seem to be
sorted properly.
>
>> + hstatus = HSTATUS_SPV | HSTATUS_SPVP;
>> + v->arch.hstatus = hstatus;
> Why would these (or in fact any) bits need setting here?
It could be moved to continue_new_vcpu() where now (in downstream) I have:
csr_write(CSR_HSTATUS, vcpu_guest_cpu_user_regs(current)->hstatus);
reset_stack_and_jump(return_to_new_vcpu);
But I put it here to have vCPU state (all or as much as possible) initialized
in one place.
> Isn't hstatus written
> upon exit from guest context?
Setting these bits manually is necessary for the initial entry into
a guest context.
While it is true that hardware updates hstatus during a trap from a guest,
software must set these bits to define the destination state for the
subsequent SRET instruction.
When a hypervisor prepares to run a guest for the first time, there has been no
previous "exit" from that guest to automatically populate the CSRs. Setting these
bits is essential for the following reasons:
- Defining the target Virtualization Mode (SPV): The SPV bit is used by the SRET
instruction to determine the new virtualization mode. If the hypervisor is in
HS-mode (V=0) and executes SRET, the hardware sets the new V to the current
value of hstatus.SPV. Without manually setting HSTATUS_SPV, the SRET would
return the hart to V=0 instead of entering the guest (V=1).
- Defining the target Privilege Level (SPVP): The SPVP bit tracks the nominal
privilege level (S or U) of the guest. When V=1, this determines if the guest
is in VS-mode or VU-mode.
- Controlling Hypervisor Load/Store Instructions: SPVP specifically controls
the effective privilege of explicit memory accesses made by hypervisor
virtual-machine load/store instructions (HLV, HLVX, and HSV).
If the hypervisor needs to use these instructions to access guest memory
as if it were the guest supervisor, SPVP must be set to 1.
But maybe these instructions do not make much sense before the guest has
run.
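
As a rough illustration of the SPV point above, here is a minimal sketch. It is
a simplified model and not Xen code: the helper names are made up for this
example, and only the selection of the new virtualization mode by SRET is
modelled. The bit positions are the ones from the privileged spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* hstatus bit positions as defined by the RISC-V privileged spec. */
#define HSTATUS_SPV   (UINT64_C(1) << 7)
#define HSTATUS_SPVP  (UINT64_C(1) << 8)

/* Hypothetical helper: the hstatus value prepared for a brand-new vCPU. */
static uint64_t initial_hstatus(void)
{
    return HSTATUS_SPV | HSTATUS_SPVP;
}

/*
 * Hypothetical model of the one SRET effect discussed here: when SRET is
 * executed with V=0, the new virtualization mode is taken from hstatus.SPV.
 */
static bool virt_mode_after_sret(uint64_t hstatus)
{
    return (hstatus & HSTATUS_SPV) != 0;
}
```

With hstatus left at zero the model stays in V=0, which is the situation a
freshly created vCPU would be in if nothing set SPV before the first SRET.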
>
>> + hideleg = MIP_VSTIP | MIP_VSEIP | MIP_VSSIP;
>> + v->arch.hideleg = hideleg;
> Again I think having MIP_VSTIP in the middle (to establish numeric sorting)
> would be slightly better.
>
> Also there's a stray blank after the first |.
>
>> + /*
>> + * VS should access only the time counter directly.
>> + * Everything else should trap.
>> + */
>> + v->arch.hcounteren |= HCOUNTEREN_TM;
> Why are this and ...
>
>> + if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_svpbmt) )
>> + v->arch.henvcfg |= ENVCFG_PBMTE;
> ... this using |= but the earlier ones simply = ? Unless there is a specific
> reason, consistency is likely preferable.
This was overlooked during refactoring; it seems I simply used "=" instead of "|=".
The idea is that if it’s the first initialization, "=" should be used; otherwise,
for subsequent writes, "|=" is used to avoid clearing previous values.
I will update this part to use the same pattern consistently throughout
this function.
>
>> + if ( riscv_isa_extension_available(NULL, RISCV_ISA_EXT_smstateen) )
>> + {
>> + /*
>> + * If the hypervisor extension is implemented, the same three bits are
>> + * defined also in hypervisor CSR hstateen0 but concern only the state
>> + * potentially accessible to a virtual machine executing in privilege
>> + * modes VS and VU:
>> + * bit 60 CSRs siselect and sireg (really vsiselect and vsireg)
>> + * bit 59 CSRs siph and sieh (RV32 only) and stopi (really vsiph,
>> + * vsieh, and vstopi)
>> + * bit 58 all state of IMSIC guest interrupt files, including CSR
>> + * stopei (really vstopei)
>> + * If one of these bits is zero in hstateen0, and the same bit is one
>> + * in mstateen0, then an attempt to access the corresponding state from
>> + * VS or VU-mode raises a virtual instruction exception.
>> + */
>> + v->arch.hstateen0 = SMSTATEEN0_AIA | SMSTATEEN0_IMSIC | SMSTATEEN0_SVSLCT;
> What is SVSLCT? Bit 60 is named CSRIND in the spec I'm looking at, and the
> commentary above looks to confirm this.
This is what OpenSBI, from which riscv_encoding.h was taken, calls this bit.
SVSLCT stands for Supervisor Virtual Select, referring to the access control of the
siselect and vsiselect registers.
>
> Also, wouldn't you better keep internal state in line with what hardware
> actually supports? CSRIND may be read-only-zero in the real register, in
> which case having the bit set in the "cached" copy can be misleading.
According to the AIA spec:
If extension Smstateen is implemented together with the Advanced Interrupt Architecture (AIA),
three bits of state-enable register mstateen0 control access to AIA-added state from privilege modes
less privileged than M-mode:
bit 60 CSRs siselect, sireg, vsiselect, and vsireg
bit 59 all other state added by the AIA and not controlled by bits 60 and 58
bit 58 all IMSIC state, including CSRs stopei and vstopei
I read this as: if Smstateen is supported, then all of these bits are supported
by hardware, which is why checking for Smstateen alone would be enough.
But I decided to check what KVM does in a similar case, and it seems that I misread
the first line of the AIA spec excerpt above, so one more if-condition is needed:
if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SMSTATEEN)) {
cfg->hstateen0 |= SMSTATEEN0_HSENVCFG;
if (riscv_isa_extension_available(isa, SSAIA))
cfg->hstateen0 |= SMSTATEEN0_AIA_IMSIC |
SMSTATEEN0_AIA |
SMSTATEEN0_AIA_ISEL;
if (riscv_isa_extension_available(isa, SMSTATEEN))
cfg->hstateen0 |= SMSTATEEN0_SSTATEEN0;
}
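
A minimal sketch of what the extra condition could look like, under stated
assumptions: the capability struct and function name are hypothetical (a real
implementation would query the ISA via riscv_isa_extension_available()), and
the bit positions are the ones quoted from the AIA spec above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the AIA text quoted above. */
#define STATEEN0_CSRIND  (UINT64_C(1) << 60)  /* siselect / sireg */
#define STATEEN0_AIA     (UINT64_C(1) << 59)  /* other AIA-added state */
#define STATEEN0_IMSIC   (UINT64_C(1) << 58)  /* IMSIC state */

/* Hypothetical capability flags, standing in for real ISA queries. */
struct isa_caps {
    bool smstateen;
    bool ssaia;
};

/* Only set the AIA-related bits when the AIA extension is present too. */
static uint64_t initial_hstateen0(const struct isa_caps *caps)
{
    uint64_t val = 0;

    if ( !caps->smstateen )
        return 0;

    if ( caps->ssaia )
        val |= STATEEN0_CSRIND | STATEEN0_AIA | STATEEN0_IMSIC;

    return val;
}
```

This keeps the cached value free of bits that the hardware cannot have set
when Ssaia is absent.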
> (This may similarly apply to at least hedeleg and hideleg, btw.)
Regarding the previous bits, I can understand that it would be an issue:
if SSAIA isn’t supported, then it is incorrect to update the corresponding
bits of hstateen0.
However, I’m not really sure I understand what the issue is with h{i,e}deleg.
All writable bits there don’t depend on hardware support. Am I missing something?
>
> As to consistency: Further up you use local helper variables (for imo no real
> reason), when here you don't. Instead this line ends up being too long.
I will update the code to have consistency.
Thanks.
~ Oleksii
* Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
2026-01-12 12:59 ` Oleksii Kurochko
@ 2026-01-12 14:28 ` Jan Beulich
2026-01-12 15:46 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 14:28 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 12.01.2026 13:59, Oleksii Kurochko wrote:
> On 1/7/26 9:46 AM, Jan Beulich wrote:
>> Also, wouldn't you better keep internal state in line with what hardware
>> actually supports? CSRIND may be read-only-zero in the real register, in
>> which case having the bit set in the "cached" copy can be misleading.
>
> [...]
>
>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>
> Regarding the previous bits, I can understand that it would be an issue:
> if SSAIA isn’t supported, then it is incorrect to update the corresponding
> bits of hstateen0.
>
> However, I’m not really sure I understand what the issue is with h{i,e}deleg.
> All writable bits there don’t depend on hardware support. Am I missing something?
My reading of the doc was that any of the bits can be r/o 0, with - yes -
no dependencies on particular extensions. In which case you'd need to do
the delegation in software. For which it might be helpful to know what
the two registers are actually set to in hardware (i.e. the cached values
wanting to match the real ones).
Jan
* Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
2026-01-12 14:28 ` Jan Beulich
@ 2026-01-12 15:46 ` Oleksii Kurochko
2026-01-12 15:54 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-12 15:46 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/12/26 3:28 PM, Jan Beulich wrote:
> On 12.01.2026 13:59, Oleksii Kurochko wrote:
>> On 1/7/26 9:46 AM, Jan Beulich wrote:
>>> Also, wouldn't you better keep internal state in line with what hardware
>>> actually supports? CSRIND may be read-only-zero in the real register, in
>>> which case having the bit set in the "cached" copy can be misleading.
>> [...]
>>
>>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>> Regarding the previous bits, I can understand that it would be an issue:
>> if SSAIA isn’t supported, then it is incorrect to update the corresponding
>> bits of hstateen0.
>>
>> However, I’m not really sure I understand what the issue is with h{i,e}deleg.
>> All writable bits there don’t depend on hardware support. Am I missing something?
> My reading of the doc was that any of the bits can be r/o 0, with - yes -
> no dependencies on particular extensions.
Just to be sure that I get your idea correctly.
Based on the priv. spec:
Each bit of hedeleg shall be either writable or read-only zero. Many bits of
hedeleg are required specifically to be writable or zero, as enumerated in
Table 29.
Now let’s take hedeleg.bit1, which is marked as writable according to Table 29.
Your point is that even though hedeleg.bit1 is defined as writable, it could still
be read-only zero, right?
In general, I agree with that. It is possible that M-mode software decides, for
some reason (for example, because the implementation does not support delegation
of bit1 to a lower mode), not to delegate medeleg.bit1 to HS-mode. In that case,
hedeleg.bit1 would always be read-only zero.
> In which case you'd need to do
> the delegation in software. For which it might be helpful to know what
> the two registers are actually set to in hardware (i.e. the cached values
> wanting to match the real ones).
Does it make sense then to have the following
...
v->arch.hedeleg = hedeleg;
vcpu->arch.hedeleg = csr_read(CSR_HEDELEG);
in arch_vcpu_create()?
Or I can just add a comment saying that it will be synced with the corresponding
hardware CSR later since, actually, some h{i,e}deleg synchronization already
happens during context_switch() (this code is downstream at the moment),
because restore_csr_regs() is executed and re-reads CSR_H{I,E}DELEG:
static void restore_csr_regs(struct vcpu *vcpu)
{
csr_write(CSR_HEDELEG, vcpu->arch.hedeleg);
csr_write(CSR_HIDELEG, vcpu->arch.hideleg);
...
As a result, vcpu->arch.h{I,E}deleg is kept in sync with the corresponding
hardware CSR.
~ Oleksii
* Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
2026-01-12 15:46 ` Oleksii Kurochko
@ 2026-01-12 15:54 ` Jan Beulich
2026-01-12 16:39 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 15:54 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 12.01.2026 16:46, Oleksii Kurochko wrote:
> On 1/12/26 3:28 PM, Jan Beulich wrote:
>> On 12.01.2026 13:59, Oleksii Kurochko wrote:
>>> On 1/7/26 9:46 AM, Jan Beulich wrote:
>>>> Also, wouldn't you better keep internal state in line with what hardware
>>>> actually supports? CSRIND may be read-only-zero in the real register, in
>>>> which case having the bit set in the "cached" copy can be misleading.
>>> [...]
>>>
>>>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>>> Regarding the previous bits, I can understand that it would be an issue:
>>> if SSAIA isn’t supported, then it is incorrect to update the corresponding
>>> bits of hstateen0.
>>>
>>> However, I’m not really sure I understand what the issue is with h{i,e}deleg.
>>> All writable bits there don’t depend on hardware support. Am I missing something?
>> My reading of the doc was that any of the bits can be r/o 0, with - yes -
>> no dependencies on particular extensions.
>
> Just to be sure that I get your idea correctly.
>
> Based on the priv. spec:
> Each bit of hedeleg shall be either writable or read-only zero. Many bits of
> hedeleg are required specifically to be writable or zero, as enumerated in
> Table 29.
>
> Now let’s take hedeleg.bit1, which is marked as writable according to Table 29.
> Your point is that even though hedeleg.bit1 is defined as writable, it could still
> be read-only zero, right?
>
> In general, I agree with that. It is possible that M-mode software decides, for
> some reason (for example, because the implementation does not support delegation
> of bit1 to a lower mode), not to delegate medeleg.bit1 to HS-mode. In that case,
> hedeleg.bit1 would always be read-only zero.
>
>> In which case you'd need to do
>> the delegation in software. For which it might be helpful to know what
>> the two registers are actually set to in hardware (i.e. the cached values
>> wanting to match the real ones).
>
> Does it make sense then to have the following
> ...
> v->arch.hedeleg = hedeleg;
> vcpu->arch.hedeleg = csr_read(CSR_HEDELEG);
> in arch_vcpu_create()?
The above makes no sense to me, with or without s/vcpu/v/.
> Or I can just add the comment that it will be sync-ed with the corresponding
> hardware CSR later as ,actually, there is some h{i,e}deleg synchronization
> happening during context_switch() (this code is at the moment in downstream),
> because restore_csr_regs() is executed and re-reads CSR_H{I,E}DELEG:
> static void restore_csr_regs(struct vcpu *vcpu)
> {
> csr_write(CSR_HEDELEG, vcpu->arch.hedeleg);
> csr_write(CSR_HIDELEG, vcpu->arch.hideleg);
> ...
> As a result, vcpu->arch.h{I,E}deleg is kept in sync with the corresponding
> hardware CSR.
No, the r/o bits will continue to be out-of-sync between the hw register and
the struct arch_vcpu field.
Jan
* Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
2026-01-12 15:54 ` Jan Beulich
@ 2026-01-12 16:39 ` Oleksii Kurochko
2026-01-12 16:42 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-12 16:39 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/12/26 4:54 PM, Jan Beulich wrote:
> On 12.01.2026 16:46, Oleksii Kurochko wrote:
>> On 1/12/26 3:28 PM, Jan Beulich wrote:
>>> On 12.01.2026 13:59, Oleksii Kurochko wrote:
>>>> On 1/7/26 9:46 AM, Jan Beulich wrote:
>>>>> Also, wouldn't you better keep internal state in line with what hardware
>>>>> actually supports? CSRIND may be read-only-zero in the real register, in
>>>>> which case having the bit set in the "cached" copy can be misleading.
>>>> [...]
>>>>
>>>>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>>>> Regarding the previous bits, I can understand that it would be an issue:
>>>> if SSAIA isn’t supported, then it is incorrect to update the corresponding
>>>> bits of hstateen0.
>>>>
>>>> However, I’m not really sure I understand what the issue is with h{i,e}deleg.
>>>> All writable bits there don’t depend on hardware support. Am I missing something?
>>> My reading of the doc was that any of the bits can be r/o 0, with - yes -
>>> no dependencies on particular extensions.
>> Just to be sure that I get your idea correctly.
>>
>> Based on the priv. spec:
>> Each bit of hedeleg shall be either writable or read-only zero. Many bits of
>> hedeleg are required specifically to be writable or zero, as enumerated in
>> Table 29.
>>
>> Now let’s take hedeleg.bit1, which is marked as writable according to Table 29.
>> Your point is that even though hedeleg.bit1 is defined as writable, it could still
>> be read-only zero, right?
>>
>> In general, I agree with that. It is possible that M-mode software decides, for
>> some reason (for example, because the implementation does not support delegation
>> of bit1 to a lower mode), not to delegate medeleg.bit1 to HS-mode. In that case,
>> hedeleg.bit1 would always be read-only zero.
>>
>>> In which case you'd need to do
>>> the delegation in software. For which it might be helpful to know what
>>> the two registers are actually set to in hardware (i.e. the cached values
>>> wanting to match the real ones).
>> Does it make sense then to have the following
>> ...
>> v->arch.hedeleg = hedeleg;
>> vcpu->arch.hedeleg = csr_read(CSR_HEDELEG);
>> in arch_vcpu_create()?
> The above makes no sense to me, with or without s/vcpu/v/.
Right...
There should also be a csr_write() before the csr_read():
csr_write(CSR_HEDELEG, hedeleg);
v->arch.hedeleg = csr_read(CSR_HEDELEG);
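
To make the intent of the write-then-read-back concrete, here is a small
sketch against a simulated register. Everything in it is hypothetical (the
fake_csr type and helpers are not Xen or hardware interfaces); it only models
a CSR whose non-writable bits are read-only zero.

```c
#include <stdint.h>

/* Hypothetical stand-in for a CSR: only bits in 'writable' can be set. */
struct fake_csr {
    uint64_t value;
    uint64_t writable;
};

static void fake_csr_write(struct fake_csr *csr, uint64_t val)
{
    csr->value = val & csr->writable;  /* read-only-zero bits stay zero */
}

static uint64_t fake_csr_read(const struct fake_csr *csr)
{
    return csr->value;
}

/* Return the value to cache: what the register actually accepted. */
static uint64_t init_cached_copy(struct fake_csr *csr, uint64_t wanted)
{
    fake_csr_write(csr, wanted);
    return fake_csr_read(csr);
}
```

Note that on real hardware this sequence modifies the live CSR of whichever
hart runs it, so the previous value may need to be restored afterwards; the
sketch does not cover that.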
>
>> Or I can just add the comment that it will be sync-ed with the corresponding
>> hardware CSR later as ,actually, there is some h{i,e}deleg synchronization
>> happening during context_switch() (this code is at the moment in downstream),
>> because restore_csr_regs() is executed and re-reads CSR_H{I,E}DELEG:
>> static void restore_csr_regs(struct vcpu *vcpu)
>> {
>> csr_write(CSR_HEDELEG, vcpu->arch.hedeleg);
>> csr_write(CSR_HIDELEG, vcpu->arch.hideleg);
>> ...
>> As a result, vcpu->arch.h{I,E}deleg is kept in sync with the corresponding
>> hardware CSR.
> No, the r/o bits will continue to be out-of-sync between the hw register and
> the struct arch_vcpu field.
Yes, it would be out-of-sync until save_csr_regs() is called, where
csr_read(CSR_HEDELEG) is executed. So the value remains out-of-sync until a
trap to the hypervisor occurs and a vCPU context switch happens, which triggers
save_csr_regs().
So that’s not an option. The best choice is the one mentioned above.
~ Oleksii
* Re: [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init()
2026-01-12 16:39 ` Oleksii Kurochko
@ 2026-01-12 16:42 ` Jan Beulich
0 siblings, 0 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 16:42 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 12.01.2026 17:39, Oleksii Kurochko wrote:
>
> On 1/12/26 4:54 PM, Jan Beulich wrote:
>> On 12.01.2026 16:46, Oleksii Kurochko wrote:
>>> On 1/12/26 3:28 PM, Jan Beulich wrote:
>>>> On 12.01.2026 13:59, Oleksii Kurochko wrote:
>>>>> On 1/7/26 9:46 AM, Jan Beulich wrote:
>>>>>> Also, wouldn't you better keep internal state in line with what hardware
>>>>>> actually supports? CSRIND may be read-only-zero in the real register, in
>>>>>> which case having the bit set in the "cached" copy can be misleading.
>>>>> [...]
>>>>>
>>>>>> (This may similarly apply to at least hedeleg and hideleg, btw.)
>>>>> Regarding the previous bits, I can understand that it would be an issue:
>>>>> if SSAIA isn’t supported, then it is incorrect to update the corresponding
>>>>> bits of hstateen0.
>>>>>
>>>>> However, I’m not really sure I understand what the issue is with h{i,e}deleg.
>>>>> All writable bits there don’t depend on hardware support. Am I missing something?
>>>> My reading of the doc was that any of the bits can be r/o 0, with - yes -
>>>> no dependencies on particular extensions.
>>> Just to be sure that I get your idea correctly.
>>>
>>> Based on the priv. spec:
>>> Each bit of hedeleg shall be either writable or read-only zero. Many bits of
>>> hedeleg are required specifically to be writable or zero, as enumerated in
>>> Table 29.
>>>
>>> Now let’s take hedeleg.bit1, which is marked as writable according to Table 29.
>>> Your point is that even though hedeleg.bit1 is defined as writable, it could still
>>> be read-only zero, right?
>>>
>>> In general, I agree with that. It is possible that M-mode software decides, for
>>> some reason (for example, because the implementation does not support delegation
>>> of bit1 to a lower mode), not to delegate medeleg.bit1 to HS-mode. In that case,
>>> hedeleg.bit1 would always be read-only zero.
>>>
>>>> In which case you'd need to do
>>>> the delegation in software. For which it might be helpful to know what
>>>> the two registers are actually set to in hardware (i.e. the cached values
>>>> wanting to match the real ones).
>>> Does it make sense then to have the following
>>> ...
>>> v->arch.hedeleg = hedeleg;
>>> vcpu->arch.hedeleg = csr_read(CSR_HEDELEG);
>>> in arch_vcpu_create()?
>> The above makes no sense to me, with or without s/vcpu/v/.
>
> Right...
>
> It should be also csr_write() before csr_read():
> csr_write(CSR_HEDELEG, hedeleg);
> v->arch.hedeleg = csr_read(CSR_HEDELEG);
Ah yes. Alternatively you could obtain a mask of modifiable bits once, and
then simply apply that here in place of the CSR read/write.
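
A sketch of the mask alternative, again against a simulated register rather
than real hardware. The fake_csr type and all function names are assumptions
made for this example: probe the writable bits once by writing all ones and
reading back, restore the old value, then apply the mask to any value that is
to be cached.

```c
#include <stdint.h>

/* Hypothetical stand-in for a CSR: only bits in 'writable' can be set. */
struct fake_csr {
    uint64_t value;
    uint64_t writable;
};

static void fake_csr_write(struct fake_csr *csr, uint64_t val)
{
    csr->value = val & csr->writable;
}

static uint64_t fake_csr_read(const struct fake_csr *csr)
{
    return csr->value;
}

/* Probe once: write all ones, read back, restore the previous value. */
static uint64_t probe_writable_mask(struct fake_csr *csr)
{
    uint64_t old = fake_csr_read(csr);
    uint64_t mask;

    fake_csr_write(csr, ~UINT64_C(0));
    mask = fake_csr_read(csr);
    fake_csr_write(csr, old);

    return mask;
}

/* Later, per vCPU: no CSR access needed, just apply the stored mask. */
static uint64_t masked_initial_value(uint64_t wanted, uint64_t mask)
{
    return wanted & mask;
}
```

The probe would presumably run once at boot, so vCPU creation itself would not
have to touch the hardware register.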
Jan
* [PATCH v1 04/15] xen/riscv: introduce vtimer
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (2 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 03/15] xen/riscv: implement vcpu_csr_init() Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-07 15:21 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask() Oleksii Kurochko
` (10 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce a virtual timer structure along with functions to initialize
and destroy the virtual timer.
Add a vtimer_expired() function and implement it as a stub, as the timer
and tasklet subsystems are not functional at this stage.
Call vcpu_vtimer_init() in arch_vcpu_create().
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/domain.c | 8 ++++--
xen/arch/riscv/include/asm/domain.h | 4 +++
xen/arch/riscv/include/asm/vtimer.h | 25 ++++++++++++++++++
xen/arch/riscv/vtimer.c | 39 +++++++++++++++++++++++++++++
5 files changed, 75 insertions(+), 2 deletions(-)
create mode 100644 xen/arch/riscv/include/asm/vtimer.h
create mode 100644 xen/arch/riscv/vtimer.c
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 8863d4b15605..5bd180130165 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -22,6 +22,7 @@ obj-y += traps.o
obj-y += vmid.o
obj-y += vm_event.o
obj-y += vsbi/
+obj-y += vtimer.o
$(TARGET): $(TARGET)-syms
$(OBJCOPY) -O binary -S $< $@
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index 44387d056546..dd3c237d163d 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -6,6 +6,7 @@
#include <asm/cpufeature.h>
#include <asm/csr.h>
#include <asm/riscv_encoding.h>
+#include <asm/vtimer.h>
static void vcpu_csr_init(struct vcpu *v)
{
@@ -97,11 +98,14 @@ int arch_vcpu_create(struct vcpu *v)
if ( is_idle_vcpu(v) )
return rc;
+ if ( (rc = vcpu_vtimer_init(v)) )
+ goto fail;
+
/*
- * As the vtimer and interrupt controller (IC) are not yet implemented,
+ * As interrupt controller (IC) is not yet implemented,
* return an error.
*
- * TODO: Drop this once the vtimer and IC are implemented.
+ * TODO: Drop this once IC is implemented.
*/
rc = -EOPNOTSUPP;
goto fail;
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index a0ffbbc09c6f..be7ddaff30e7 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -8,6 +8,7 @@
#include <public/hvm/params.h>
#include <asm/p2m.h>
+#include <asm/vtimer.h>
struct vcpu_vmid {
uint64_t generation;
@@ -52,6 +53,9 @@ struct arch_vcpu
struct cpu_info *cpu_info;
void *stack;
+ struct vtimer vtimer;
+ bool vtimer_initialized;
+
/* CSRs */
register_t hstatus;
register_t hedeleg;
diff --git a/xen/arch/riscv/include/asm/vtimer.h b/xen/arch/riscv/include/asm/vtimer.h
new file mode 100644
index 000000000000..a2ca704cf0cc
--- /dev/null
+++ b/xen/arch/riscv/include/asm/vtimer.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * (c) 2023-2024 Vates
+ */
+
+#ifndef ASM__RISCV__VTIMER_H
+#define ASM__RISCV__VTIMER_H
+
+#include <xen/timer.h>
+
+struct domain;
+struct vcpu;
+struct xen_arch_domainconfig;
+
+struct vtimer {
+ struct vcpu *v;
+ struct timer timer;
+};
+
+int vcpu_vtimer_init(struct vcpu *v);
+void vcpu_timer_destroy(struct vcpu *v);
+
+int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config);
+
+#endif /* ASM__RISCV__VTIMER_H */
diff --git a/xen/arch/riscv/vtimer.c b/xen/arch/riscv/vtimer.c
new file mode 100644
index 000000000000..5ba533690bc2
--- /dev/null
+++ b/xen/arch/riscv/vtimer.c
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/sched.h>
+
+#include <public/xen.h>
+
+#include <asm/vtimer.h>
+
+int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
+{
+ /* Nothing to do at the moment */
+
+ return 0;
+}
+
+static void vtimer_expired(void *data)
+{
+ panic("%s: TBD\n", __func__);
+}
+
+int vcpu_vtimer_init(struct vcpu *v)
+{
+ struct vtimer *t = &v->arch.vtimer;
+
+ t->v = v;
+ init_timer(&t->timer, vtimer_expired, t, v->processor);
+
+ v->arch.vtimer_initialized = true;
+
+ return 0;
+}
+
+void vcpu_timer_destroy(struct vcpu *v)
+{
+ if ( !v->arch.vtimer_initialized )
+ return;
+
+ kill_timer(&v->arch.vtimer.timer);
+}
--
2.52.0
* Re: [PATCH v1 04/15] xen/riscv: introduce vtimer
2025-12-24 17:03 ` [PATCH v1 04/15] xen/riscv: introduce vtimer Oleksii Kurochko
@ 2026-01-07 15:21 ` Jan Beulich
2026-01-12 16:28 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-07 15:21 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Introduce a virtual timer structure along with functions to initialize
> and destroy the virtual timer.
>
> Add a vtimer_expired() function and implement it as a stub, as the timer
> and tasklet subsystems are not functional at this stage.
Shouldn't those pieces of infrastructure be made work then first? I also
don't quite understand why the subsystems not being functional prevents
the function to be implemented as far as possible. Most if not all
functions you need from both subsystems should be available, for living
in common code.
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -8,6 +8,7 @@
> #include <public/hvm/params.h>
>
> #include <asm/p2m.h>
> +#include <asm/vtimer.h>
>
> struct vcpu_vmid {
> uint64_t generation;
> @@ -52,6 +53,9 @@ struct arch_vcpu
> struct cpu_info *cpu_info;
> void *stack;
>
> + struct vtimer vtimer;
> + bool vtimer_initialized;
Assuming the field is really needed (see remark further down), why is this
not part of the struct?
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/vtimer.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * (c) 2023-2024 Vates
> + */
> +
> +#ifndef ASM__RISCV__VTIMER_H
> +#define ASM__RISCV__VTIMER_H
> +
> +#include <xen/timer.h>
> +
> +struct domain;
> +struct vcpu;
I don't think this one is needed, as long as you have ...
> +struct xen_arch_domainconfig;
> +
> +struct vtimer {
> + struct vcpu *v;
... this. Question is why this is here: You should be able to get hold of the
struct vcpu containing a struct vtimer using container_of().
> --- /dev/null
> +++ b/xen/arch/riscv/vtimer.c
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <xen/sched.h>
> +
> +#include <public/xen.h>
> +
> +#include <asm/vtimer.h>
> +
> +int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
> +{
> + /* Nothing to do at the moment */
> +
> + return 0;
> +}
The function has no caller and does nothing - why do we need it?
> +static void vtimer_expired(void *data)
> +{
> + panic("%s: TBD\n", __func__);
> +}
> +
> +int vcpu_vtimer_init(struct vcpu *v)
> +{
> + struct vtimer *t = &v->arch.vtimer;
> +
> + t->v = v;
> + init_timer(&t->timer, vtimer_expired, t, v->processor);
> +
> + v->arch.vtimer_initialized = true;
init_timer() has specific effects (like setting t->function to non-NULL
and t->status to other than TIMER_STATUS_invalid). Can't you leverage
that instead of having a separate boolean? (Iirc we do so elsewhere.)
Jan
* Re: [PATCH v1 04/15] xen/riscv: introduce vtimer
2026-01-07 15:21 ` Jan Beulich
@ 2026-01-12 16:28 ` Oleksii Kurochko
0 siblings, 0 replies; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-12 16:28 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/7/26 4:21 PM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> Introduce a virtual timer structure along with functions to initialize
>> and destroy the virtual timer.
>>
>> Add a vtimer_expired() function and implement it as a stub, as the timer
>> and tasklet subsystems are not functional at this stage.
> Shouldn't those pieces of infrastructure be made work then first?
It could be an option; it’s just not really critical until a guest is running.
I actually considered adding this in the current patch series, but decided to
introduce it later to avoid making the series too large. (On the other hand, it
would be only one additional patch, IIRC)
> I also
> don't quite understand why the subsystems not being functional prevents
> the function to be implemented as far as possible. Most if not all
> functions you need from both subsystems should be available, for living
> in common code.
I chose the wrong words here; the subsystems not being fully functional is not
the main reason why I’m using a stub here instead of something functional.
Basically, implementing this requires vcpu_kick() and vcpu_set_interrupt(),
which are introduced later in this patch series.
As an alternative, I could drop vtimer_expired() and the related code from this
patch and reintroduce them after vcpu_kick() and vcpu_set_interrupt() are
available.
>
>> --- a/xen/arch/riscv/include/asm/domain.h
>> +++ b/xen/arch/riscv/include/asm/domain.h
>> @@ -8,6 +8,7 @@
>> #include <public/hvm/params.h>
>>
>> #include <asm/p2m.h>
>> +#include <asm/vtimer.h>
>>
>> struct vcpu_vmid {
>> uint64_t generation;
>> @@ -52,6 +53,9 @@ struct arch_vcpu
>> struct cpu_info *cpu_info;
>> void *stack;
>>
>> + struct vtimer vtimer;
>> + bool vtimer_initialized;
> Assuming the field is really needed (see remark further down), why is this
> not part of the struct?
Agreed, it would be better to have it as part of struct vtimer if it is going
to be used in the future.
>
>> --- /dev/null
>> +++ b/xen/arch/riscv/include/asm/vtimer.h
>> @@ -0,0 +1,25 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * (c) 2023-2024 Vates
>> + */
>> +
>> +#ifndef ASM__RISCV__VTIMER_H
>> +#define ASM__RISCV__VTIMER_H
>> +
>> +#include <xen/timer.h>
>> +
>> +struct domain;
>> +struct vcpu;
> I don't think this one is needed, as long as you have ...
>
>> +struct xen_arch_domainconfig;
>> +
>> +struct vtimer {
>> + struct vcpu *v;
> ... this. Question is why this is here: You should be able to get hold of the
> struct vcpu containing a struct vtimer using container_of().
Good point, I haven't thought about that. It could really be done using container_of().
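
For reference, a minimal self-contained sketch of the container_of() approach.
The structures below are reduced, hypothetical stand-ins for the Xen ones, and
the macro is a simplified local definition rather than Xen's own; the nested
member designator in offsetof() is accepted by GCC and Clang.

```c
#include <stddef.h>

/* Simplified local definition; Xen provides its own container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced, hypothetical stand-ins for the real structures. */
struct vtimer {
    int placeholder_timer;
};

struct arch_vcpu {
    long other_state;
    struct vtimer vtimer;
};

struct vcpu {
    int vcpu_id;
    struct arch_vcpu arch;
};

/* Recover the owning vCPU from the embedded vtimer, no back-pointer needed. */
static struct vcpu *vtimer_to_vcpu(struct vtimer *t)
{
    return container_of(t, struct vcpu, arch.vtimer);
}
```

With this, the timer callback could be handed the struct vtimer pointer and
still reach the vCPU, so the "v" field becomes redundant.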
>
>> --- /dev/null
>> +++ b/xen/arch/riscv/vtimer.c
>> @@ -0,0 +1,39 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +
>> +#include <xen/sched.h>
>> +
>> +#include <public/xen.h>
>> +
>> +#include <asm/vtimer.h>
>> +
>> +int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
>> +{
>> + /* Nothing to do at the moment */
>> +
>> + return 0;
>> +}
> The function has no caller and does nothing - why do we need it?
It will be called later in arch_domain_create().
It will be needed once the SSTC extension is supported, but it could be dropped for now.
>
>> +static void vtimer_expired(void *data)
>> +{
>> + panic("%s: TBD\n", __func__);
>> +}
>> +
>> +int vcpu_vtimer_init(struct vcpu *v)
>> +{
>> + struct vtimer *t = &v->arch.vtimer;
>> +
>> + t->v = v;
>> + init_timer(&t->timer, vtimer_expired, t, v->processor);
>> +
>> + v->arch.vtimer_initialized = true;
> init_timer() has specific effects (like setting t->function to non-NULL
> and t->status to other than TIMER_STATUS_invalid). Can't you leverage
> that instead of having a separate boolean? (Iirc we do so elsewhere.)
Nice, it could be used instead of having vtimer_initialized in struct vtimer.
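
A sketch of the idea with a mock timer type. The mock_* names are invented for
this example and are not the Xen timer API; it only assumes, as described
above, that initialization leaves the function pointer non-NULL and that the
containing structure starts out zeroed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a timer structure. */
struct mock_timer {
    void (*function)(void *data);
    void *data;
};

/* Mock initializer: leaves 'function' non-NULL, like the real one would. */
static void mock_init_timer(struct mock_timer *t,
                            void (*fn)(void *data), void *data)
{
    t->function = fn;
    t->data = data;
}

/* Replaces the separate boolean: the timer itself records initialization. */
static bool mock_timer_initialized(const struct mock_timer *t)
{
    return t->function != NULL;
}

/* Placeholder expiry handler for the example. */
static void mock_expired(void *data)
{
    (void)data;
}
```

The destroy path could then check the timer state and return early instead of
consulting vtimer_initialized.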
Thanks.
~ Oleksii
* [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask()
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (3 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 04/15] xen/riscv: introduce vtimer Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-07 15:47 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 06/15] xen/riscv: introduce vcpu_kick() implementation Oleksii Kurochko
` (9 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Since SMP is not yet supported, it is acceptable to implement
smp_send_event_check_mask() as a stub.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/smp.c | 8 ++++++++
xen/arch/riscv/stubs.c | 5 -----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/xen/arch/riscv/smp.c b/xen/arch/riscv/smp.c
index 4ca6a4e89200..e727fdb09612 100644
--- a/xen/arch/riscv/smp.c
+++ b/xen/arch/riscv/smp.c
@@ -1,3 +1,4 @@
+#include <xen/cpumask.h>
#include <xen/smp.h>
/*
@@ -13,3 +14,10 @@
struct pcpu_info pcpu_info[NR_CPUS] = { [0 ... NR_CPUS - 1] = {
.processor_id = NR_CPUS,
}};
+
+void smp_send_event_check_mask(const cpumask_t *mask)
+{
+#if CONFIG_NR_CPUS > 1
+# error "smp_send_event_check_mask() unimplemented"
+#endif
+}
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index eab826e8c3ae..6ebb5139de69 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -65,11 +65,6 @@ int arch_monitor_domctl_event(struct domain *d,
/* smp.c */
-void smp_send_event_check_mask(const cpumask_t *mask)
-{
- BUG_ON("unimplemented");
-}
-
void smp_send_call_function_mask(const cpumask_t *mask)
{
BUG_ON("unimplemented");
--
2.52.0
^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask()
2025-12-24 17:03 ` [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask() Oleksii Kurochko
@ 2026-01-07 15:47 ` Jan Beulich
2026-01-12 16:53 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-07 15:47 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/smp.c
> +++ b/xen/arch/riscv/smp.c
> @@ -1,3 +1,4 @@
> +#include <xen/cpumask.h>
> #include <xen/smp.h>
>
> /*
> @@ -13,3 +14,10 @@
> struct pcpu_info pcpu_info[NR_CPUS] = { [0 ... NR_CPUS - 1] = {
> .processor_id = NR_CPUS,
> }};
> +
> +void smp_send_event_check_mask(const cpumask_t *mask)
> +{
> +#if CONFIG_NR_CPUS > 1
> +# error "smp_send_event_check_mask() unimplemented"
> +#endif
> +}
CONFIG_NR_CPUS is 64 by default for 64-bit arch-es, from all I can tell, also
for RISC-V. And there's no "override" in riscv64_defconfig. How is the above
going to work in CI? Then again I must be overlooking something, as the config
used in CI has CONFIG_NR_CPUS=1. Just that I can't tell why that is.
And no, I'm not meaning to ask that you override NR_CPUS (and wherever such an
override would live, I think it would better be dropped rather sooner than
later). Instead an option may be this:
void smp_send_event_check_mask(const cpumask_t *mask)
{
#if CONFIG_NR_CPUS > 1
BUG_ON(!cpumask_subset(mask, cpumask_of(0)));
#endif
}
(I can't tell off the top of my head whether an empty mask may be passed to this
function. If not, cpumask_equal() could be used as well.)
Of course the #if may then not be necessary at all, and a TODO comment may want
putting there instead.
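To make the difference between the subset and the equality check concrete, here is a minimal, self-contained sketch. It does not use Xen's cpumask API: the toy_* names and the single 64-bit word standing in for cpumask_t are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* Mask with only the given CPU set (models cpumask_of()). */
static toy_cpumask_t toy_cpumask_of(unsigned int cpu)
{
    assert(cpu < 64);
    return (toy_cpumask_t)1 << cpu;
}

/* True if every CPU in a is also in b (models cpumask_subset()). */
static bool toy_cpumask_subset(toy_cpumask_t a, toy_cpumask_t b)
{
    return (a & ~b) == 0;
}

/* True if both masks hold exactly the same CPUs (models cpumask_equal()). */
static bool toy_cpumask_equal(toy_cpumask_t a, toy_cpumask_t b)
{
    return a == b;
}

/* The condition that the suggested BUG_ON() would require to hold. */
static bool toy_event_check_mask_ok(toy_cpumask_t mask)
{
    return toy_cpumask_subset(mask, toy_cpumask_of(0));
}
```

With the subset form an empty mask is accepted, whereas the equality form would reject it; a mask naming any CPU other than 0 is rejected by both.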
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask()
2026-01-07 15:47 ` Jan Beulich
@ 2026-01-12 16:53 ` Oleksii Kurochko
2026-01-12 17:05 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-12 16:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/7/26 4:47 PM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/smp.c
>> +++ b/xen/arch/riscv/smp.c
>> @@ -1,3 +1,4 @@
>> +#include <xen/cpumask.h>
>> #include <xen/smp.h>
>>
>> /*
>> @@ -13,3 +14,10 @@
>> struct pcpu_info pcpu_info[NR_CPUS] = { [0 ... NR_CPUS - 1] = {
>> .processor_id = NR_CPUS,
>> }};
>> +
>> +void smp_send_event_check_mask(const cpumask_t *mask)
>> +{
>> +#if CONFIG_NR_CPUS > 1
>> +# error "smp_send_event_check_mask() unimplemented"
>> +#endif
>> +}
> CONFIG_NR_CPUS is 64 by default for 64-bit arch-es, from all I can tell, also
> for RISC-V. And there's no "override" in riscv64_defconfig. How is the above
> going to work in CI? Then again I must be overlooking something, as the config
> used in CI has CONFIG_NR_CPUS=1. Just that I can't tell why that is.
It is 1 because of the definition of NR_CPUS in Kconfig:
config NR_CPUS
int "Maximum number of CPUs"
range 1 1 if ARM && MPU
range 1 16383
.... (all other range props are conditional and none of them depends on RISC-V)
so for RISC-V "range 1 16383" is used and CONFIG_NR_CPUS is set to the minimum of this range,
so it is 1.
>
> And no, I'm not meaning to ask that you override NR_CPUS (and wherever such an
> override would live, I think it would better be dropped rather sooner than
> later). Instead an option may be this:
>
> void smp_send_event_check_mask(const cpumask_t *mask)
> {
> #if CONFIG_NR_CPUS > 1
> BUG_ON(!cpumask_subset(mask, cpumask_of(0)));
> #endif
> }
>
> (I can't tell off the top of my head whether an empty mask may be passed to this
> function. If not, cpumask_equal() could be used as well.)
I will double-check. Thanks for the hint.
>
> Of course the #if may then not be necessary at all, and a TODO comment may want
> putting there instead.
With the approach suggested above, I don't think the #if is really needed.
Thanks!
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask()
2026-01-12 16:53 ` Oleksii Kurochko
@ 2026-01-12 17:05 ` Jan Beulich
2026-01-13 9:58 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 17:05 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 12.01.2026 17:53, Oleksii Kurochko wrote:
> On 1/7/26 4:47 PM, Jan Beulich wrote:
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> @@ -13,3 +14,10 @@
>>> struct pcpu_info pcpu_info[NR_CPUS] = { [0 ... NR_CPUS - 1] = {
>>> .processor_id = NR_CPUS,
>>> }};
>>> +
>>> +void smp_send_event_check_mask(const cpumask_t *mask)
>>> +{
>>> +#if CONFIG_NR_CPUS > 1
>>> +# error "smp_send_event_check_mask() unimplemented"
>>> +#endif
>>> +}
>> CONFIG_NR_CPUS is 64 by default for 64-bit arch-es, from all I can tell, also
>> for RISC-V. And there's no "override" in riscv64_defconfig. How is the above
>> going to work in CI? Then again I must be overlooking something, as the config
>> used in CI has CONFIG_NR_CPUS=1. Just that I can't tell why that is.
>
> It is 1 because of the definition of NR_CPUS in Kconfig:
> config NR_CPUS
> int "Maximum number of CPUs"
> range 1 1 if ARM && MPU
> range 1 16383
> .... (all other range props are conditional and none of them depends on RISC-V)
> so for RISC-V "range 1 16383" is used and CONFIG_NR_CPUS is set to the minimum of this range,
> so it is 1.
I fear I don't follow: Why would the lowest value be picked, rather than the
specified default (which would be 64 for RV64)? That's what I thought the
default values are there for (among other purposes).
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask()
2026-01-12 17:05 ` Jan Beulich
@ 2026-01-13 9:58 ` Oleksii Kurochko
2026-01-13 10:22 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 9:58 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/12/26 6:05 PM, Jan Beulich wrote:
> On 12.01.2026 17:53, Oleksii Kurochko wrote:
>> On 1/7/26 4:47 PM, Jan Beulich wrote:
>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>> @@ -13,3 +14,10 @@
>>>> struct pcpu_info pcpu_info[NR_CPUS] = { [0 ... NR_CPUS - 1] = {
>>>> .processor_id = NR_CPUS,
>>>> }};
>>>> +
>>>> +void smp_send_event_check_mask(const cpumask_t *mask)
>>>> +{
>>>> +#if CONFIG_NR_CPUS > 1
>>>> +# error "smp_send_event_check_mask() unimplemented"
>>>> +#endif
>>>> +}
>>> CONFIG_NR_CPUS is 64 by default for 64-bit arch-es, from all I can tell, also
>>> for RISC-V. And there's no "override" in riscv64_defconfig. How is the above
>>> going to work in CI? Then again I must be overlooking something, as the config
>>> used in CI has CONFIG_NR_CPUS=1. Just that I can't tell why that is.
>> It is 1 because of the definition of NR_CPUS in Kconfig:
>> config NR_CPUS
>> int "Maximum number of CPUs"
>> range 1 1 if ARM && MPU
>> range 1 16383
>> .... (all other range props are conditional and none of them depends on RISC-V)
>> so for RISC-V "range 1 16383" is used and CONFIG_NR_CPUS is set to the minimum of this range,
>> so it is 1.
> I fear I don't follow: Why would the lowest value be picked, rather than the
> specified default (which would be 64 for RV64)? That's what I thought the
> default values are there for (among other purposes).
But there is no default for RISC-V for config NR_CPUS:
config NR_CPUS
int "Maximum number of CPUs"
range 1 1 if ARM && MPU
range 1 16383
default "256" if X86
default "1" if ARM && MPU
default "8" if ARM && RCAR3
default "4" if ARM && QEMU
default "4" if ARM && MPSOC
default "128" if ARM
help
...
So a value from the range [1, 16383] is chosen, based on the code of sym_validate_range():
...
val = strtoll(sym->curr.val, NULL, base);
val2 = sym_get_range_val(prop->expr->left.sym, base);
if (val >= val2) {
val2 = sym_get_range_val(prop->expr->right.sym, base);
if (val <= val2)
return;
}
if (sym->type == S_INT)
sprintf(str, "%lld", val2);
else
sprintf(str, "0x%llx", val2);
sym->curr.val = xstrdup(str);
val2 is first initialized to the left bound of the range [1, 16383], so it is 1,
and val is 0 (I assume it is 0 by initialization); therefore val2 = 1 is
used as the value for NR_CPUS.
I also experimented by changing the range to "range 2 16383", and CONFIG_NR_CPUS
became 2.
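The clamping step can be modelled in isolation as follows. This is a simplified sketch rather than the real kconfig code: clamp_to_range() is a hypothetical helper, and the starting value of 0 for a symbol without a matching default is the same assumption as made above.

```c
#include <assert.h>

/*
 * Simplified model of the range check in sym_validate_range():
 * a value inside [lo, hi] is kept as is; otherwise the value is
 * replaced by the bound that was fetched last (lo when the value is
 * too small, hi when it is too large).
 */
static long long clamp_to_range(long long val, long long lo, long long hi)
{
    long long bound = lo;

    assert(lo <= hi);

    if ( val >= lo )
    {
        bound = hi;
        if ( val <= hi )
            return val;
    }

    return bound;
}
```

Under that assumption, a starting value of 0 with "range 1 16383" yields 1, and with "range 2 16383" it yields 2, which matches the experiment described above.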
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask()
2026-01-13 9:58 ` Oleksii Kurochko
@ 2026-01-13 10:22 ` Jan Beulich
2026-01-13 11:39 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-13 10:22 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.01.2026 10:58, Oleksii Kurochko wrote:
>
> On 1/12/26 6:05 PM, Jan Beulich wrote:
>> On 12.01.2026 17:53, Oleksii Kurochko wrote:
>>> On 1/7/26 4:47 PM, Jan Beulich wrote:
>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>> @@ -13,3 +14,10 @@
>>>>> struct pcpu_info pcpu_info[NR_CPUS] = { [0 ... NR_CPUS - 1] = {
>>>>> .processor_id = NR_CPUS,
>>>>> }};
>>>>> +
>>>>> +void smp_send_event_check_mask(const cpumask_t *mask)
>>>>> +{
>>>>> +#if CONFIG_NR_CPUS > 1
>>>>> +# error "smp_send_event_check_mask() unimplemented"
>>>>> +#endif
>>>>> +}
>>>> CONFIG_NR_CPUS is 64 by default for 64-bit arch-es, from all I can tell, also
>>>> for RISC-V. And there's no "override" in riscv64_defconfig. How is the above
>>>> going to work in CI? Then again I must be overlooking something, as the config
>>>> used in CI has CONFIG_NR_CPUS=1. Just that I can't tell why that is.
>>> It is 1 because of the definition of NR_CPUS in Kconfig:
>>> config NR_CPUS
>>> int "Maximum number of CPUs"
>>> range 1 1 if ARM && MPU
>>> range 1 16383
>>> .... (all other range props are conditional and none of them depends on RISC-V)
>>> so for RISC-V "range 1 16383" is used and CONFIG_NR_CPUS is set to the minimum of this range,
>>> so it is 1.
>> I fear I don't follow: Why would the lowest value be picked, rather than the
>> specified default (which would be 64 for RV64)? That's what I thought the
>> default values are there for (among other purposes).
>
> But there is no default for RISC-V for config NR_CPUS:
>
> config NR_CPUS
> int "Maximum number of CPUs"
> range 1 1 if ARM && MPU
> range 1 16383
> default "256" if X86
> default "1" if ARM && MPU
> default "8" if ARM && RCAR3
> default "4" if ARM && QEMU
> default "4" if ARM && MPSOC
> default "128" if ARM
> help
> ...
Oh, indeed, that's what I was overlooking.
> So a value from the range [1, 16383] is chosen, based on the code of sym_validate_range():
> ...
> val = strtoll(sym->curr.val, NULL, base);
> val2 = sym_get_range_val(prop->expr->left.sym, base);
> if (val >= val2) {
> val2 = sym_get_range_val(prop->expr->right.sym, base);
> if (val <= val2)
> return;
> }
> if (sym->type == S_INT)
> sprintf(str, "%lld", val2);
> else
> sprintf(str, "0x%llx", val2);
> sym->curr.val = xstrdup(str);
>
> val2 is first initialized to the left bound of the range [1, 16383], so it is 1,
> and val is 0 (I assume it is 0 by initialization); therefore val2 = 1 is
> used as the value for NR_CPUS.
But is this behavior documented anywhere? Wouldn't RISC-V (and PPC) better
gain suitable defaults, making explicit what is wanted (for the time being)?
E.g.
config NR_CPUS
int "Maximum number of CPUs"
range 1 1 if ARM && MPU
range 1 16383
default "256" if X86
default "1" if !ARM || MPU
default "8" if RCAR3
default "4" if QEMU
default "4" if MPSOC
default "128"
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask()
2026-01-13 10:22 ` Jan Beulich
@ 2026-01-13 11:39 ` Oleksii Kurochko
0 siblings, 0 replies; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 11:39 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/13/26 11:22 AM, Jan Beulich wrote:
> On 13.01.2026 10:58, Oleksii Kurochko wrote:
>> On 1/12/26 6:05 PM, Jan Beulich wrote:
>>> On 12.01.2026 17:53, Oleksii Kurochko wrote:
>>>> On 1/7/26 4:47 PM, Jan Beulich wrote:
>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>> @@ -13,3 +14,10 @@
>>>>>> struct pcpu_info pcpu_info[NR_CPUS] = { [0 ... NR_CPUS - 1] = {
>>>>>> .processor_id = NR_CPUS,
>>>>>> }};
>>>>>> +
>>>>>> +void smp_send_event_check_mask(const cpumask_t *mask)
>>>>>> +{
>>>>>> +#if CONFIG_NR_CPUS > 1
>>>>>> +# error "smp_send_event_check_mask() unimplemented"
>>>>>> +#endif
>>>>>> +}
>>>>> CONFIG_NR_CPUS is 64 by default for 64-bit arch-es, from all I can tell, also
>>>>> for RISC-V. And there's no "override" in riscv64_defconfig. How is the above
>>>>> going to work in CI? Then again I must be overlooking something, as the config
>>>>> used in CI has CONFIG_NR_CPUS=1. Just that I can't tell why that is.
>>>> It is 1 because of the definition of NR_CPUS in Kconfig:
>>>> config NR_CPUS
>>>> int "Maximum number of CPUs"
>>>> range 1 1 if ARM && MPU
>>>> range 1 16383
>>>> .... (all other range props are conditional and none of them depends on RISC-V)
>>>> so for RISC-V "range 1 16383" is used and CONFIG_NR_CPUS is set to the minimum of this range,
>>>> so it is 1.
>>> I fear I don't follow: Why would the lowest value be picked, rather than the
>>> specified default (which would be 64 for RV64)? That's what I thought the
>>> default values are there for (among other purposes).
>> But there is no default for RISC-V for config NR_CPUS:
>>
>> config NR_CPUS
>> int "Maximum number of CPUs"
>> range 1 1 if ARM && MPU
>> range 1 16383
>> default "256" if X86
>> default "1" if ARM && MPU
>> default "8" if ARM && RCAR3
>> default "4" if ARM && QEMU
>> default "4" if ARM && MPSOC
>> default "128" if ARM
>> help
>> ...
> Oh, indeed, that's what I was overlooking.
>
>> So a value from the range [1, 16383] is chosen, based on the code of sym_validate_range():
>> ...
>> val = strtoll(sym->curr.val, NULL, base);
>> val2 = sym_get_range_val(prop->expr->left.sym, base);
>> if (val >= val2) {
>> val2 = sym_get_range_val(prop->expr->right.sym, base);
>> if (val <= val2)
>> return;
>> }
>> if (sym->type == S_INT)
>> sprintf(str, "%lld", val2);
>> else
>> sprintf(str, "0x%llx", val2);
>> sym->curr.val = xstrdup(str);
>>
>> val2 is first initialized to the left bound of the range [1, 16383], so it is 1,
>> and val is 0 (I assume it is 0 by initialization); therefore val2 = 1 is
>> used as the value for NR_CPUS.
> But is this behavior documented anywhere?
I wasn't able to find any documentation, which is why I checked the code.
> Wouldn't RISC-V (and PPC) better
> gain suitable defaults, making explicit what is wanted (for the time being)?
> E.g.
>
> config NR_CPUS
> int "Maximum number of CPUs"
> range 1 1 if ARM && MPU
> range 1 16383
> default "256" if X86
> default "1" if !ARM || MPU
> default "8" if RCAR3
> default "4" if QEMU
> default "4" if MPSOC
> default "128"
Maybe that would be better.
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 06/15] xen/riscv: introduce vcpu_kick() implementation
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (4 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 05/15] xen/riscv: implement stub for smp_send_event_check_mask() Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-07 16:04 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1 Oleksii Kurochko
` (8 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Add a RISC-V implementation of vcpu_kick(), which unblocks the target
vCPU and sends an event check IPI if the vCPU was running on another
processor. This mirrors the behavior of Arm and enables proper vCPU
wakeup handling on RISC-V.
Remove the stub implementation from stubs.c, as it is now provided by
arch/riscv/domain.c.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/domain.c | 14 ++++++++++++++
xen/arch/riscv/stubs.c | 5 -----
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index dd3c237d163d..164ab14a5209 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -1,7 +1,9 @@
/* SPDX-License-Identifier: GPL-2.0-only */
+#include <xen/cpumask.h>
#include <xen/mm.h>
#include <xen/sched.h>
+#include <xen/smp.h>
#include <asm/cpufeature.h>
#include <asm/csr.h>
@@ -121,3 +123,15 @@ void arch_vcpu_destroy(struct vcpu *v)
{
free_xenheap_pages(v->arch.stack, STACK_ORDER);
}
+
+void vcpu_kick(struct vcpu *v)
+{
+ bool running = v->is_running;
+
+ vcpu_unblock(v);
+ if ( running && v != current )
+ {
+ perfc_incr(vcpu_kick);
+ smp_send_event_check_mask(cpumask_of(v->processor));
+ }
+}
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index 6ebb5139de69..68ee859ca1a8 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -213,11 +213,6 @@ void vcpu_block_unless_event_pending(struct vcpu *v)
BUG_ON("unimplemented");
}
-void vcpu_kick(struct vcpu *v)
-{
- BUG_ON("unimplemented");
-}
-
struct vcpu *alloc_vcpu_struct(const struct domain *d)
{
BUG_ON("unimplemented");
--
2.52.0
^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH v1 06/15] xen/riscv: introduce vcpu_kick() implementation
2025-12-24 17:03 ` [PATCH v1 06/15] xen/riscv: introduce vcpu_kick() implementation Oleksii Kurochko
@ 2026-01-07 16:04 ` Jan Beulich
0 siblings, 0 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-07 16:04 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Add a RISC-V implementation of vcpu_kick(), which unblocks the target
> vCPU and sends an event check IPI if the vCPU was running on another
> processor. This mirrors the behavior of Arm and enables proper vCPU
> wakeup handling on RISC-V.
>
> Remove the stub implementation from stubs.c, as it is now provided by
> arch/riscv/domain.c.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (5 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 06/15] xen/riscv: introduce vcpu_kick() implementation Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-07 16:28 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired() Oleksii Kurochko
` (7 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
This patch is based on Linux kernel 6.16.0.
Introduce a lockless mechanism for tracking pending vCPU interrupts using
atomic bit operations. The design follows a multi-producer, single-consumer
model where the consumer is the vCPU itself.
Two bitmaps are added:
- irqs_pending — represents interrupts currently pending
- irqs_pending_mask — represents bits that have changed in irqs_pending
Introduce vcpu_(un)set_interrupt() to mark an interrupt in the irqs_pending{_mask}
bitmap(s), notifying the vCPU that an interrupt has been raised or cleared.
Other parts (such as vcpu_has_interrupts(), vcpu_flush_interrupts() and
vcpu_sync_interrupts()) of the lockless mechanism for tracking pending vCPU
interrupts are going to be introduced in a separate patch.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/domain.c | 47 +++++++++++++++++++++
xen/arch/riscv/include/asm/domain.h | 19 +++++++++
xen/arch/riscv/include/asm/riscv_encoding.h | 1 +
3 files changed, 67 insertions(+)
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index 164ab14a5209..8a010ae5b47e 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -5,9 +5,11 @@
#include <xen/sched.h>
#include <xen/smp.h>
+#include <asm/bitops.h>
#include <asm/cpufeature.h>
#include <asm/csr.h>
#include <asm/riscv_encoding.h>
+#include <asm/system.h>
#include <asm/vtimer.h>
static void vcpu_csr_init(struct vcpu *v)
@@ -100,6 +102,9 @@ int arch_vcpu_create(struct vcpu *v)
if ( is_idle_vcpu(v) )
return rc;
+ bitmap_zero(v->arch.irqs_pending, RISCV_VCPU_NR_IRQS);
+ bitmap_zero(v->arch.irqs_pending_mask, RISCV_VCPU_NR_IRQS);
+
if ( (rc = vcpu_vtimer_init(v)) )
goto fail;
@@ -135,3 +140,45 @@ void vcpu_kick(struct vcpu *v)
smp_send_event_check_mask(cpumask_of(v->processor));
}
}
+
+int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq)
+{
+ /*
+ * We only allow VS-mode software, timer, and external
+ * interrupts when irq is one of the local interrupts
+ * defined by RISC-V privilege specification.
+ */
+ if ( irq < IRQ_LOCAL_MAX &&
+ irq != IRQ_VS_SOFT &&
+ irq != IRQ_VS_TIMER &&
+ irq != IRQ_VS_EXT )
+ return -EINVAL;
+
+ set_bit(irq, v->arch.irqs_pending);
+ smp_mb__before_atomic();
+ set_bit(irq, v->arch.irqs_pending_mask);
+
+ vcpu_kick(v);
+
+ return 0;
+}
+
+int vcpu_unset_interrupt(struct vcpu *v, const unsigned int irq)
+{
+ /*
+ * We only allow VS-mode software, timer, external
+ * interrupts when irq is one of the local interrupts
+ * defined by RISC-V privilege specification.
+ */
+ if ( irq < IRQ_LOCAL_MAX &&
+ irq != IRQ_VS_SOFT &&
+ irq != IRQ_VS_TIMER &&
+ irq != IRQ_VS_EXT )
+ return -EINVAL;
+
+ clear_bit(irq, v->arch.irqs_pending);
+ smp_mb__before_atomic();
+ set_bit(irq, v->arch.irqs_pending_mask);
+
+ return 0;
+}
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index be7ddaff30e7..a7538e0dc966 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -85,6 +85,22 @@ struct arch_vcpu
register_t vstval;
register_t vsatp;
register_t vsepc;
+
+ /*
+ * VCPU interrupts
+ *
+ * We have a lockless approach for tracking pending VCPU interrupts
+ * implemented using atomic bitops. The irqs_pending bitmap represent
+ * pending interrupts whereas irqs_pending_mask represent bits changed
+ * in irqs_pending. Our approach is modeled around multiple producer
+ * and single consumer problem where the consumer is the VCPU itself.
+ *
+ * DECLARE_BITMAP() is needed here to support 64 vCPU local interrupts
+ * on RV32 host.
+ */
+#define RISCV_VCPU_NR_IRQS 64
+ DECLARE_BITMAP(irqs_pending, RISCV_VCPU_NR_IRQS);
+ DECLARE_BITMAP(irqs_pending_mask, RISCV_VCPU_NR_IRQS);
} __cacheline_aligned;
struct paging_domain {
@@ -123,6 +139,9 @@ static inline void update_guest_memory_policy(struct vcpu *v,
static inline void arch_vcpu_block(struct vcpu *v) {}
+int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq);
+int vcpu_unset_interrupt(struct vcpu *v, const unsigned int irq);
+
#endif /* ASM__RISCV__DOMAIN_H */
/*
diff --git a/xen/arch/riscv/include/asm/riscv_encoding.h b/xen/arch/riscv/include/asm/riscv_encoding.h
index dd15731a86fa..32d25f2d3e94 100644
--- a/xen/arch/riscv/include/asm/riscv_encoding.h
+++ b/xen/arch/riscv/include/asm/riscv_encoding.h
@@ -91,6 +91,7 @@
#define IRQ_M_EXT 11
#define IRQ_S_GEXT 12
#define IRQ_PMU_OVF 13
+#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
#define MIP_SSIP (_UL(1) << IRQ_S_SOFT)
#define MIP_VSSIP (_UL(1) << IRQ_VS_SOFT)
--
2.52.0
^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2025-12-24 17:03 ` [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1 Oleksii Kurochko
@ 2026-01-07 16:28 ` Jan Beulich
2026-01-13 12:51 ` Oleksii Kurochko
2026-01-16 14:25 ` Oleksii Kurochko
0 siblings, 2 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-07 16:28 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> This patch is based on Linux kernel 6.16.0.
>
> Introduce a lockless mechanism for tracking pending vCPU interrupts using
> atomic bit operations. The design follows a multi-producer, single-consumer
> model where the consumer is the vCPU itself.
>
> Two bitmaps are added:
> - irqs_pending — represents interrupts currently pending
> - irqs_pending_mask — represents bits that have changed in irqs_pending
>
> Introduce vcpu_(un)set_interrupt() to mark an interrupt in the irqs_pending{_mask}
> bitmap(s), notifying the vCPU that an interrupt has been raised or cleared.
It's not becoming clear how these are going to be used. It's also not clear
to me whether you really need to record these in software: Aren't there
(virtual) registers where they would be more naturally tracked, much like
hardware would do?
Furthermore, since you're dealing with two bitmaps, there's no full
atomicity here anyway. The bitmaps are each dealt with atomically, but
the overall update isn't atomic. Whether that's going to be okay can only
be told when also seeing the producer side.
> --- a/xen/arch/riscv/domain.c
> +++ b/xen/arch/riscv/domain.c
> @@ -5,9 +5,11 @@
> #include <xen/sched.h>
> #include <xen/smp.h>
>
> +#include <asm/bitops.h>
> #include <asm/cpufeature.h>
> #include <asm/csr.h>
> #include <asm/riscv_encoding.h>
> +#include <asm/system.h>
> #include <asm/vtimer.h>
>
> static void vcpu_csr_init(struct vcpu *v)
> @@ -100,6 +102,9 @@ int arch_vcpu_create(struct vcpu *v)
> if ( is_idle_vcpu(v) )
> return rc;
>
> + bitmap_zero(v->arch.irqs_pending, RISCV_VCPU_NR_IRQS);
> + bitmap_zero(v->arch.irqs_pending_mask, RISCV_VCPU_NR_IRQS);
This is pointless, as struct vcpu starts out all zero.
> @@ -135,3 +140,45 @@ void vcpu_kick(struct vcpu *v)
> smp_send_event_check_mask(cpumask_of(v->processor));
> }
> }
> +
> +int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq)
> +{
> + /*
> + * We only allow VS-mode software, timer, and external
> + * interrupts when irq is one of the local interrupts
> + * defined by RISC-V privilege specification.
> + */
> + if ( irq < IRQ_LOCAL_MAX &&
What use is this? In particular this allows an incoming irq with a huge
number to ...
> + irq != IRQ_VS_SOFT &&
> + irq != IRQ_VS_TIMER &&
> + irq != IRQ_VS_EXT )
> + return -EINVAL;
> +
> + set_bit(irq, v->arch.irqs_pending);
> + smp_mb__before_atomic();
> + set_bit(irq, v->arch.irqs_pending_mask);
... overrun both bitmaps.
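One possible shape for a check that cannot overrun the bitmaps is sketched below. It is only an illustration: check_vcpu_irq() and IRQ_LOCAL_NR are hypothetical names, the constants are local copies rather than the Xen headers, and whether interrupt numbers between the local range and the bitmap size should be accepted at all is not settled here (the sketch simply keeps the behaviour of the quoted condition for them).

```c
#include <assert.h>
#include <errno.h>

/* Local copies for illustration; the real values live in riscv_encoding.h. */
#define IRQ_VS_SOFT     2
#define IRQ_VS_TIMER    6
#define IRQ_VS_EXT      10
#define IRQ_LOCAL_NR    14  /* hypothetical: number of local interrupts */
#define VCPU_NR_IRQS    64  /* size of irqs_pending / irqs_pending_mask */

static_assert(IRQ_LOCAL_NR <= VCPU_NR_IRQS,
              "local interrupts must fit in the bitmaps");

/* Returns 0 if irq may be recorded in the bitmaps, -EINVAL otherwise. */
static int check_vcpu_irq(unsigned int irq)
{
    /* Reject anything that would index past the end of the bitmaps. */
    if ( irq >= VCPU_NR_IRQS )
        return -EINVAL;

    /* Among the local interrupts only the VS-mode ones are allowed. */
    if ( irq < IRQ_LOCAL_NR &&
         irq != IRQ_VS_SOFT &&
         irq != IRQ_VS_TIMER &&
         irq != IRQ_VS_EXT )
        return -EINVAL;

    return 0;
}
```

The upper bound is checked first, so set_bit() and clear_bit() would only ever see an index below the bitmap size.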
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -85,6 +85,22 @@ struct arch_vcpu
> register_t vstval;
> register_t vsatp;
> register_t vsepc;
> +
> + /*
> + * VCPU interrupts
> + *
> + * We have a lockless approach for tracking pending VCPU interrupts
> + * implemented using atomic bitops. The irqs_pending bitmap represent
> + * pending interrupts whereas irqs_pending_mask represent bits changed
> + * in irqs_pending.
And hence a set immediately followed by an unset is then indistinguishable
from just an unset (or the other way around). This may not be a problem, but
if it isn't, I think this needs explaining. Much like it is unclear why the
"changed" state needs tracking in the first place.
> Our approach is modeled around multiple producer
> + * and single consumer problem where the consumer is the VCPU itself.
> + *
> + * DECLARE_BITMAP() is needed here to support 64 vCPU local interrupts
> + * on RV32 host.
> + */
> +#define RISCV_VCPU_NR_IRQS 64
> + DECLARE_BITMAP(irqs_pending, RISCV_VCPU_NR_IRQS);
> + DECLARE_BITMAP(irqs_pending_mask, RISCV_VCPU_NR_IRQS);
> } __cacheline_aligned;
>
> struct paging_domain {
> @@ -123,6 +139,9 @@ static inline void update_guest_memory_policy(struct vcpu *v,
>
> static inline void arch_vcpu_block(struct vcpu *v) {}
>
> +int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq);
> +int vcpu_unset_interrupt(struct vcpu *v, const unsigned int irq);
Why the const-s?
> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
> @@ -91,6 +91,7 @@
> #define IRQ_M_EXT 11
> #define IRQ_S_GEXT 12
> #define IRQ_PMU_OVF 13
> +#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
MAX together with "+ 1" looks wrong. What is 14 (which, when MAX is 14,
must be a valid interrupt)? Or if 14 isn't a valid interrupt, please use
NR or NUM.
Also, nit: Padding doesn't match with the earlier #define-s (even if in the
quoted text it appears otherwise).
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-07 16:28 ` Jan Beulich
@ 2026-01-13 12:51 ` Oleksii Kurochko
2026-01-13 13:54 ` Jan Beulich
2026-01-16 14:25 ` Oleksii Kurochko
1 sibling, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 12:51 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/7/26 5:28 PM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> This patch is based on Linux kernel 6.16.0.
>>
>> Introduce a lockless mechanism for tracking pending vCPU interrupts using
>> atomic bit operations. The design follows a multi-producer, single-consumer
>> model where the consumer is the vCPU itself.
>>
>> Two bitmaps are added:
>> - irqs_pending — represents interrupts currently pending
>> - irqs_pending_mask — represents bits that have changed in irqs_pending
>>
>> Introduce vcpu_(un)set_interrupt() to mark an interrupt in irqs_pending{_mask}
>> bitmap(s) to notify vCPU that it has or no an interrupt.
> It's not becoming clear how these are going to be used. It's also not clear
> to me whether you really need to record these in software: Aren't there
> (virtual) registers where they would be more naturally tracked, much like
> hardware would do?
Guest (virtual) registers are not used to inject interrupts on RISC-V; for that
purpose, the HVIP register is provided. Even without considering HVIP, using guest
(virtual) registers has a downside: if a bit in hideleg is zero, the corresponding
bit in VSIP is read-only zero. During a context_switch(), when CSRs are saved,
this means we would not obtain correct values, since some VSIP bits may read as
zero during csr_read().
In fact, this is one of the reasons why we want to track interrupts to be
injected separately. For example, a vtimer may expire while the vCPU is running
on a different pCPU, so we update vCPU->hvip while the vCPU is active elsewhere.
When the vCPU is later switched in during a context_switch(), we would lose the
fact that vCPU->hvip.vtimer was set to 1, because the CSR save function will do:
vCPU->hvip = csr_read(CSR_HVIP);
and the pending interrupt state would be overwritten.
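To make the scenario above concrete, here is a minimal single-threaded sketch. Everything in it is a model, not Xen code: `hw_hvip` stands in for the HVIP CSR, `struct vcpu_model`, `inject_direct()`, `inject_tracked()` and `flush()` are hypothetical names, and the non-atomic accesses are only acceptable because nothing runs concurrently here.

```c
#define IRQ_VS_TIMER 6
#define BIT(n) (1UL << (n))

/* Model only: stands in for the HVIP CSR of the pCPU running the vCPU. */
static unsigned long hw_hvip;

struct vcpu_model {
    unsigned long hvip;              /* saved copy of HVIP */
    unsigned long irqs_pending;      /* software-tracked pending bits */
    unsigned long irqs_pending_mask; /* bits changed since the last flush */
};

/* Models the CSR save: the saved copy is overwritten from the "CSR". */
static void save_csrs(struct vcpu_model *v)
{
    v->hvip = hw_hvip;
}

/* Hypothetical: another pCPU writes straight into the saved copy. */
static void inject_direct(struct vcpu_model *v, unsigned int irq)
{
    v->hvip |= BIT(irq);
}

/* Hypothetical: another pCPU records the interrupt in the bitmaps instead. */
static void inject_tracked(struct vcpu_model *v, unsigned int irq)
{
    v->irqs_pending |= BIT(irq);
    v->irqs_pending_mask |= BIT(irq);
}

/* Merge the changed bits into the saved copy of HVIP. */
static void flush(struct vcpu_model *v)
{
    unsigned long mask = v->irqs_pending_mask;

    v->irqs_pending_mask = 0;
    v->hvip = (v->hvip & ~mask) | (v->irqs_pending & mask);
}

/* Returns 1 if the timer bit is still set after the CSR save. */
static int direct_update_survives(void)
{
    struct vcpu_model v = { 0 };

    hw_hvip = 0;                     /* the running vCPU has nothing pending */
    inject_direct(&v, IRQ_VS_TIMER);
    save_csrs(&v);                   /* overwrites the direct update */

    return !!(v.hvip & BIT(IRQ_VS_TIMER));
}

/* Returns 1 if the timer bit is set after the CSR save plus a flush. */
static int tracked_update_survives(void)
{
    struct vcpu_model v = { 0 };

    hw_hvip = 0;
    inject_tracked(&v, IRQ_VS_TIMER);
    save_csrs(&v);                   /* does not touch the bitmaps */
    flush(&v);

    return !!(v.hvip & BIT(IRQ_VS_TIMER));
}
```

In this model the direct write is wiped out by the save, while the bitmap-tracked interrupt is still there to be merged on the next flush.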
>
> Furthermore, since you're dealing with two bitmaps, there's no full
> atomicity here anyway. The bitmaps are each dealt with atomically, but
> the overall update isn't atomic. Whether that's going to be okay can only
> be told when also seeing the producer side.
You're correct that the two-bitmap update isn't fully atomic, but this design
is intentional. The rest is in [1], which is part 2 of the introduction of
pending vCPU interrupts. As it requires more things to be introduced first
(for example, [2]), I decided not to introduce it now with stubs, but to add
it once everything it needs is ready.
If a producer is interrupted between updating the two bitmaps, the worst case is
that the vCPU processes stale state for one cycle. This is resolved on the next
flush, when the mask indicates that the bit changed. No interrupt is permanently
lost or spuriously generated.
[1] https://gitlab.com/xen-project/people/olkur/xen/-/commit/31022d515789a032fd994f9ca90965db089dbbd5
void vcpu_flush_interrupts(struct vcpu *v)
{
    register_t *hvip = &v->arch.hvip;
    unsigned long mask, val;

    if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
    {
        mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
        val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;

        *hvip &= ~mask;
        *hvip |= val;
    }

    /* Flush AIA high interrupts */
    vcpu_aia_flush_interrupts(v);

    vcpu_update_hvip(v);
}

void vcpu_sync_interrupts(struct vcpu *v)
{
    unsigned long hvip;

    /* Read current HVIP and VSIE CSRs */
    v->arch.vsie = csr_read(CSR_VSIE);

    /* Sync-up HVIP.VSSIP bit changes does by Guest */
    hvip = csr_read(CSR_HVIP);
    if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
    {
        if ( hvip & BIT(IRQ_VS_SOFT, UL) )
        {
            if ( !test_and_set_bit(IRQ_VS_SOFT,
                                   &v->arch.irqs_pending_mask) )
                set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
        }
        else
        {
            if ( !test_and_set_bit(IRQ_VS_SOFT,
                                   &v->arch.irqs_pending_mask) )
                clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
        }
    }

    /* Sync-up AIA high interrupts */
    vcpu_aia_sync_interrupts(v);

    /* Sync-up timer CSRs */
    vtimer_sync(v);
}
[2] https://gitlab.com/xen-project/people/olkur/xen/-/commit/1c06b8b1d1eadfe009a4d6b1a1902fac64d080e9
>
>> --- a/xen/arch/riscv/domain.c
>> +++ b/xen/arch/riscv/domain.c
>> @@ -5,9 +5,11 @@
>> #include <xen/sched.h>
>> #include <xen/smp.h>
>>
>> +#include <asm/bitops.h>
>> #include <asm/cpufeature.h>
>> #include <asm/csr.h>
>> #include <asm/riscv_encoding.h>
>> +#include <asm/system.h>
>> #include <asm/vtimer.h>
>>
>> static void vcpu_csr_init(struct vcpu *v)
>> @@ -100,6 +102,9 @@ int arch_vcpu_create(struct vcpu *v)
>> if ( is_idle_vcpu(v) )
>> return rc;
>>
>> + bitmap_zero(v->arch.irqs_pending, RISCV_VCPU_NR_IRQS);
>> + bitmap_zero(v->arch.irqs_pending_mask, RISCV_VCPU_NR_IRQS);
> This is pointless, as struct vcpu starts out all zero.
>
>> @@ -135,3 +140,45 @@ void vcpu_kick(struct vcpu *v)
>> smp_send_event_check_mask(cpumask_of(v->processor));
>> }
>> }
>> +
>> +int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq)
>> +{
>> + /*
>> + * We only allow VS-mode software, timer, and external
>> + * interrupts when irq is one of the local interrupts
>> + * defined by RISC-V privilege specification.
>> + */
>> + if ( irq < IRQ_LOCAL_MAX &&
> What use is this? In particular this allows an incoming irq with a huge
> number to ...
>
>> + irq != IRQ_VS_SOFT &&
>> + irq != IRQ_VS_TIMER &&
>> + irq != IRQ_VS_EXT )
>> + return -EINVAL;
>> +
>> + set_bit(irq, v->arch.irqs_pending);
>> + smp_mb__before_atomic();
>> + set_bit(irq, v->arch.irqs_pending_mask);
> ... overrun both bitmaps.
Agree, it would be better just to drop "irq < IRQ_LOCAL_MAX &&".
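For reference, a small standalone sketch of the difference, assuming the standard RISC-V numbering (IRQ_VS_SOFT = 2, IRQ_VS_TIMER = 6, IRQ_VS_EXT = 10) and IRQ_LOCAL_MAX = 14 as in the patch. The helper names `check_irq_original()` and `check_irq_fixed()` are made up for this illustration; only the conditions come from the patch and the proposal above.

```c
#include <errno.h>

#define IRQ_VS_SOFT    2
#define IRQ_VS_TIMER   6
#define IRQ_VS_EXT     10
#define IRQ_PMU_OVF    13
#define IRQ_LOCAL_MAX  (IRQ_PMU_OVF + 1)

/* Condition as posted: any irq >= IRQ_LOCAL_MAX passes the check. */
static int check_irq_original(unsigned int irq)
{
    if ( irq < IRQ_LOCAL_MAX &&
         irq != IRQ_VS_SOFT &&
         irq != IRQ_VS_TIMER &&
         irq != IRQ_VS_EXT )
        return -EINVAL;

    return 0;
}

/* Condition with "irq < IRQ_LOCAL_MAX &&" dropped: only the three VS ones pass. */
static int check_irq_fixed(unsigned int irq)
{
    if ( irq != IRQ_VS_SOFT &&
         irq != IRQ_VS_TIMER &&
         irq != IRQ_VS_EXT )
        return -EINVAL;

    return 0;
}
```

With the original condition an irq of 1000 is accepted and would then index past the 64-bit bitmaps; with the reduced condition it is rejected.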
>
>> --- a/xen/arch/riscv/include/asm/domain.h
>> +++ b/xen/arch/riscv/include/asm/domain.h
>> @@ -85,6 +85,22 @@ struct arch_vcpu
>> register_t vstval;
>> register_t vsatp;
>> register_t vsepc;
>> +
>> + /*
>> + * VCPU interrupts
>> + *
>> + * We have a lockless approach for tracking pending VCPU interrupts
>> + * implemented using atomic bitops. The irqs_pending bitmap represent
>> + * pending interrupts whereas irqs_pending_mask represent bits changed
>> + * in irqs_pending.
> And hence a set immediately followed by an unset is then indistinguishable
> from just an unset (or the other way around).
I think it is distinguishable with the combination of irqs_pending_mask.
> This may not be a problem, but
> if it isn't, I think this needs explaining. Much like it is unclear why the
> "changed" state needs tracking in the first place.
It is needed to track which bits have changed; irqs_pending only represents
the current state of pending interrupts. A CPU might want to react to changes
rather than to the absolute state.
Example:
- If CPU 0 sets an interrupt, CPU 1 needs to notice “something changed”
  to inject it into the VCPU.
- If CPU 0 sets and then clears the bit before CPU 1 reads it,
  irqs_pending alone shows 0, the transition is lost.
By maintaining irqs_pending_mask, you can detect “this bit changed
recently,” even if the final state is 0.
Also, having irqs_pending_mask allows interrupts to be flushed without a lock:
    if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
    {
        mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
        val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
        *hvip &= ~mask;
        *hvip |= val;
    }
Without it, I assume we would need a spinlock around accesses to irqs_pending.
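A self-contained model of that scheme, written with C11 atomics rather than Xen's bitops so it can be built outside the tree. The names (`struct irq_state`, `irq_set()`, `irq_unset()`, `irq_flush()`) are hypothetical, a single word replaces the bitmap, and the default sequentially consistent ordering of the C11 operations stands in for the explicit barrier used in the patch.

```c
#include <stdatomic.h>

#define BIT(n) (1UL << (n))

struct irq_state {
    atomic_ulong pending;       /* current pending state, written by producers */
    atomic_ulong pending_mask;  /* bits changed since the last flush */
    unsigned long hvip;         /* only ever touched by the single consumer */
};

static void irq_state_init(struct irq_state *s)
{
    atomic_init(&s->pending, 0UL);
    atomic_init(&s->pending_mask, 0UL);
    s->hvip = 0;
}

/* Producer: update the state first, then publish "this bit changed". */
static void irq_set(struct irq_state *s, unsigned int irq)
{
    atomic_fetch_or(&s->pending, BIT(irq));
    atomic_fetch_or(&s->pending_mask, BIT(irq));
}

static void irq_unset(struct irq_state *s, unsigned int irq)
{
    atomic_fetch_and(&s->pending, ~BIT(irq));
    atomic_fetch_or(&s->pending_mask, BIT(irq));
}

/* Consumer: claim the changed bits atomically, then copy their state. */
static void irq_flush(struct irq_state *s)
{
    if ( atomic_load(&s->pending_mask) )
    {
        unsigned long mask = atomic_exchange(&s->pending_mask, 0UL);
        unsigned long val = atomic_load(&s->pending) & mask;

        s->hvip &= ~mask;
        s->hvip |= val;
    }
}
```

Note that a set followed by an unset before the flush leaves the mask bit set but the pending bit clear, so the consumer only ever sees the most recent state of each changed bit.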
>
>> Our approach is modeled around multiple producer
>> + * and single consumer problem where the consumer is the VCPU itself.
>> + *
>> + * DECLARE_BITMAP() is needed here to support 64 vCPU local interrupts
>> + * on RV32 host.
>> + */
>> +#define RISCV_VCPU_NR_IRQS 64
>> + DECLARE_BITMAP(irqs_pending, RISCV_VCPU_NR_IRQS);
>> + DECLARE_BITMAP(irqs_pending_mask, RISCV_VCPU_NR_IRQS);
>> } __cacheline_aligned;
>>
>> struct paging_domain {
>> @@ -123,6 +139,9 @@ static inline void update_guest_memory_policy(struct vcpu *v,
>>
>> static inline void arch_vcpu_block(struct vcpu *v) {}
>>
>> +int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq);
>> +int vcpu_unset_interrupt(struct vcpu *v, const unsigned int irq);
> Why the const-s?
Because the irq number isn't going to be changed inside these functions.
>
>> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
>> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
>> @@ -91,6 +91,7 @@
>> #define IRQ_M_EXT 11
>> #define IRQ_S_GEXT 12
>> #define IRQ_PMU_OVF 13
>> +#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
> MAX together with "+ 1" looks wrong. What is 14 (which, when MAX is 14,
> must be a valid interrupt)? Or if 14 isn't a valid interrupt, please use
> NR or NUM.
I didn’t fully understand your idea. Are you suggesting having IRQ_LOCAL_NR?
That sounds unclear, as it’s not obvious what it would represent.
Using MAX_HART seems better, since it represents the maximum number allowed
for a local interrupt. Any IRQ below that value is considered local, while
values above it are implementation-specific interrupts.
> Also, nit: Padding doesn't match with the earlier #define-s (even if in the
> quoted text it appears otherwise).
Thanks.
~ Oleksii
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-13 12:51 ` Oleksii Kurochko
@ 2026-01-13 13:54 ` Jan Beulich
2026-01-14 15:39 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-13 13:54 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.01.2026 13:51, Oleksii Kurochko wrote:
> On 1/7/26 5:28 PM, Jan Beulich wrote:
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/include/asm/domain.h
>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>> @@ -85,6 +85,22 @@ struct arch_vcpu
>>> register_t vstval;
>>> register_t vsatp;
>>> register_t vsepc;
>>> +
>>> + /*
>>> + * VCPU interrupts
>>> + *
>>> + * We have a lockless approach for tracking pending VCPU interrupts
>>> + * implemented using atomic bitops. The irqs_pending bitmap represent
>>> + * pending interrupts whereas irqs_pending_mask represent bits changed
>>> + * in irqs_pending.
>> And hence a set immediately followed by an unset is then indistinguishable
>> from just an unset (or the other way around).
>
> I think it is distinguishable with the combination of irqs_pending_mask.
No. The set mask bit tells you that there was a change. But irqs_pending[]
records only the most recent set / clear.
>> This may not be a problem, but
>> if it isn't, I think this needs explaining. Much like it is unclear why the
>> "changed" state needs tracking in the first place.
>
> It is needed to track which bits are changed, irqs_pending only represents
> the current state of pending interrupts.CPU might want to react to changes
> rather than the absolute state.
>
> Example:
> - If CPU 0 sets an interrupt, CPU 1 needs to notice “something changed”
> to inject it into the VCPU.
> - If CPU 0 sets and then clears the bit before CPU 1 reads it,
> irqs_pending alone shows 0, the transition is lost.
The fact there was any number of transitions is recorded in _mask[], yes,
but "the transition" was still lost if we consider the "set" in your
example in isolation. And it's not quite clear to me what's interesting
about a 0 -> 0 transition. (On x86, such a lost 0 -> 1 transition, i.e.
one followed directly by a 1 -> 0 one, would result in a "spurious
interrupt": There would be an indication that there was a lost interrupt
without there being a way to know which one it was.)
> By maintaining irqs_pending_mask, you can detect “this bit changed
> recently,” even if the final state is 0.
>
> Also, having irqs_pending_mask allows to flush interrupts without lock:
> if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
> {
> mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
> val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
>
> *hvip &= ~mask;
> *hvip |= val;
> }
> Without it I assume that we should have spinlcok around access to irqs_pending.
Ah yes, this would indeed be a benefit. Just that it's not quite clear to
me:
*hvip |= xchg(&v->arch.irqs_pending[0], 0UL);
wouldn't require a lock either. What may be confusing me is that you put
things as if it was normal to see 1 -> 0 transitions from (virtual)
hardware, when I (with my x86 background) would expect 1 -> 0 transitions
to only occur due to software actions (End Of Interrupt), unless - see
above - something malfunctioned and an interrupt was lost. That (the 1 ->
0 transitions) could be (guest) writes to SVIP, for example.
Talking of which - do you really mean HVIP in the code you provided, not
VSVIP? So far my understanding was that HVIP would be recording the
interrupts the hypervisor itself has pending (and needs to service).
>>> Our approach is modeled around multiple producer
>>> + * and single consumer problem where the consumer is the VCPU itself.
>>> + *
>>> + * DECLARE_BITMAP() is needed here to support 64 vCPU local interrupts
>>> + * on RV32 host.
>>> + */
>>> +#define RISCV_VCPU_NR_IRQS 64
>>> + DECLARE_BITMAP(irqs_pending, RISCV_VCPU_NR_IRQS);
>>> + DECLARE_BITMAP(irqs_pending_mask, RISCV_VCPU_NR_IRQS);
>>> } __cacheline_aligned;
>>>
>>> struct paging_domain {
>>> @@ -123,6 +139,9 @@ static inline void update_guest_memory_policy(struct vcpu *v,
>>>
>>> static inline void arch_vcpu_block(struct vcpu *v) {}
>>>
>>> +int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq);
>>> +int vcpu_unset_interrupt(struct vcpu *v, const unsigned int irq);
>> Why the const-s?
>
> As irq number isn't going to be changed inside these functions.
You realize though that we don't normally use const like this? This
use of qualifiers is meaningless to callers, and of limited meaning to
the function definition itself. There can be exceptions of course, when
it is important to clarify that a parameter must not change throughout
the function.
>>> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
>>> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
>>> @@ -91,6 +91,7 @@
>>> #define IRQ_M_EXT 11
>>> #define IRQ_S_GEXT 12
>>> #define IRQ_PMU_OVF 13
>>> +#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
>> MAX together with "+ 1" looks wrong. What is 14 (which, when MAX is 14,
>> must be a valid interrupt)? Or if 14 isn't a valid interrupt, please use
>> NR or NUM.
>
> I didn’t fully understand your idea. Are you suggesting having|IRQ_LOCAL_NR|?
> That sounds unclear, as it’s not obvious what it would represent.
> Using|MAX_HART| seems better, since it represents the maximum number allowed
> for a local interrupt. Any IRQ below that value is considered local, while
> values above it are implementation-specific interrupts.
Not quite. If you say "max", anything below _or equal_ that value is
valid / covered. When you say "num", anything below that value is
valid / covered. That is, "max" is inclusive for the upper bound of
the range, while "num" is exclusive. Hence my question whether 14 is
a valid local interrupt.
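Put as code, the two conventions look like this. This is only a sketch: it assumes, purely for illustration, that 13 (IRQ_PMU_OVF) is the highest valid local interrupt, and `IRQ_LOCAL_MAX_INCL`, `IRQ_LOCAL_NR`, `is_local_by_max()` and `is_local_by_nr()` are made-up names.

```c
#define IRQ_PMU_OVF         13

/* "max" names the last valid value, so the comparison is inclusive. */
#define IRQ_LOCAL_MAX_INCL  IRQ_PMU_OVF

/* "nr" / "num" names the count of values, so the comparison is exclusive. */
#define IRQ_LOCAL_NR        (IRQ_PMU_OVF + 1)

static int is_local_by_max(unsigned int irq)
{
    return irq <= IRQ_LOCAL_MAX_INCL;
}

static int is_local_by_nr(unsigned int irq)
{
    return irq < IRQ_LOCAL_NR;
}
```

Both predicates accept 0..13 and reject 14; a constant with the value 14 that is named MAX but compared with "<" mixes the two conventions.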
Jan
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-13 13:54 ` Jan Beulich
@ 2026-01-14 15:39 ` Oleksii Kurochko
2026-01-14 15:56 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-14 15:39 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/13/26 2:54 PM, Jan Beulich wrote:
> On 13.01.2026 13:51, Oleksii Kurochko wrote:
>> On 1/7/26 5:28 PM, Jan Beulich wrote:
>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>> --- a/xen/arch/riscv/include/asm/domain.h
>>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>>> @@ -85,6 +85,22 @@ struct arch_vcpu
>>>> register_t vstval;
>>>> register_t vsatp;
>>>> register_t vsepc;
>>>> +
>>>> + /*
>>>> + * VCPU interrupts
>>>> + *
>>>> + * We have a lockless approach for tracking pending VCPU interrupts
>>>> + * implemented using atomic bitops. The irqs_pending bitmap represent
>>>> + * pending interrupts whereas irqs_pending_mask represent bits changed
>>>> + * in irqs_pending.
>>> And hence a set immediately followed by an unset is then indistinguishable
>>> from just an unset (or the other way around).
>> I think it is distinguishable with the combination of irqs_pending_mask.
> No. The set mask bit tells you that there was a change. But irqs_pending[]
> records only the most recent set / clear.
>
>>> This may not be a problem, but
>>> if it isn't, I think this needs explaining. Much like it is unclear why the
>>> "changed" state needs tracking in the first place.
>> It is needed to track which bits are changed, irqs_pending only represents
>> the current state of pending interrupts.CPU might want to react to changes
>> rather than the absolute state.
>>
>> Example:
>> - If CPU 0 sets an interrupt, CPU 1 needs to notice “something changed”
>> to inject it into the VCPU.
>> - If CPU 0 sets and then clears the bit before CPU 1 reads it,
>> irqs_pending alone shows 0, the transition is lost.
> The fact there was any number of transitions is recorded in _mask[], yes,
> but "the transition" was still lost if we consider the "set" in your
> example in isolation. And it's not quite clear to me what's interesting
> about a 0 -> 0 transition. (On x86, such a lost 0 -> 1 transition, i.e.
> one followed directly by a 1 -> 0 one, would result in a "spurious
> interrupt": There would be an indication that there was a lost interrupt
> without there being a way to know which one it was.)
IIUC, in this reply you are talking about the point at which the contents
written to the irqs_pending and irqs_pending_mask bitmaps are flushed to the
hardware registers.
Originally, I understood your question to be about the case where
vcpu_set_interrupt() is called and then vcpu_unset_interrupt() is called.
I am trying to understand whether such a scenario is possible.
Let’s take the vtimer as an example. vcpu_set_interrupt(t->v, IRQ_VS_TIMER)
is not called again until vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) and
set_timer() are called in vtimer_set_timer().
The opposite situation is not possible: it cannot happen that
vcpu_set_interrupt(t->v, IRQ_VS_TIMER) is called and then immediately
vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) is called, because
vcpu_unset_interrupt() and set_timer() are only invoked when the guest
has handled the timer interrupt and requested a new one.
So if no interrupt flush is happening, the vcpu_set_interrupt() →
vcpu_unset_interrupt() sequence will only update the irqs_pending and
irqs_pending_mask bitmaps, without touching the hardware registers,
so no spurious interrupt will occur. And if an interrupt flush does
happen, it is not possible to have a 1 -> 0 transition, due to the call
sequence I described in the two paragraphs above.
>
>> By maintaining irqs_pending_mask, you can detect “this bit changed
>> recently,” even if the final state is 0.
>>
>> Also, having irqs_pending_mask allows to flush interrupts without lock:
>> if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
>> {
>> mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
>> val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
>>
>> *hvip &= ~mask;
>> *hvip |= val;
>> }
>> Without it I assume that we should have spinlcok around access to irqs_pending.
> Ah yes, this would indeed be a benefit. Just that it's not quite clear to
> me:
>
> *hvip |= xchg(&v->arch.irqs_pending[0], 0UL);
>
> wouldn't require a lock either
Because the vCPU's hvip (which is stored on the stack) can't be changed
concurrently, and this is almost the only place in the code where vCPU->hvip
is changed. The other place is save_csrs() during a context switch, but it
can't be called in parallel with vcpu_sync_interrupts() (see below).
> . What may be confusing me is that you put
> things as if it was normal to see 1 -> 0 transitions from (virtual)
> hardware, when I (with my x86 background) would expect 1 -> 0 transitions
> to only occur due to software actions (End Of Interrupt), unless - see
> above - something malfunctioned and an interrupt was lost. That (the 1 ->
> 0 transitions) could be (guest) writes to SVIP, for example.
>
> Talking of which - do you really mean HVIP in the code you provided, not
> VSVIP? So far I my understanding was that HVIP would be recording the
> interrupts the hypervisor itself has pending (and needs to service).
HVIP is the correct register to use here: it is used to indicate virtual
interrupts intended for VS-mode. I think you confused HVIP with the HIP
register, which supplements the standard supervisor-level SIP register to
indicate pending virtual supervisor (VS-level) interrupts and
hypervisor-specific interrupts.
If a guest does what you describe ("That (the 1 -> 0 transitions) could be
(guest) writes to SVIP, for example."), then the corresponding HVIP bits (and
HIP bits, as they are usually aliases of HVIP) will be updated. That is why we
need vcpu_sync_interrupts(), which I mentioned in one of my replies, to sync VSSIP:
+void vcpu_sync_interrupts(struct vcpu *v)
+{
+    unsigned long hvip;
+
+    /* Read current HVIP and VSIE CSRs */
+    v->arch.vsie = csr_read(CSR_VSIE);
+
+    /* Sync-up HVIP.VSSIP bit changes does by Guest */
+    hvip = csr_read(CSR_HVIP);
+    if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
+    {
+        if ( hvip & BIT(IRQ_VS_SOFT, UL) )
+        {
+            if ( !test_and_set_bit(IRQ_VS_SOFT,
+                                   &v->arch.irqs_pending_mask) )
+                set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
+        }
+        else
+        {
+            if ( !test_and_set_bit(IRQ_VS_SOFT,
+                                   &v->arch.irqs_pending_mask) )
+                clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
+        }
+    }
+
+    /* Sync-up AIA high interrupts */
+    vcpu_aia_sync_interrupts(v);
+
+    /* Sync-up timer CSRs */
+    vtimer_sync(v);
+}
>
>>>> Our approach is modeled around multiple producer
>>>> + * and single consumer problem where the consumer is the VCPU itself.
>>>> + *
>>>> + * DECLARE_BITMAP() is needed here to support 64 vCPU local interrupts
>>>> + * on RV32 host.
>>>> + */
>>>> +#define RISCV_VCPU_NR_IRQS 64
>>>> + DECLARE_BITMAP(irqs_pending, RISCV_VCPU_NR_IRQS);
>>>> + DECLARE_BITMAP(irqs_pending_mask, RISCV_VCPU_NR_IRQS);
>>>> } __cacheline_aligned;
>>>>
>>>> struct paging_domain {
>>>> @@ -123,6 +139,9 @@ static inline void update_guest_memory_policy(struct vcpu *v,
>>>>
>>>> static inline void arch_vcpu_block(struct vcpu *v) {}
>>>>
>>>> +int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq);
>>>> +int vcpu_unset_interrupt(struct vcpu *v, const unsigned int irq);
>>> Why the const-s?
>> As irq number isn't going to be changed inside these functions.
> You realize though that we don't normally use const like this? This
> use of qualifiers is meaningless to callers, and of limited meaning to
> the function definition itself. There can be exceptions of course, when
> it is important to clarify that a parameter must not change throughout
> the function.
>
>>>> --- a/xen/arch/riscv/include/asm/riscv_encoding.h
>>>> +++ b/xen/arch/riscv/include/asm/riscv_encoding.h
>>>> @@ -91,6 +91,7 @@
>>>> #define IRQ_M_EXT 11
>>>> #define IRQ_S_GEXT 12
>>>> #define IRQ_PMU_OVF 13
>>>> +#define IRQ_LOCAL_MAX (IRQ_PMU_OVF + 1)
>>> MAX together with "+ 1" looks wrong. What is 14 (which, when MAX is 14,
>>> must be a valid interrupt)? Or if 14 isn't a valid interrupt, please use
>>> NR or NUM.
>> I didn’t fully understand your idea. Are you suggesting having|IRQ_LOCAL_NR|?
>> That sounds unclear, as it’s not obvious what it would represent.
>> Using|MAX_HART| seems better, since it represents the maximum number allowed
>> for a local interrupt. Any IRQ below that value is considered local, while
>> values above it are implementation-specific interrupts.
> Not quite. If you say "max", anything below _or equal_ that value is
> valid / covered. When you say "num", anything below that value is
> valid / covered. That is, "max" is inclusive for the upper bound of
> the range, while "num" is exclusive. Hence my question whether 14 is
> a valid local interrupt.
14 is architecturally classified as a local interrupt, but its specific
function is currently reserved.
The intention was to cover the standard portion (bits 15:0) of sip, in which
bits 15 and 14 are 0 as they are reserved, so it seems like NUM could be used here.
~ Oleksii
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-14 15:39 ` Oleksii Kurochko
@ 2026-01-14 15:56 ` Jan Beulich
2026-01-15 9:14 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 15:56 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.01.2026 16:39, Oleksii Kurochko wrote:
> On 1/13/26 2:54 PM, Jan Beulich wrote:
>> On 13.01.2026 13:51, Oleksii Kurochko wrote:
>>> On 1/7/26 5:28 PM, Jan Beulich wrote:
>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>> --- a/xen/arch/riscv/include/asm/domain.h
>>>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>>>> @@ -85,6 +85,22 @@ struct arch_vcpu
>>>>> register_t vstval;
>>>>> register_t vsatp;
>>>>> register_t vsepc;
>>>>> +
>>>>> + /*
>>>>> + * VCPU interrupts
>>>>> + *
>>>>> + * We have a lockless approach for tracking pending VCPU interrupts
>>>>> + * implemented using atomic bitops. The irqs_pending bitmap represent
>>>>> + * pending interrupts whereas irqs_pending_mask represent bits changed
>>>>> + * in irqs_pending.
>>>> And hence a set immediately followed by an unset is then indistinguishable
>>>> from just an unset (or the other way around).
>>> I think it is distinguishable with the combination of irqs_pending_mask.
>> No. The set mask bit tells you that there was a change. But irqs_pending[]
>> records only the most recent set / clear.
>>
>>>> This may not be a problem, but
>>>> if it isn't, I think this needs explaining. Much like it is unclear why the
>>>> "changed" state needs tracking in the first place.
>>> It is needed to track which bits are changed, irqs_pending only represents
>>> the current state of pending interrupts.CPU might want to react to changes
>>> rather than the absolute state.
>>>
>>> Example:
>>> - If CPU 0 sets an interrupt, CPU 1 needs to notice “something changed”
>>> to inject it into the VCPU.
>>> - If CPU 0 sets and then clears the bit before CPU 1 reads it,
>>> irqs_pending alone shows 0, the transition is lost.
>> The fact there was any number of transitions is recorded in _mask[], yes,
>> but "the transition" was still lost if we consider the "set" in your
>> example in isolation. And it's not quite clear to me what's interesting
>> about a 0 -> 0 transition. (On x86, such a lost 0 -> 1 transition, i.e.
>> one followed directly by a 1 -> 0 one, would result in a "spurious
>> interrupt": There would be an indication that there was a lost interrupt
>> without there being a way to know which one it was.)
>
> IIUC, in this reply you are talking about when the contents written to the
> irq_pending and irqs_pending_mask bitmaps are flushed to the hardware
> registers.
>
> Originally, I understood your question to be about the case where
> vcpu_set_interrupt() is called and then vcpu_unset_interrupt() is called.
I was actually asking in more abstract terms. And I was assuming there
would be pretty direct ways for the guest to have vcpu_{,un}set_interrupt()
invoked. Looks like ...
> I am trying to understand whether such a scenario is possible.
>
> Let’s take the vtimer as an example. vcpu_set_interrupt(t->v, IRQ_VS_TIMER)
> is not called again until vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) and
> set_timer() are called in vtimer_set_timer().
>
> The opposite situation is not possible: it cannot happen that
> vcpu_set_interrupt(t->v, IRQ_VS_TIMER) is called and then immediately
> vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) is called, because
> vcpu_unset_interrupt() and set_timer() are only invoked when the guest
> has handled the timer interrupt and requested a new one.
>
> So if no interrupt flush is happening, the vcpu_set_interrupt() →
> vcpu_unset_interrupt() sequence will only update the irq_pending and
> irqs_pending_mask bitmaps, without touching the hardware registers,
> so no spurious interrupt will occur. And if an interrupt flush does
> happen, it is not possible to have a 1 -> 0 transition due to the call
> sequence I mentioned in the last two paragraphs above.
... that wasn't a correct assumption. (Partly attributed to the patch
series leaving out a number of relevant things, which makes it hard to
guess what else is left out.)
>>> By maintaining irqs_pending_mask, you can detect “this bit changed
>>> recently,” even if the final state is 0.
>>>
>>> Also, having irqs_pending_mask allows to flush interrupts without lock:
>>> if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
>>> {
>>> mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
>>> val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
>>>
>>> *hvip &= ~mask;
>>> *hvip |= val;
>>> }
>>> Without it I assume that we should have spinlcok around access to irqs_pending.
>> Ah yes, this would indeed be a benefit. Just that it's not quite clear to
>> me:
>>
>> *hvip |= xchg(&v->arch.irqs_pending[0], 0UL);
>>
>> wouldn't require a lock either
>
> Because vCPU's hvip (which is stored on the stack) can't be changed concurrently
> and it's almost the one place in the code where vCPU->hvip is changed. Another
> place it is save_csrs() during context switch but it can't be called in parallel
> with the vcpu_sync_interrupts() (look below).
>
>> . What may be confusing me is that you put
>> things as if it was normal to see 1 -> 0 transitions from (virtual)
>> hardware, when I (with my x86 background) would expect 1 -> 0 transitions
>> to only occur due to software actions (End Of Interrupt), unless - see
>> above - something malfunctioned and an interrupt was lost. That (the 1 ->
>> 0 transitions) could be (guest) writes to SVIP, for example.
>>
>> Talking of which - do you really mean HVIP in the code you provided, not
>> VSVIP? So far I my understanding was that HVIP would be recording the
>> interrupts the hypervisor itself has pending (and needs to service).
>
> HVIP is correct to use here, HVIP is used to indicate virtual interrupts
> intended for VS-mode. And I think you confused HVIP with the HIP register
> which supplements the standard supervisor-level SIP register to indicate
> pending virtual supervisor (VS-level) interrupts and hypervisor-specific
> interrupts.
>
> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
> to SVIP, for example." then the correspondent HVIP (and HIP as usually
> they are aliasis of HVIP) bits will be updated. And that is why we need
> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
> +void vcpu_sync_interrupts(struct vcpu *v)
> +{
> + unsigned long hvip;
> +
> + /* Read current HVIP and VSIE CSRs */
> + v->arch.vsie = csr_read(CSR_VSIE);
> +
> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
> + hvip = csr_read(CSR_HVIP);
> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
> + {
> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
> + {
> + if ( !test_and_set_bit(IRQ_VS_SOFT,
> + &v->arch.irqs_pending_mask) )
> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
> + }
> + else
> + {
> + if ( !test_and_set_bit(IRQ_VS_SOFT,
> + &v->arch.irqs_pending_mask) )
> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
> + }
> + }
I fear I don't understand this at all. Why would the guest having set a
pending bit not result in the IRQ to be marked pending? You can't know
whether that guest write happened before or after you last touched
.irqs_pending{,mask}[]? Yet that pair of bit arrays is supposed to be
tracking the most recent update (according to how I understood earlier
explanations of yours).
As an aside - the !test_and_set_bit() can be pulled out, to the outermost
if().
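To spell out the aside, a standalone sketch of the two shapes. It is a model only: `struct sync_model`, `test_and_set()` (a non-atomic stand-in for test_and_set_bit()), `sync_nested()` and `sync_hoisted()` are hypothetical names, and single words replace the bitmaps. The hoisted form relies on && short-circuiting, so the mask bit is still only touched when the VSSIP bit actually differs.

```c
#define IRQ_VS_SOFT 2
#define BIT(n) (1UL << (n))

struct sync_model {
    unsigned long hvip;     /* saved copy of HVIP */
    unsigned long pending;  /* stands in for irqs_pending[0] */
    unsigned long mask;     /* stands in for irqs_pending_mask[0] */
};

/* Non-atomic stand-in for test_and_set_bit(); fine for a one-thread model. */
static int test_and_set(unsigned long *word, unsigned int nr)
{
    int old = !!(*word & BIT(nr));

    *word |= BIT(nr);

    return old;
}

/* Shape of the posted code: the same test repeated in both branches. */
static void sync_nested(struct sync_model *m, unsigned long hw_hvip)
{
    if ( (m->hvip ^ hw_hvip) & BIT(IRQ_VS_SOFT) )
    {
        if ( hw_hvip & BIT(IRQ_VS_SOFT) )
        {
            if ( !test_and_set(&m->mask, IRQ_VS_SOFT) )
                m->pending |= BIT(IRQ_VS_SOFT);
        }
        else
        {
            if ( !test_and_set(&m->mask, IRQ_VS_SOFT) )
                m->pending &= ~BIT(IRQ_VS_SOFT);
        }
    }
}

/* Same logic with the test pulled out to the outermost if(). */
static void sync_hoisted(struct sync_model *m, unsigned long hw_hvip)
{
    if ( ((m->hvip ^ hw_hvip) & BIT(IRQ_VS_SOFT)) &&
         !test_and_set(&m->mask, IRQ_VS_SOFT) )
    {
        if ( hw_hvip & BIT(IRQ_VS_SOFT) )
            m->pending |= BIT(IRQ_VS_SOFT);
        else
            m->pending &= ~BIT(IRQ_VS_SOFT);
    }
}

/* Returns 1 if both shapes agree for every combination of the four bits. */
static int variants_agree(void)
{
    for ( unsigned int i = 0; i < 16; i++ )
    {
        struct sync_model a = {
            .hvip    = (i & 1) ? BIT(IRQ_VS_SOFT) : 0,
            .pending = (i & 2) ? BIT(IRQ_VS_SOFT) : 0,
            .mask    = (i & 4) ? BIT(IRQ_VS_SOFT) : 0,
        };
        struct sync_model b = a;
        unsigned long hw = (i & 8) ? BIT(IRQ_VS_SOFT) : 0;

        sync_nested(&a, hw);
        sync_hoisted(&b, hw);

        if ( a.pending != b.pending || a.mask != b.mask )
            return 0;
    }

    return 1;
}
```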
Jan
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-14 15:56 ` Jan Beulich
@ 2026-01-15 9:14 ` Oleksii Kurochko
2026-01-15 9:52 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-15 9:14 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/14/26 4:56 PM, Jan Beulich wrote:
> On 14.01.2026 16:39, Oleksii Kurochko wrote:
>> On 1/13/26 2:54 PM, Jan Beulich wrote:
>>> On 13.01.2026 13:51, Oleksii Kurochko wrote:
>>>> On 1/7/26 5:28 PM, Jan Beulich wrote:
>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>> --- a/xen/arch/riscv/include/asm/domain.h
>>>>>> +++ b/xen/arch/riscv/include/asm/domain.h
>>>>>> @@ -85,6 +85,22 @@ struct arch_vcpu
>>>>>> register_t vstval;
>>>>>> register_t vsatp;
>>>>>> register_t vsepc;
>>>>>> +
>>>>>> + /*
>>>>>> + * VCPU interrupts
>>>>>> + *
>>>>>> + * We have a lockless approach for tracking pending VCPU interrupts
>>>>>> + * implemented using atomic bitops. The irqs_pending bitmap represent
>>>>>> + * pending interrupts whereas irqs_pending_mask represent bits changed
>>>>>> + * in irqs_pending.
>>>>> And hence a set immediately followed by an unset is then indistinguishable
>>>>> from just an unset (or the other way around).
>>>> I think it is distinguishable with the combination of irqs_pending_mask.
>>> No. The set mask bit tells you that there was a change. But irqs_pending[]
>>> records only the most recent set / clear.
>>>
>>>>> This may not be a problem, but
>>>>> if it isn't, I think this needs explaining. Much like it is unclear why the
>>>>> "changed" state needs tracking in the first place.
>>>> It is needed to track which bits are changed, irqs_pending only represents
>>>> the current state of pending interrupts.CPU might want to react to changes
>>>> rather than the absolute state.
>>>>
>>>> Example:
>>>> - If CPU 0 sets an interrupt, CPU 1 needs to notice “something changed”
>>>> to inject it into the VCPU.
>>>> - If CPU 0 sets and then clears the bit before CPU 1 reads it,
>>>> irqs_pending alone shows 0, the transition is lost.
>>> The fact there was any number of transitions is recorded in _mask[], yes,
>>> but "the transition" was still lost if we consider the "set" in your
>>> example in isolation. And it's not quite clear to me what's interesting
>>> about a 0 -> 0 transition. (On x86, such a lost 0 -> 1 transition, i.e.
>>> one followed directly by a 1 -> 0 one, would result in a "spurious
>>> interrupt": There would be an indication that there was a lost interrupt
>>> without there being a way to know which one it was.)
>> IIUC, in this reply you are talking about when the contents written to the
>> irq_pending and irqs_pending_mask bitmaps are flushed to the hardware
>> registers.
>>
>> Originally, I understood your question to be about the case where
>> vcpu_set_interrupt() is called and then vcpu_unset_interrupt() is called.
> I was actually asking in more abstract terms. And I was assuming there
> would be pretty direct ways for the guest to have vcpu_{,un}set_interrupt()
> invoked. Looks like ...
>
>> I am trying to understand whether such a scenario is possible.
>>
>> Let’s take the vtimer as an example. vcpu_set_interrupt(t->v, IRQ_VS_TIMER)
>> is not called again until vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) and
>> set_timer() are called in vtimer_set_timer().
>>
>> The opposite situation is not possible: it cannot happen that
>> vcpu_set_interrupt(t->v, IRQ_VS_TIMER) is called and then immediately
>> vcpu_unset_interrupt(t->v, IRQ_VS_TIMER) is called, because
>> vcpu_unset_interrupt() and set_timer() are only invoked when the guest
>> has handled the timer interrupt and requested a new one.
>>
>> So if no interrupt flush is happening, the vcpu_set_interrupt() →
>> vcpu_unset_interrupt() sequence will only update the irq_pending and
>> irqs_pending_mask bitmaps, without touching the hardware registers,
>> so no spurious interrupt will occur. And if an interrupt flush does
>> happen, it is not possible to have a 1 -> 0 transition due to the call
>> sequence I mentioned in the last two paragraphs above.
> ... that wasn't a correct assumption. (Partly attributed to the patch
> series leaving out a number of relevant things, which makes it hard to
> guess what else is left out.)
Then it makes sense to introduce the second part of the pending interrupt
tracking as part of this patch series in the next version.
Or, for now, not to introduce tracking of pending vCPU interrupts at all and
to implement the vtimer expiry handler as:
    csr_set(CSR_HVIP, BIT(IRQ_VS_TIMER, UL));
    vcpu->hvip = csr_read(CSR_HVIP);
>>>> By maintaining irqs_pending_mask, you can detect “this bit changed
>>>> recently,” even if the final state is 0.
>>>>
>>>> Also, having irqs_pending_mask allows to flush interrupts without lock:
>>>> if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
>>>> {
>>>> mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
>>>> val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
>>>>
>>>> *hvip &= ~mask;
>>>> *hvip |= val;
>>>> }
>>>> Without it I assume that we should have spinlcok around access to irqs_pending.
>>> Ah yes, this would indeed be a benefit. Just that it's not quite clear to
>>> me:
>>>
>>> *hvip |= xchg(&v->arch.irqs_pending[0], 0UL);
>>>
>>> wouldn't require a lock either
>> Because vCPU's hvip (which is stored on the stack) can't be changed concurrently
>> and it's almost the one place in the code where vCPU->hvip is changed. Another
>> place it is save_csrs() during context switch but it can't be called in parallel
>> with the vcpu_sync_interrupts() (look below).
>>
>>> . What may be confusing me is that you put
>>> things as if it was normal to see 1 -> 0 transitions from (virtual)
>>> hardware, when I (with my x86 background) would expect 1 -> 0 transitions
>>> to only occur due to software actions (End Of Interrupt), unless - see
>>> above - something malfunctioned and an interrupt was lost. That (the 1 ->
>>> 0 transitions) could be (guest) writes to SVIP, for example.
>>>
>>> Talking of which - do you really mean HVIP in the code you provided, not
>>> VSVIP? So far I my understanding was that HVIP would be recording the
>>> interrupts the hypervisor itself has pending (and needs to service).
>> HVIP is correct to use here, HVIP is used to indicate virtual interrupts
>> intended for VS-mode. And I think you confused HVIP with the HIP register
>> which supplements the standard supervisor-level SIP register to indicate
>> pending virtual supervisor (VS-level) interrupts and hypervisor-specific
>> interrupts.
>>
>> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
>> to SVIP, for example." then the correspondent HVIP (and HIP as usually
>> they are aliasis of HVIP) bits will be updated. And that is why we need
>> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
>> +void vcpu_sync_interrupts(struct vcpu *v)
>> +{
>> + unsigned long hvip;
>> +
>> + /* Read current HVIP and VSIE CSRs */
>> + v->arch.vsie = csr_read(CSR_VSIE);
>> +
>> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
>> + hvip = csr_read(CSR_HVIP);
>> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
>> + {
>> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
>> + {
>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>> + &v->arch.irqs_pending_mask) )
>> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>> + }
>> + else
>> + {
>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>> + &v->arch.irqs_pending_mask) )
>> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>> + }
>> + }
> I fear I don't understand this at all. Why would the guest having set a
> pending bit not result in the IRQ to be marked pending?
Maybe it is a wrong assumption, but it is based on the spec:
Bits sip.SSIP and sie.SSIE are the interrupt-pending and interrupt-enable
bits for supervisor-level software interrupts. If implemented, SSIP is
writable in sip and may also be set to 1 by a platform-specific interrupt
controller.
and:
Interprocessor interrupts are sent to other harts by implementation-specific
means, which will ultimately cause the SSIP bit to be set in the recipient
hart’s sip register.
Meaning that sending an IPI to self by writing 1 to sip.SSIP is
well-defined. The same should be true of vsip.SSIP while in VS mode.
And so in this case, if SSIP handling was delegated by the hypervisor to the
guest by setting hideleg[2] = 1, we won't get an interrupt in the hypervisor,
and so nothing will set a pending bit in the bitmap or update the hvip
register from the hypervisor.
(All bits except SSIP in the sip register are read-only.)
> You can't know
> whether that guest write happened before or after you last touched
> .irqs_pending{,mask}[]?
Yes, I think you are right.
On the other hand, if we are in the hypervisor when vcpu_sync_interrupts() is
called, it means that the pCPU on which the vCPU runs, and for which
vcpu_sync_interrupts() is called, is currently executing hypervisor code, so
the guest won't be able to update VSIP.SSIP on this pCPU. So nothing else will
change VSIP.SSIP, the h/w HVIP won't change underneath us, and it is
okay to sync .irqs_pending{,mask} with what the h/w has in its HVIP.
~ Oleksii
> Yet that pair of bit arrays is supposed to be
> tracking the most recent update (according to how I understood earlier
> explanations of yours).
>
> As an aside - the !test_and_set_bit() can be pulled out, to the outermost
> if().
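The lockless flush quoted earlier in this message, and the set-then-unset case discussed around it, can be modelled in user space roughly as follows. This is a sketch under assumptions: the model_* helpers are hypothetical, the order in which irqs_pending and irqs_pending_mask are updated is assumed (the real vcpu_set_interrupt() and vcpu_unset_interrupt() are not shown in this thread excerpt), and compiler atomics stand in for Xen's bitops, xchg() and ACCESS_ONCE().

```c
#include <assert.h>

/* Hypothetical stand-ins, not Xen's definitions. VSTIP is bit 6 of hvip. */
#define IRQ_VS_TIMER 6
#define MODEL_BIT(n) (1UL << (n))

struct vcpu_irq_model {
    unsigned long irqs_pending;      /* most recently requested state */
    unsigned long irqs_pending_mask; /* bits changed since the last flush */
};

/* Assumed shape of vcpu_set_interrupt(): record state, then mark it changed. */
static void model_set_interrupt(struct vcpu_irq_model *v, unsigned int irq)
{
    assert(irq < sizeof(unsigned long) * 8);
    __atomic_fetch_or(&v->irqs_pending, MODEL_BIT(irq), __ATOMIC_SEQ_CST);
    __atomic_fetch_or(&v->irqs_pending_mask, MODEL_BIT(irq), __ATOMIC_SEQ_CST);
}

/* Assumed shape of vcpu_unset_interrupt(). */
static void model_unset_interrupt(struct vcpu_irq_model *v, unsigned int irq)
{
    assert(irq < sizeof(unsigned long) * 8);
    __atomic_fetch_and(&v->irqs_pending, ~MODEL_BIT(irq), __ATOMIC_SEQ_CST);
    __atomic_fetch_or(&v->irqs_pending_mask, MODEL_BIT(irq), __ATOMIC_SEQ_CST);
}

/* Mirrors the quoted flush: only bits flagged in the mask reach *hvip. */
static void model_flush(struct vcpu_irq_model *v, unsigned long *hvip)
{
    if ( __atomic_load_n(&v->irqs_pending_mask, __ATOMIC_RELAXED) )
    {
        unsigned long mask =
            __atomic_exchange_n(&v->irqs_pending_mask, 0UL, __ATOMIC_SEQ_CST);
        unsigned long val =
            __atomic_load_n(&v->irqs_pending, __ATOMIC_RELAXED) & mask;

        *hvip &= ~mask;
        *hvip |= val;
    }
}
```

In this model a set immediately followed by an unset flushes as a plain "clear": the mask records that something changed, while irqs_pending only carries the most recent state.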
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-15 9:14 ` Oleksii Kurochko
@ 2026-01-15 9:52 ` Jan Beulich
2026-01-15 10:55 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-15 9:52 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 15.01.2026 10:14, Oleksii Kurochko wrote:
> On 1/14/26 4:56 PM, Jan Beulich wrote:
>> On 14.01.2026 16:39, Oleksii Kurochko wrote:
>>> On 1/13/26 2:54 PM, Jan Beulich wrote:
>>>> On 13.01.2026 13:51, Oleksii Kurochko wrote:
>>>>> On 1/7/26 5:28 PM, Jan Beulich wrote:
>>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>> By maintaining irqs_pending_mask, you can detect “this bit changed
>>>>> recently,” even if the final state is 0.
>>>>>
>>>>> Also, having irqs_pending_mask allows to flush interrupts without lock:
>>>>> if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
>>>>> {
>>>>> mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
>>>>> val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
>>>>>
>>>>> *hvip &= ~mask;
>>>>> *hvip |= val;
>>>>> }
>>>>> Without it I assume that we should have spinlcok around access to irqs_pending.
>>>> Ah yes, this would indeed be a benefit. Just that it's not quite clear to
>>>> me:
>>>>
>>>> *hvip |= xchg(&v->arch.irqs_pending[0], 0UL);
>>>>
>>>> wouldn't require a lock either
>>> Because vCPU's hvip (which is stored on the stack) can't be changed concurrently
>>> and it's almost the one place in the code where vCPU->hvip is changed. Another
>>> place it is save_csrs() during context switch but it can't be called in parallel
>>> with the vcpu_sync_interrupts() (look below).
>>>
>>>> . What may be confusing me is that you put
>>>> things as if it was normal to see 1 -> 0 transitions from (virtual)
>>>> hardware, when I (with my x86 background) would expect 1 -> 0 transitions
>>>> to only occur due to software actions (End Of Interrupt), unless - see
>>>> above - something malfunctioned and an interrupt was lost. That (the 1 ->
>>>> 0 transitions) could be (guest) writes to SVIP, for example.
>>>>
>>>> Talking of which - do you really mean HVIP in the code you provided, not
>>>> VSVIP? So far I my understanding was that HVIP would be recording the
>>>> interrupts the hypervisor itself has pending (and needs to service).
>>> HVIP is correct to use here, HVIP is used to indicate virtual interrupts
>>> intended for VS-mode. And I think you confused HVIP with the HIP register
>>> which supplements the standard supervisor-level SIP register to indicate
>>> pending virtual supervisor (VS-level) interrupts and hypervisor-specific
>>> interrupts.
>>>
>>> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
>>> to SVIP, for example." then the correspondent HVIP (and HIP as usually
>>> they are aliasis of HVIP) bits will be updated. And that is why we need
>>> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
>>> +void vcpu_sync_interrupts(struct vcpu *v)
>>> +{
>>> + unsigned long hvip;
>>> +
>>> + /* Read current HVIP and VSIE CSRs */
>>> + v->arch.vsie = csr_read(CSR_VSIE);
>>> +
>>> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
>>> + hvip = csr_read(CSR_HVIP);
>>> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
>>> + {
>>> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
>>> + {
>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>> + &v->arch.irqs_pending_mask) )
>>> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>> + }
>>> + else
>>> + {
>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>> + &v->arch.irqs_pending_mask) )
>>> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>> + }
>>> + }
>> I fear I don't understand this at all. Why would the guest having set a
>> pending bit not result in the IRQ to be marked pending?
>
> Maybe it is wrong assumption but based on the spec:
> Bits sip.SSIP and sie.SSIE are the interrupt-pending and interrupt-enable
> bits for supervisor-level software interrupts. If implemented, SSIP is
> writable in sip and may also be set to 1 by a platform-specific interrupt
> controller.
> and:
> Interprocessor interrupts are sent to other harts by implementation-specific
> means, which will ultimately cause the SSIP bit to be set in the recipient
> hart’s sip register.
>
> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
> well-defined. The same should be true of vsip.SSIP while in VS mode.
I can't read that out of the text above. To the contrary, "will ultimately cause
the SSIP bit to be set" suggests to me that the bit is not to be set by writing
the CSR. Things still may work like this for self-IPI, but that wouldn't follow
from the quotation above.
>> You can't know
>> whether that guest write happened before or after you last touched
>> .irqs_pending{,mask}[]?
>
> Yes, I think you are right.
>
> On the other hand, if we are in hypervisor when vcpu_sync_interrupts() is
> called it means that pCPU on which vCPU is ran and for which
> vcpu_sync_interrupts() is called now executes some hypervisor things, so
> guest won't able to update VSIP.SSIP for this pCPU. So nothing else will
> change VSIP.SSIP and so h/w HVIP won't be changed by something and it is
> okay to sync .irqs_pending{,mask} with what h/w in its HVIP.
That is, vcpu_sync_interrupts() is called on every entry to the hypervisor?
Not just during context switch?
Jan
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-15 9:52 ` Jan Beulich
@ 2026-01-15 10:55 ` Oleksii Kurochko
2026-01-15 10:59 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-15 10:55 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/15/26 10:52 AM, Jan Beulich wrote:
> On 15.01.2026 10:14, Oleksii Kurochko wrote:
>> On 1/14/26 4:56 PM, Jan Beulich wrote:
>>> On 14.01.2026 16:39, Oleksii Kurochko wrote:
>>>> On 1/13/26 2:54 PM, Jan Beulich wrote:
>>>>> On 13.01.2026 13:51, Oleksii Kurochko wrote:
>>>>>> On 1/7/26 5:28 PM, Jan Beulich wrote:
>>>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>> By maintaining irqs_pending_mask, you can detect “this bit changed
>>>>>> recently,” even if the final state is 0.
>>>>>>
>>>>>> Also, having irqs_pending_mask allows to flush interrupts without lock:
>>>>>> if ( ACCESS_ONCE(v->arch.irqs_pending_mask[0]) )
>>>>>> {
>>>>>> mask = xchg(&v->arch.irqs_pending_mask[0], 0UL);
>>>>>> val = ACCESS_ONCE(v->arch.irqs_pending[0]) & mask;
>>>>>>
>>>>>> *hvip &= ~mask;
>>>>>> *hvip |= val;
>>>>>> }
>>>>>> Without it I assume that we should have spinlcok around access to irqs_pending.
>>>>> Ah yes, this would indeed be a benefit. Just that it's not quite clear to
>>>>> me:
>>>>>
>>>>> *hvip |= xchg(&v->arch.irqs_pending[0], 0UL);
>>>>>
>>>>> wouldn't require a lock either
>>>> Because vCPU's hvip (which is stored on the stack) can't be changed concurrently
>>>> and it's almost the one place in the code where vCPU->hvip is changed. Another
>>>> place it is save_csrs() during context switch but it can't be called in parallel
>>>> with the vcpu_sync_interrupts() (look below).
>>>>
>>>>> . What may be confusing me is that you put
>>>>> things as if it was normal to see 1 -> 0 transitions from (virtual)
>>>>> hardware, when I (with my x86 background) would expect 1 -> 0 transitions
>>>>> to only occur due to software actions (End Of Interrupt), unless - see
>>>>> above - something malfunctioned and an interrupt was lost. That (the 1 ->
>>>>> 0 transitions) could be (guest) writes to SVIP, for example.
>>>>>
>>>>> Talking of which - do you really mean HVIP in the code you provided, not
>>>>> VSVIP? So far I my understanding was that HVIP would be recording the
>>>>> interrupts the hypervisor itself has pending (and needs to service).
>>>> HVIP is correct to use here, HVIP is used to indicate virtual interrupts
>>>> intended for VS-mode. And I think you confused HVIP with the HIP register
>>>> which supplements the standard supervisor-level SIP register to indicate
>>>> pending virtual supervisor (VS-level) interrupts and hypervisor-specific
>>>> interrupts.
>>>>
>>>> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
>>>> to SVIP, for example." then the correspondent HVIP (and HIP as usually
>>>> they are aliasis of HVIP) bits will be updated. And that is why we need
>>>> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
>>>> +void vcpu_sync_interrupts(struct vcpu *v)
>>>> +{
>>>> + unsigned long hvip;
>>>> +
>>>> + /* Read current HVIP and VSIE CSRs */
>>>> + v->arch.vsie = csr_read(CSR_VSIE);
>>>> +
>>>> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
>>>> + hvip = csr_read(CSR_HVIP);
>>>> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
>>>> + {
>>>> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
>>>> + {
>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>> + &v->arch.irqs_pending_mask) )
>>>> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>> + }
>>>> + else
>>>> + {
>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>> + &v->arch.irqs_pending_mask) )
>>>> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>> + }
>>>> + }
>>> I fear I don't understand this at all. Why would the guest having set a
>>> pending bit not result in the IRQ to be marked pending?
>> Maybe it is wrong assumption but based on the spec:
>> Bits sip.SSIP and sie.SSIE are the interrupt-pending and interrupt-enable
>> bits for supervisor-level software interrupts. If implemented, SSIP is
>> writable in sip and may also be set to 1 by a platform-specific interrupt
>> controller.
>> and:
>> Interprocessor interrupts are sent to other harts by implementation-specific
>> means, which will ultimately cause the SSIP bit to be set in the recipient
>> hart’s sip register.
>>
>> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
>> well-defined. The same should be true of vsip.SSIP while in VS mode.
> I can't read that out of the text above. To the contrary, "will ultimately cause
> the SSIP bit to be set" suggests to me that the bit is not to be set by writing
> the CSR. Things still may work like this for self-IPI, but that wouldn't follow
> from the quotation above.
Why wouldn't that follow from the quotation above?
The first quotation tells us that a self-IPI is possible, so vsip.SSIP will be
set to 1, and we could miss the SSIP bit if we don't explicitly read the h/w
HVIP (or vsip, or whatever other alias of the SSIP bit) and sync it with what
we have cached in the hypervisor.
The second quotation tells us that if another CPU sends an IPI to CPUx, then
CPUx's sip will have the SSIP bit set to 1, and again the hypervisor won't
know about it without explicitly reading HVIP (or vsip, or whatever other
alias of the SSIP bit).
>
>>> You can't know
>>> whether that guest write happened before or after you last touched
>>> .irqs_pending{,mask}[]?
>> Yes, I think you are right.
>>
>> On the other hand, if we are in hypervisor when vcpu_sync_interrupts() is
>> called it means that pCPU on which vCPU is ran and for which
>> vcpu_sync_interrupts() is called now executes some hypervisor things, so
>> guest won't able to update VSIP.SSIP for this pCPU. So nothing else will
>> change VSIP.SSIP and so h/w HVIP won't be changed by something and it is
>> okay to sync .irqs_pending{,mask} with what h/w in its HVIP.
> That is, vcpu_sync_interrupts() is called on every entry to the hypervisor?
> Not just during context switch?
It is called each time before exit from the hypervisor to a guest.
~ Oleksii
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-15 10:55 ` Oleksii Kurochko
@ 2026-01-15 10:59 ` Jan Beulich
2026-01-15 11:46 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-15 10:59 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 15.01.2026 11:55, Oleksii Kurochko wrote:
> On 1/15/26 10:52 AM, Jan Beulich wrote:
>> On 15.01.2026 10:14, Oleksii Kurochko wrote:
>>> On 1/14/26 4:56 PM, Jan Beulich wrote:
>>>> On 14.01.2026 16:39, Oleksii Kurochko wrote:
>>>>> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
>>>>> to SVIP, for example." then the correspondent HVIP (and HIP as usually
>>>>> they are aliasis of HVIP) bits will be updated. And that is why we need
>>>>> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
>>>>> +void vcpu_sync_interrupts(struct vcpu *v)
>>>>> +{
>>>>> + unsigned long hvip;
>>>>> +
>>>>> + /* Read current HVIP and VSIE CSRs */
>>>>> + v->arch.vsie = csr_read(CSR_VSIE);
>>>>> +
>>>>> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
>>>>> + hvip = csr_read(CSR_HVIP);
>>>>> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
>>>>> + {
>>>>> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
>>>>> + {
>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>> + &v->arch.irqs_pending_mask) )
>>>>> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>> + }
>>>>> + else
>>>>> + {
>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>> + &v->arch.irqs_pending_mask) )
>>>>> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>> + }
>>>>> + }
>>>> I fear I don't understand this at all. Why would the guest having set a
>>>> pending bit not result in the IRQ to be marked pending?
>>> Maybe it is wrong assumption but based on the spec:
>>> Bits sip.SSIP and sie.SSIE are the interrupt-pending and interrupt-enable
>>> bits for supervisor-level software interrupts. If implemented, SSIP is
>>> writable in sip and may also be set to 1 by a platform-specific interrupt
>>> controller.
>>> and:
>>> Interprocessor interrupts are sent to other harts by implementation-specific
>>> means, which will ultimately cause the SSIP bit to be set in the recipient
>>> hart’s sip register.
>>>
>>> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
>>> well-defined. The same should be true of vsip.SSIP while in VS mode.
>> I can't read that out of the text above. To the contrary, "will ultimately cause
>> the SSIP bit to be set" suggests to me that the bit is not to be set by writing
>> the CSR. Things still may work like this for self-IPI, but that wouldn't follow
>> from the quotation above.
>
> Why not that wouldn't follow from the quotation above?
>
> The first quotation tells that we can do self-IPI so VSSIP.SSIP will set to 1
> what we could miss SSIP bit if won't explicitly try to read h/w HVIP (or VSSIP,
> or whatever other alias of the SSIP bit) and sync with what we have cached
> in hypervisor.
The bit being writable doesn't imply that it being written with 1 would also
trigger an interruption. If that's indeed the behavior, it surely is being
said elsewhere.
Jan
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-15 10:59 ` Jan Beulich
@ 2026-01-15 11:46 ` Oleksii Kurochko
2026-01-15 12:09 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-15 11:46 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/15/26 11:59 AM, Jan Beulich wrote:
> On 15.01.2026 11:55, Oleksii Kurochko wrote:
>> On 1/15/26 10:52 AM, Jan Beulich wrote:
>>> On 15.01.2026 10:14, Oleksii Kurochko wrote:
>>>> On 1/14/26 4:56 PM, Jan Beulich wrote:
>>>>> On 14.01.2026 16:39, Oleksii Kurochko wrote:
>>>>>> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
>>>>>> to SVIP, for example." then the correspondent HVIP (and HIP as usually
>>>>>> they are aliasis of HVIP) bits will be updated. And that is why we need
>>>>>> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
>>>>>> +void vcpu_sync_interrupts(struct vcpu *v)
>>>>>> +{
>>>>>> + unsigned long hvip;
>>>>>> +
>>>>>> + /* Read current HVIP and VSIE CSRs */
>>>>>> + v->arch.vsie = csr_read(CSR_VSIE);
>>>>>> +
>>>>>> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
>>>>>> + hvip = csr_read(CSR_HVIP);
>>>>>> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
>>>>>> + {
>>>>>> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
>>>>>> + {
>>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>>> + &v->arch.irqs_pending_mask) )
>>>>>> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>>> + }
>>>>>> + else
>>>>>> + {
>>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>>> + &v->arch.irqs_pending_mask) )
>>>>>> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>>> + }
>>>>>> + }
>>>>> I fear I don't understand this at all. Why would the guest having set a
>>>>> pending bit not result in the IRQ to be marked pending?
>>>> Maybe it is wrong assumption but based on the spec:
>>>> Bits sip.SSIP and sie.SSIE are the interrupt-pending and interrupt-enable
>>>> bits for supervisor-level software interrupts. If implemented, SSIP is
>>>> writable in sip and may also be set to 1 by a platform-specific interrupt
>>>> controller.
>>>> and:
>>>> Interprocessor interrupts are sent to other harts by implementation-specific
>>>> means, which will ultimately cause the SSIP bit to be set in the recipient
>>>> hart’s sip register.
>>>>
>>>> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
>>>> well-defined. The same should be true of vsip.SSIP while in VS mode.
>>> I can't read that out of the text above. To the contrary, "will ultimately cause
>>> the SSIP bit to be set" suggests to me that the bit is not to be set by writing
>>> the CSR. Things still may work like this for self-IPI, but that wouldn't follow
>>> from the quotation above.
>> Why not that wouldn't follow from the quotation above?
>>
>> The first quotation tells that we can do self-IPI so VSSIP.SSIP will set to 1
>> what we could miss SSIP bit if won't explicitly try to read h/w HVIP (or VSSIP,
>> or whatever other alias of the SSIP bit) and sync with what we have cached
>> in hypervisor.
> The bit being writable doesn't imply that it being written with 1 would also
> trigger an interruption. If that's indeed the behavior, it surely is being
> said elsewhere.
According to the spec, an interrupt i will trap to S-mode (VS-mode in our
context) if both of the following are true: (a) either the current privilege
mode is S and the SIE bit in the sstatus register is set, or the current
privilege mode has less privilege than S-mode; and (b) bit i is set in both
sip and sie.
Even without an interrupt being triggered, I think we can still lose a set
bit in the vsip register (if I'm not mistaken). If we don't sync the cached
hvip with the h/w hvip, we could lose the real SSIP bit value.
For example, before entering the hypervisor the guest sets vsip.SSIP to 1,
which means that hip.VSSIP will also be set to 1, as:
When bit 2 of hideleg is zero, vsip.SSIP and vsie.SSIE are read-only zeros.
Else, vsip.SSIP and vsie.SSIE are aliases of hip.VSSIP and hie.VSSIE.
And so hvip.VSSIP will be set to 1, as:
Bits hip.VSSIP and hie.VSSIE are the interrupt-pending and interrupt-enable
bits for VS-level software interrupts. VSSIP in hip is an alias (writable)
of the same bit in hvip.
And then, if we don't sync the cached hvip with the h/w hvip, writing the
cached hvip (which has .VSSIP set to 0) back to the CSR will overwrite the
h/w hvip.VSSIP, which was set to 1.
~ Oleksii
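The lost-bit scenario described above can be illustrated with a tiny user-space model. It is only a sketch under assumptions: struct hart_model and both helpers are hypothetical, the h/w HVIP CSR is represented by a plain variable, and the sync step merges the VSSIP bit straight into the cached copy instead of going through .irqs_pending{,mask}.

```c
#include <assert.h>

/* Hypothetical stand-ins, not Xen's definitions. VSSIP is bit 2 of hvip. */
#define IRQ_VS_SOFT  2
#define MODEL_BIT(n) (1UL << (n))

struct hart_model {
    unsigned long hw_hvip;     /* stands in for the HVIP CSR */
    unsigned long cached_hvip; /* stands in for the hypervisor's cached hvip */
};

/*
 * Guest writes 1 to vsip.SSIP. With hideleg[2] set, that bit aliases
 * hip.VSSIP, which in turn aliases hvip.VSSIP.
 */
static void guest_self_ipi(struct hart_model *h)
{
    assert(h);
    h->hw_hvip |= MODEL_BIT(IRQ_VS_SOFT);
}

/*
 * Return to the guest: optionally pick up the guest's VSSIP change first,
 * then write the cached value back to the (modelled) CSR.
 */
static void return_to_guest(struct hart_model *h, int sync_first)
{
    assert(h);
    if ( sync_first )
    {
        unsigned long diff =
            (h->cached_hvip ^ h->hw_hvip) & MODEL_BIT(IRQ_VS_SOFT);

        h->cached_hvip ^= diff; /* adopt the h/w value of VSSIP */
    }
    h->hw_hvip = h->cached_hvip;
}
```

Calling return_to_guest() with sync_first == 0 after guest_self_ipi() drops the guest's VSSIP bit, while sync_first == 1 preserves it.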
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-15 11:46 ` Oleksii Kurochko
@ 2026-01-15 12:09 ` Jan Beulich
2026-01-15 12:25 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-15 12:09 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 15.01.2026 12:46, Oleksii Kurochko wrote:
>
> On 1/15/26 11:59 AM, Jan Beulich wrote:
>> On 15.01.2026 11:55, Oleksii Kurochko wrote:
>>> On 1/15/26 10:52 AM, Jan Beulich wrote:
>>>> On 15.01.2026 10:14, Oleksii Kurochko wrote:
>>>>> On 1/14/26 4:56 PM, Jan Beulich wrote:
>>>>>> On 14.01.2026 16:39, Oleksii Kurochko wrote:
>>>>>>> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
>>>>>>> to SVIP, for example." then the correspondent HVIP (and HIP as usually
>>>>>>> they are aliasis of HVIP) bits will be updated. And that is why we need
>>>>>>> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
>>>>>>> +void vcpu_sync_interrupts(struct vcpu *v)
>>>>>>> +{
>>>>>>> + unsigned long hvip;
>>>>>>> +
>>>>>>> + /* Read current HVIP and VSIE CSRs */
>>>>>>> + v->arch.vsie = csr_read(CSR_VSIE);
>>>>>>> +
>>>>>>> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
>>>>>>> + hvip = csr_read(CSR_HVIP);
>>>>>>> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
>>>>>>> + {
>>>>>>> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
>>>>>>> + {
>>>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>>>> + &v->arch.irqs_pending_mask) )
>>>>>>> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>>>> + }
>>>>>>> + else
>>>>>>> + {
>>>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>>>> + &v->arch.irqs_pending_mask) )
>>>>>>> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>>>> + }
>>>>>>> + }
>>>>>> I fear I don't understand this at all. Why would the guest having set a
>>>>>> pending bit not result in the IRQ to be marked pending?
>>>>> Maybe it is wrong assumption but based on the spec:
>>>>> Bits sip.SSIP and sie.SSIE are the interrupt-pending and interrupt-enable
>>>>> bits for supervisor-level software interrupts. If implemented, SSIP is
>>>>> writable in sip and may also be set to 1 by a platform-specific interrupt
>>>>> controller.
>>>>> and:
>>>>> Interprocessor interrupts are sent to other harts by implementation-specific
>>>>> means, which will ultimately cause the SSIP bit to be set in the recipient
>>>>> hart’s sip register.
>>>>>
>>>>> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
>>>>> well-defined. The same should be true of vsip.SSIP while in VS mode.
>>>> I can't read that out of the text above. To the contrary, "will ultimately cause
>>>> the SSIP bit to be set" suggests to me that the bit is not to be set by writing
>>>> the CSR. Things still may work like this for self-IPI, but that wouldn't follow
>>>> from the quotation above.
>>> Why not that wouldn't follow from the quotation above?
>>>
>>> The first quotation tells that we can do self-IPI so VSSIP.SSIP will set to 1
>>> what we could miss SSIP bit if won't explicitly try to read h/w HVIP (or VSSIP,
>>> or whatever other alias of the SSIP bit) and sync with what we have cached
>>> in hypervisor.
>> The bit being writable doesn't imply that it being written with 1 would also
>> trigger an interruption. If that's indeed the behavior, it surely is being
>> said elsewhere.
>
> According to the spec it will trap to S-mode (VS-mode in our context) if both of
> the following are true: (a) either the current privilege mode is S and the SIE
> bit in the sstatus register is set, or the current privilege mode has less
> privilege than S-mode; and (b) bit i is set in both sip and sie.
That's still not it. Here is the relevant quote:
"These conditions for an interrupt trap to occur must be evaluated in a bounded
amount of time from when an interrupt becomes, or ceases to be, pending in sip,
and must also be evaluated immediately following the execution of an SRET
instruction or an explicit write to a CSR on which these interrupt trap
conditions expressly depend (including sip, sie and sstatus)."
Note in particular the "explicit write to a CSR". (Sorry, I did read that before,
but I didn't memorize it. Else I wouldn't have asked the original question.)
Jan
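
For illustration, the trap condition discussed in this message (conditions (a) and (b) from the spec text quoted above) can be written as a small predicate. This is only a sketch: the enum, the function name and the flat argument list are hypothetical and are neither Xen nor specification identifiers. It models the architectural rule, not the point in time at which hardware re-evaluates it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical privilege encoding, for illustration only. */
enum priv_mode { PRIV_U = 0, PRIV_S = 1 };

/*
 * Returns true if interrupt number i would trap to S-mode:
 *  (a) the current mode is S with sstatus.SIE set, or the current mode
 *      is less privileged than S; and
 *  (b) bit i is set in both sip and sie.
 */
static bool s_interrupt_traps(enum priv_mode mode, bool sstatus_sie,
                              uint64_t sip, uint64_t sie, unsigned int i)
{
    assert(i < 64);

    bool globally_enabled = (mode == PRIV_S && sstatus_sie) || mode < PRIV_S;
    bool pending_and_enabled = ((sip & sie) >> i) & 1;

    return globally_enabled && pending_and_enabled;
}
```

With SSIP being bit 1, a guest write that sets sip.SSIP makes condition (b) true as soon as sie.SSIE is also set, which is the self-IPI case debated above.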
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-15 12:09 ` Jan Beulich
@ 2026-01-15 12:25 ` Oleksii Kurochko
2026-01-15 12:30 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-15 12:25 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/15/26 1:09 PM, Jan Beulich wrote:
> On 15.01.2026 12:46, Oleksii Kurochko wrote:
>> On 1/15/26 11:59 AM, Jan Beulich wrote:
>>> On 15.01.2026 11:55, Oleksii Kurochko wrote:
>>>> On 1/15/26 10:52 AM, Jan Beulich wrote:
>>>>> On 15.01.2026 10:14, Oleksii Kurochko wrote:
>>>>>> On 1/14/26 4:56 PM, Jan Beulich wrote:
>>>>>>> On 14.01.2026 16:39, Oleksii Kurochko wrote:
>>>>>>>> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
>>>>>>>> to SVIP, for example." then the correspondent HVIP (and HIP as usually
>>>>>>>> they are aliasis of HVIP) bits will be updated. And that is why we need
>>>>>>>> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
>>>>>>>> +void vcpu_sync_interrupts(struct vcpu *v)
>>>>>>>> +{
>>>>>>>> + unsigned long hvip;
>>>>>>>> +
>>>>>>>> + /* Read current HVIP and VSIE CSRs */
>>>>>>>> + v->arch.vsie = csr_read(CSR_VSIE);
>>>>>>>> +
>>>>>>>> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
>>>>>>>> + hvip = csr_read(CSR_HVIP);
>>>>>>>> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
>>>>>>>> + {
>>>>>>>> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
>>>>>>>> + {
>>>>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>>>>> + &v->arch.irqs_pending_mask) )
>>>>>>>> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>>>>> + }
>>>>>>>> + else
>>>>>>>> + {
>>>>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>>>>> + &v->arch.irqs_pending_mask) )
>>>>>>>> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>>>>> + }
>>>>>>>> + }
>>>>>>> I fear I don't understand this at all. Why would the guest having set a
>>>>>>> pending bit not result in the IRQ to be marked pending?
>>>>>> Maybe it is wrong assumption but based on the spec:
>>>>>> Bits sip.SSIP and sie.SSIE are the interrupt-pending and interrupt-enable
>>>>>> bits for supervisor-level software interrupts. If implemented, SSIP is
>>>>>> writable in sip and may also be set to 1 by a platform-specific interrupt
>>>>>> controller.
>>>>>> and:
>>>>>> Interprocessor interrupts are sent to other harts by implementation-specific
>>>>>> means, which will ultimately cause the SSIP bit to be set in the recipient
>>>>>> hart’s sip register.
>>>>>>
>>>>>> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
>>>>>> well-defined. The same should be true of vsip.SSIP while in VS mode.
>>>>> I can't read that out of the text above. To the contrary, "will ultimately cause
>>>>> the SSIP bit to be set" suggests to me that the bit is not to be set by writing
>>>>> the CSR. Things still may work like this for self-IPI, but that wouldn't follow
>>>>> from the quotation above.
>>>> Why not that wouldn't follow from the quotation above?
>>>>
>>>> The first quotation tells that we can do self-IPI so VSSIP.SSIP will set to 1
>>>> what we could miss SSIP bit if won't explicitly try to read h/w HVIP (or VSSIP,
>>>> or whatever other alias of the SSIP bit) and sync with what we have cached
>>>> in hypervisor.
>>> The bit being writable doesn't imply that it being written with 1 would also
>>> trigger an interruption. If that's indeed the behavior, it surely is being
>>> said elsewhere.
>> According to the spec it will trap to S-mode (VS-mode in our context) if both of
>> the following are true: (a) either the current privilege mode is S and the SIE
>> bit in the sstatus register is set, or the current privilege mode has less
>> privilege than S-mode; and (b) bit i is set in both sip and sie.
> That's still not it. Here is the relevant quote:
>
> "These conditions for an interrupt trap to occur must be evaluated in a bounded
> amount of time from when an interrupt becomes, or ceases to be, pending in sip,
> and must also be evaluated immediately following the execution of an SRET
> instruction or an explicit write to a CSR on which these interrupt trap
> conditions expressly depend (including sip, sie and sstatus)."
>
> Note in particular the "explicit write to a CSR". (Sorry, I did read that before,
> but I didn't memorize it. Else I wouldn't have asked the original question.)
A guest can do:
    csr_write(CSR_SIP, SSIP);
which is an explicit write to a CSR. Or does the quote mean a different CSR?
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-15 12:25 ` Oleksii Kurochko
@ 2026-01-15 12:30 ` Jan Beulich
0 siblings, 0 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-15 12:30 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 15.01.2026 13:25, Oleksii Kurochko wrote:
>
> On 1/15/26 1:09 PM, Jan Beulich wrote:
>> On 15.01.2026 12:46, Oleksii Kurochko wrote:
>>> On 1/15/26 11:59 AM, Jan Beulich wrote:
>>>> On 15.01.2026 11:55, Oleksii Kurochko wrote:
>>>>> On 1/15/26 10:52 AM, Jan Beulich wrote:
>>>>>> On 15.01.2026 10:14, Oleksii Kurochko wrote:
>>>>>>> On 1/14/26 4:56 PM, Jan Beulich wrote:
>>>>>>>> On 14.01.2026 16:39, Oleksii Kurochko wrote:
>>>>>>>>> If a guest will do "That (the 1 -> 0 transitions) could be (guest) writes
>>>>>>>>> to SVIP, for example." then the correspondent HVIP (and HIP as usually
>>>>>>>>> they are aliasis of HVIP) bits will be updated. And that is why we need
>>>>>>>>> vcpu_sync_interrupts() I've mentioned in one of replies and sync VSSIP:
>>>>>>>>> +void vcpu_sync_interrupts(struct vcpu *v)
>>>>>>>>> +{
>>>>>>>>> + unsigned long hvip;
>>>>>>>>> +
>>>>>>>>> + /* Read current HVIP and VSIE CSRs */
>>>>>>>>> + v->arch.vsie = csr_read(CSR_VSIE);
>>>>>>>>> +
>>>>>>>>> + /* Sync-up HVIP.VSSIP bit changes does by Guest */
>>>>>>>>> + hvip = csr_read(CSR_HVIP);
>>>>>>>>> + if ( (v->arch.hvip ^ hvip) & BIT(IRQ_VS_SOFT, UL) )
>>>>>>>>> + {
>>>>>>>>> + if ( hvip & BIT(IRQ_VS_SOFT, UL) )
>>>>>>>>> + {
>>>>>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>>>>>> + &v->arch.irqs_pending_mask) )
>>>>>>>>> + set_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>>>>>> + }
>>>>>>>>> + else
>>>>>>>>> + {
>>>>>>>>> + if ( !test_and_set_bit(IRQ_VS_SOFT,
>>>>>>>>> + &v->arch.irqs_pending_mask) )
>>>>>>>>> + clear_bit(IRQ_VS_SOFT, &v->arch.irqs_pending);
>>>>>>>>> + }
>>>>>>>>> + }
>>>>>>>> I fear I don't understand this at all. Why would the guest having set a
>>>>>>>> pending bit not result in the IRQ to be marked pending?
>>>>>>> Maybe it is wrong assumption but based on the spec:
>>>>>>> Bits sip.SSIP and sie.SSIE are the interrupt-pending and interrupt-enable
>>>>>>> bits for supervisor-level software interrupts. If implemented, SSIP is
>>>>>>> writable in sip and may also be set to 1 by a platform-specific interrupt
>>>>>>> controller.
>>>>>>> and:
>>>>>>> Interprocessor interrupts are sent to other harts by implementation-specific
>>>>>>> means, which will ultimately cause the SSIP bit to be set in the recipient
>>>>>>> hart’s sip register.
>>>>>>>
>>>>>>> Meaning that sending an IPI to self by writing 1 to sip.SSIP is
>>>>>>> well-defined. The same should be true of vsip.SSIP while in VS mode.
>>>>>> I can't read that out of the text above. To the contrary, "will ultimately cause
>>>>>> the SSIP bit to be set" suggests to me that the bit is not to be set by writing
>>>>>> the CSR. Things still may work like this for self-IPI, but that wouldn't follow
>>>>>> from the quotation above.
>>>>> Why not that wouldn't follow from the quotation above?
>>>>>
>>>>> The first quotation tells that we can do self-IPI so VSSIP.SSIP will set to 1
>>>>> what we could miss SSIP bit if won't explicitly try to read h/w HVIP (or VSSIP,
>>>>> or whatever other alias of the SSIP bit) and sync with what we have cached
>>>>> in hypervisor.
>>>> The bit being writable doesn't imply that it being written with 1 would also
>>>> trigger an interruption. If that's indeed the behavior, it surely is being
>>>> said elsewhere.
>>> According to the spec it will trap to S-mode (VS-mode in our context) if both of
>>> the following are true: (a) either the current privilege mode is S and the SIE
>>> bit in the sstatus register is set, or the current privilege mode has less
>>> privilege than S-mode; and (b) bit i is set in both sip and sie.
>> That's still not it. Here is the relevant quote:
>>
>> "These conditions for an interrupt trap to occur must be evaluated in a bounded
>> amount of time from when an interrupt becomes, or ceases to be, pending in sip,
>> and must also be evaluated immediately following the execution of an SRET
>> instruction or an explicit write to a CSR on which these interrupt trap
>> conditions expressly depend (including sip, sie and sstatus)."
>>
>> Note in particular the "explicit write to a CSR". (Sorry, I did read that before,
>> but I didn't memorize it. Else I wouldn't have asked the original question.)
>
> Guest can do:
> csr_write(CSR_SIP, SSIP);
> what is an explicit write to a CSR. Or it the quote it means a different CSR?
That's what is meant, aiui.
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-07 16:28 ` Jan Beulich
2026-01-13 12:51 ` Oleksii Kurochko
@ 2026-01-16 14:25 ` Oleksii Kurochko
2026-01-16 14:42 ` Jan Beulich
1 sibling, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-16 14:25 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/7/26 5:28 PM, Jan Beulich wrote:
>> +
>> + /*
>> + * VCPU interrupts
>> + *
>> + * We have a lockless approach for tracking pending VCPU interrupts
>> + * implemented using atomic bitops. The irqs_pending bitmap represent
>> + * pending interrupts whereas irqs_pending_mask represent bits changed
>> + * in irqs_pending.
> And hence a set immediately followed by an unset is then indistinguishable
> from just an unset (or the other way around). This may not be a problem, but
> if it isn't, I think this needs explaining.
I am still not sure that this is actually a problem, or what kind of explanation
is needed.
unset is called only when the guest makes such a request, and the guest will
make that request only after it has received an interrupt that was previously
set in the irqs_pending bitmap and then flushed to the hardware HVIP.

If an interrupt is simply set and then unset without ever being flushed to the
hardware HVIP, it seems there would be no issue, since it would not affect the
guest. However, the question of why this happened at all would still remain.

Am I missing some corner cases which should be taken into account?
Do I still need to add some extra explanation to the comment or commit
message?
~ Oleksii
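
For readers following the set/unset question, here is a minimal sketch of the two-bitmap scheme described in the quoted comment, written with C11 atomics rather than Xen's bitops. All names are hypothetical, and the flush step is an assumption about how the changed bits would be folded into HVIP; the patch's own flush code is not shown in this part of the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two bitmaps kept per vCPU. */
struct pending_irqs {
    _Atomic uint64_t pending;  /* desired level of each interrupt */
    _Atomic uint64_t mask;     /* bits changed since the last flush */
};

static void irq_set(struct pending_irqs *p, unsigned int irq)
{
    assert(irq < 64);
    atomic_fetch_or(&p->pending, UINT64_C(1) << irq);
    atomic_fetch_or(&p->mask, UINT64_C(1) << irq);
}

static void irq_unset(struct pending_irqs *p, unsigned int irq)
{
    assert(irq < 64);
    atomic_fetch_and(&p->pending, ~(UINT64_C(1) << irq));
    atomic_fetch_or(&p->mask, UINT64_C(1) << irq);
}

/* Fold the changed bits into a software copy of hvip and return it. */
static uint64_t irq_flush(struct pending_irqs *p, uint64_t hvip)
{
    uint64_t mask = atomic_exchange(&p->mask, 0);
    uint64_t val = atomic_load(&p->pending) & mask;

    return (hvip & ~mask) | val;
}
```

In this model a set followed by an unset before the next flush produces the same flushed value as a lone unset, which is the indistinguishability raised in the review. Whether that matters depends on callers that are not part of this sketch.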
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1
2026-01-16 14:25 ` Oleksii Kurochko
@ 2026-01-16 14:42 ` Jan Beulich
0 siblings, 0 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-16 14:42 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 16.01.2026 15:25, Oleksii Kurochko wrote:
>
> On 1/7/26 5:28 PM, Jan Beulich wrote:
>>> +
>>> + /*
>>> + * VCPU interrupts
>>> + *
>>> + * We have a lockless approach for tracking pending VCPU interrupts
>>> + * implemented using atomic bitops. The irqs_pending bitmap represent
>>> + * pending interrupts whereas irqs_pending_mask represent bits changed
>>> + * in irqs_pending.
>> And hence a set immediately followed by an unset is then indistinguishable
>> from just an unset (or the other way around). This may not be a problem, but
>> if it isn't, I think this needs explaining.
>
> I am still not sure that this is actually a problem, or what kind of explanation
> is needed.
> |unset| is called only when the guest makes such a request, and the guest will
> make that request only after it has received an interrupt that was previously
> set in the|irq_pending| bitmap and then flushed to the hardware HVIP.
>
> If an interrupt is simply set and then unset without ever being flushed to the
> hardware HVIP, it seems there would be no issue, since it would not affect the
> guest. However, the question of why this happened at all would still remain.
>
> Do I miss some corner cases which should be taken into account?
> Should I still have to add some extra explanation to the comment or commit
> message?
Perhaps the problem is that really I can't see the full picture yet, for the
series not putting everything in place that is related.
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (6 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 07/15] xen/riscv: introduce tracking of pending vCPU interrupts, part 1 Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-08 10:28 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 09/15] xen/riscv: add vtimer_{save,restore}() Oleksii Kurochko
` (6 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce vtimer_set_timer() to program a vCPU’s virtual timer based on
guest-provided tick values. The function handles clearing pending timer
interrupts, converting ticks to nanoseconds, and correctly treating
(uint64_t)-1 as a request to disable the timer per the RISC-V SBI
specification.
Additionally, update vtimer_expired() to inject IRQ_VS_TIMER into
the target vCPU instead of panicking, enabling basic virtual timer
operation.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/vtimer.h | 2 ++
xen/arch/riscv/vtimer.c | 30 ++++++++++++++++++++++++++++-
2 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/xen/arch/riscv/include/asm/vtimer.h b/xen/arch/riscv/include/asm/vtimer.h
index a2ca704cf0cc..2cacaf74b83b 100644
--- a/xen/arch/riscv/include/asm/vtimer.h
+++ b/xen/arch/riscv/include/asm/vtimer.h
@@ -22,4 +22,6 @@ void vcpu_timer_destroy(struct vcpu *v);
int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config);
+void vtimer_set_timer(struct vtimer *t, const uint64_t ticks);
+
#endif /* ASM__RISCV__VTIMER_H */
diff --git a/xen/arch/riscv/vtimer.c b/xen/arch/riscv/vtimer.c
index 5ba533690bc2..99a0c5986f1d 100644
--- a/xen/arch/riscv/vtimer.c
+++ b/xen/arch/riscv/vtimer.c
@@ -1,6 +1,8 @@
/* SPDX-License-Identifier: GPL-2.0-only */
+#include <xen/domain.h>
#include <xen/sched.h>
+#include <xen/time.h>
#include <public/xen.h>
@@ -15,7 +17,9 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
static void vtimer_expired(void *data)
{
-    panic("%s: TBD\n", __func__);
+    struct vtimer *t = data;
+
+    vcpu_set_interrupt(t->v, IRQ_VS_TIMER);
}
int vcpu_vtimer_init(struct vcpu *v)
@@ -37,3 +41,27 @@ void vcpu_timer_destroy(struct vcpu *v)
kill_timer(&v->arch.vtimer.timer);
}
+
+void vtimer_set_timer(struct vtimer *t, const uint64_t ticks)
+{
+    s_time_t expires = ticks_to_ns(ticks - boot_clock_cycles);
+
+    vcpu_unset_interrupt(t->v, IRQ_VS_TIMER);
+
+    /*
+     * According to the RISC-V sbi spec:
+     *   If the supervisor wishes to clear the timer interrupt without
+     *   scheduling the next timer event, it can either request a timer
+     *   interrupt infinitely far into the future (i.e., (uint64_t)-1),
+     *   or it can instead mask the timer interrupt by clearing sie.STIE CSR
+     *   bit.
+     */
+    if ( ticks == ((uint64_t)~0ULL) )
+    {
+        stop_timer(&t->timer);
+
+        return;
+    }
+
+    set_timer(&t->timer, expires);
+}
--
2.52.0
^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2025-12-24 17:03 ` [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired() Oleksii Kurochko
@ 2026-01-08 10:28 ` Jan Beulich
2026-01-13 14:44 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-08 10:28 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/vtimer.h
> +++ b/xen/arch/riscv/include/asm/vtimer.h
> @@ -22,4 +22,6 @@ void vcpu_timer_destroy(struct vcpu *v);
>
> int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config);
>
> +void vtimer_set_timer(struct vtimer *t, const uint64_t ticks);
> +
> #endif /* ASM__RISCV__VTIMER_H */
> diff --git a/xen/arch/riscv/vtimer.c b/xen/arch/riscv/vtimer.c
> index 5ba533690bc2..99a0c5986f1d 100644
> --- a/xen/arch/riscv/vtimer.c
> +++ b/xen/arch/riscv/vtimer.c
> @@ -1,6 +1,8 @@
> /* SPDX-License-Identifier: GPL-2.0-only */
>
> +#include <xen/domain.h>
Is this really needed, when ...
> #include <xen/sched.h>
... this is already there?
> +#include <xen/time.h>
Don't you mean xen/timer.h here?
> @@ -15,7 +17,9 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
>
> static void vtimer_expired(void *data)
> {
> - panic("%s: TBD\n", __func__);
> + struct vtimer *t = data;
Pointer-to-const please.
> @@ -37,3 +41,27 @@ void vcpu_timer_destroy(struct vcpu *v)
>
> kill_timer(&v->arch.vtimer.timer);
> }
> +
> +void vtimer_set_timer(struct vtimer *t, const uint64_t ticks)
> +{
> + s_time_t expires = ticks_to_ns(ticks - boot_clock_cycles);
boot_clock_cycles is known to just Xen. If the guest provided input is an
absolute value, how would that work across migration? Doesn't there need
to be a guest-specific bias instead?
> + vcpu_unset_interrupt(t->v, IRQ_VS_TIMER);
> +
> + /*
> + * According to the RISC-V sbi spec:
> + * If the supervisor wishes to clear the timer interrupt without
> + * scheduling the next timer event, it can either request a timer
> + * interrupt infinitely far into the future (i.e., (uint64_t)-1),
> + * or it can instead mask the timer interrupt by clearing sie.STIE CSR
> + * bit.
> + */
And SBI is the only way to set the expiry value? No CSR access? (Question
also concerns the unconditional vcpu_unset_interrupt() above.)
> + if ( ticks == ((uint64_t)~0ULL) )
Nit: With the cast you won't need the ULL suffix.
> + {
> + stop_timer(&t->timer);
> +
> + return;
> + }
> +
> + set_timer(&t->timer, expires);
See the handling of VCPUOP_set_singleshot_timer for what you may want to
do if the expiry asked for is (perhaps just very slightly) into the past.
There you'll also find a use of migrate_timer(), which you will want to
at least consider using here as well.
Jan
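
As an illustration of the points raised in this review, the sketch below separates the decision logic from the timer calls: check the (uint64_t)-1 sentinel first, convert ticks to nanoseconds, then treat a deadline that has already passed as a distinct case. Everything here is hypothetical (the enum, both function names, the frequency parameter), and injecting immediately for a past deadline is just one possible policy; it is not claimed to match what VCPUOP_set_singleshot_timer does.

```c
#include <assert.h>
#include <stdint.h>

enum vtimer_action { VTIMER_STOP, VTIMER_INJECT_NOW, VTIMER_ARM };

/* Hypothetical helper: convert timer ticks to nanoseconds. */
static uint64_t ticks_to_ns_sketch(uint64_t ticks, uint64_t freq_hz)
{
    assert(freq_hz != 0);

    /* Split the division to keep the intermediate products smaller. */
    return (ticks / freq_hz) * UINT64_C(1000000000) +
           (ticks % freq_hz) * UINT64_C(1000000000) / freq_hz;
}

static enum vtimer_action vtimer_decide(uint64_t ticks, uint64_t freq_hz,
                                        uint64_t now_ns, uint64_t *expires_ns)
{
    /* (uint64_t)-1 means "no next event" per the SBI text quoted above. */
    if ( ticks == UINT64_MAX )
        return VTIMER_STOP;

    *expires_ns = ticks_to_ns_sketch(ticks, freq_hz);

    /* Assumed policy: a deadline that already passed fires right away. */
    if ( *expires_ns <= now_ns )
        return VTIMER_INJECT_NOW;

    return VTIMER_ARM;
}
```

A caller would map VTIMER_STOP to stop_timer(), VTIMER_ARM to set_timer(), and VTIMER_INJECT_NOW to setting the interrupt directly. Any guest-specific time bias would have to be applied before calling such a helper.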
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2026-01-08 10:28 ` Jan Beulich
@ 2026-01-13 14:44 ` Oleksii Kurochko
2026-01-13 15:12 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 14:44 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/8/26 11:28 AM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/include/asm/vtimer.h
>> +++ b/xen/arch/riscv/include/asm/vtimer.h
>> @@ -22,4 +22,6 @@ void vcpu_timer_destroy(struct vcpu *v);
>>
>> int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config);
>>
>> +void vtimer_set_timer(struct vtimer *t, const uint64_t ticks);
>> +
>> #endif /* ASM__RISCV__VTIMER_H */
>> diff --git a/xen/arch/riscv/vtimer.c b/xen/arch/riscv/vtimer.c
>> index 5ba533690bc2..99a0c5986f1d 100644
>> --- a/xen/arch/riscv/vtimer.c
>> +++ b/xen/arch/riscv/vtimer.c
>> @@ -1,6 +1,8 @@
>> /* SPDX-License-Identifier: GPL-2.0-only */
>>
>> +#include <xen/domain.h>
> Is this really needed, when ...
>
>> #include <xen/sched.h>
> ... this is already there?
Given how the includes look in xen/sched.h, no.
>
>> +#include <xen/time.h>
> Don't you mean xen/timer.h here?
You are right, it should be xen/timer.h as set_timer(), stop_timer() and migrate_timer()
are from xen/timer.h.
>
>> @@ -15,7 +17,9 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
>>
>> static void vtimer_expired(void *data)
>> {
>> - panic("%s: TBD\n", __func__);
>> + struct vtimer *t = data;
> Pointer-to-const please.
>
>> @@ -37,3 +41,27 @@ void vcpu_timer_destroy(struct vcpu *v)
>>
>> kill_timer(&v->arch.vtimer.timer);
>> }
>> +
>> +void vtimer_set_timer(struct vtimer *t, const uint64_t ticks)
>> +{
>> + s_time_t expires = ticks_to_ns(ticks - boot_clock_cycles);
> boot_clock_cycles is known to just Xen. If the guest provided input is an
> absolute value, how would that work across migration? Doesn't there need
> to be a guest-specific bias instead?
I don't think I fully understand your question, but it sounds like a job
for the htimedelta register.
>
>> + vcpu_unset_interrupt(t->v, IRQ_VS_TIMER);
>> +
>> + /*
>> + * According to the RISC-V sbi spec:
>> + * If the supervisor wishes to clear the timer interrupt without
>> + * scheduling the next timer event, it can either request a timer
>> + * interrupt infinitely far into the future (i.e., (uint64_t)-1),
>> + * or it can instead mask the timer interrupt by clearing sie.STIE CSR
>> + * bit.
>> + */
> And SBI is the only way to set the expiry value? No CSR access? (Question
> also concerns the unconditional vcpu_unset_interrupt() above.)
If we don't have SSTC extension support then I suppose yes, as CSR_MI{E,P} could
be accessed only from M-mode:
(code from OpenSBI)
void sbi_timer_event_start(u64 next_event)
{
    sbi_pmu_ctr_incr_fw(SBI_PMU_FW_SET_TIMER);

    /**
     * Update the stimecmp directly if available. This allows
     * the older software to leverage sstc extension on newer hardware.
     */
    if (sbi_hart_has_extension(sbi_scratch_thishart_ptr(), SBI_HART_EXT_SSTC)) {
#if __riscv_xlen == 32
        csr_write(CSR_STIMECMP, next_event & 0xFFFFFFFF);
        csr_write(CSR_STIMECMPH, next_event >> 32);
#else
        csr_write(CSR_STIMECMP, next_event);
#endif
    } else if (timer_dev && timer_dev->timer_event_start) {
        timer_dev->timer_event_start(next_event);
        csr_clear(CSR_MIP, MIP_STIP);
    }
    csr_set(CSR_MIE, MIP_MTIP);
}
>
>> + if ( ticks == ((uint64_t)~0ULL) )
> Nit: With the cast you won't need the ULL suffix.
>
>> + {
>> + stop_timer(&t->timer);
>> +
>> + return;
>> + }
>> +
>> + set_timer(&t->timer, expires);
> See the handling of VCPUOP_set_singleshot_timer for what you may want to
> do if the expiry asked for is (perhaps just very slightly) into the past.
I get why we want to check whether "expires" has already passed, but ...
> There you'll also find a use of migrate_timer(), which you will want to
> at least consider using here as well.
... I don't get why we want to migrate the timer before set_timer() here.
Could you please explain that?
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2026-01-13 14:44 ` Oleksii Kurochko
@ 2026-01-13 15:12 ` Jan Beulich
2026-01-14 12:27 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-13 15:12 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.01.2026 15:44, Oleksii Kurochko wrote:
> On 1/8/26 11:28 AM, Jan Beulich wrote:
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> @@ -15,7 +17,9 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config)
>>>
>>> static void vtimer_expired(void *data)
>>> {
>>> - panic("%s: TBD\n", __func__);
>>> + struct vtimer *t = data;
>> Pointer-to-const please.
>>
>>> @@ -37,3 +41,27 @@ void vcpu_timer_destroy(struct vcpu *v)
>>>
>>> kill_timer(&v->arch.vtimer.timer);
>>> }
>>> +
>>> +void vtimer_set_timer(struct vtimer *t, const uint64_t ticks)
>>> +{
>>> + s_time_t expires = ticks_to_ns(ticks - boot_clock_cycles);
>> boot_clock_cycles is known to just Xen. If the guest provided input is an
>> absolute value, how would that work across migration? Doesn't there need
>> to be a guest-specific bias instead?
>
> I think that I don't understand fully your questions, but it sounds like it is a job
> for htimedelta register.
Ah yes. As said, still learning RISC-V while reviewing your work.
>>> + vcpu_unset_interrupt(t->v, IRQ_VS_TIMER);
>>> +
>>> + /*
>>> + * According to the RISC-V sbi spec:
>>> + * If the supervisor wishes to clear the timer interrupt without
>>> + * scheduling the next timer event, it can either request a timer
>>> + * interrupt infinitely far into the future (i.e., (uint64_t)-1),
>>> + * or it can instead mask the timer interrupt by clearing sie.STIE CSR
>>> + * bit.
>>> + */
>> And SBI is the only way to set the expiry value? No CSR access? (Question
>> also concerns the unconditional vcpu_unset_interrupt() above.)
>
> If we don't have SSTC extension support then I suppose yes, as CSR_MI{E,P} could
> be accessed only from M-mode:
How do M-mode CSRs come into play here? My question was rather towards ...
> (code from OpenSBI)
> void sbi_timer_event_start(u64 next_event)
> {
> sbi_pmu_ctr_incr_fw(SBI_PMU_FW_SET_TIMER);
>
> /**
> * Update the stimecmp directly if available. This allows
> * the older software to leverage sstc extension on newer hardware.
> */
> if (sbi_hart_has_extension(sbi_scratch_thishart_ptr(), SBI_HART_EXT_SSTC)) {
> #if __riscv_xlen == 32
> csr_write(CSR_STIMECMP, next_event & 0xFFFFFFFF);
> csr_write(CSR_STIMECMPH, next_event >> 32);
> #else
> csr_write(CSR_STIMECMP, next_event);
> #endif
... what if a guest did these CSR writes directly. Besides intercepting
access to them, you'd also need to synchronize both paths, I suppose.
>>> + {
>>> + stop_timer(&t->timer);
>>> +
>>> + return;
>>> + }
>>> +
>>> + set_timer(&t->timer, expires);
>> See the handling of VCPUOP_set_singleshot_timer for what you may want to
>> do if the expiry asked for is (perhaps just very slightly) into the past.
>
> I got an idea why we want to check if "expires" already expired, but ...
>
>> There you'll also find a use of migrate_timer(), which you will want to
>> at least consider using here as well.
>
> ... I don't get why we want to migrate timer before set_timer() here.
> Could you please explain that?
Didn't I see you use migrate_timer() in other patches (making me assume
you understand)? Having the timer tied to the pCPU where the vCPU runs
means the signalling to that vCPU will (commonly) be cheaper. Whether
that actually matters depends on what vtimer_expired() will eventually
contain. Hence why I said "consider using".
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2026-01-13 15:12 ` Jan Beulich
@ 2026-01-14 12:27 ` Oleksii Kurochko
2026-01-14 14:57 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-14 12:27 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/13/26 4:12 PM, Jan Beulich wrote:
> On 13.01.2026 15:44, Oleksii Kurochko wrote:
>> On 1/8/26 11:28 AM, Jan Beulich wrote:
>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>> + vcpu_unset_interrupt(t->v, IRQ_VS_TIMER);
>>>> +
>>>> + /*
>>>> + * According to the RISC-V sbi spec:
>>>> + * If the supervisor wishes to clear the timer interrupt without
>>>> + * scheduling the next timer event, it can either request a timer
>>>> + * interrupt infinitely far into the future (i.e., (uint64_t)-1),
>>>> + * or it can instead mask the timer interrupt by clearing sie.STIE CSR
>>>> + * bit.
>>>> + */
>>> And SBI is the only way to set the expiry value? No CSR access? (Question
>>> also concerns the unconditional vcpu_unset_interrupt() above.)
>> If we don't have SSTC extension support then I suppose yes, as CSR_MI{E,P} could
>> be accessed only from M-mode:
> How do M-mode CSRs come into play here? My question was rather towards ...
Without SSTC (Supervisor Timer Extension) the current Privileged arch specification
only defines a hardware mechanism for generating machine-mode timer interrupts (based
on the mtime and mtimecmp registers), with the resultant requirement that timer
services for S-mode/HS-mode (and for VS-mode) all have to be provided by M-mode, via
SBI calls from S/HS-mode up to M-mode (or VS-mode calls to HS-mode and then to M-mode).
>
>> (code from OpenSBI)
>> void sbi_timer_event_start(u64 next_event)
>> {
>> sbi_pmu_ctr_incr_fw(SBI_PMU_FW_SET_TIMER);
>>
>> /**
>> * Update the stimecmp directly if available. This allows
>> * the older software to leverage sstc extension on newer hardware.
>> */
>> if (sbi_hart_has_extension(sbi_scratch_thishart_ptr(), SBI_HART_EXT_SSTC)) {
>> #if __riscv_xlen == 32
>> csr_write(CSR_STIMECMP, next_event & 0xFFFFFFFF);
>> csr_write(CSR_STIMECMPH, next_event >> 32);
>> #else
>> csr_write(CSR_STIMECMP, next_event);
>> #endif
> ... what if a guest did these CSR writes directly. Besides intercepting
> access to them,
These registers are available only when the SSTC extension is present.
When SSTC is available and a guest accesses CSR_STIMECMP{H}, it actually
accesses the corresponding VS aliases, VSTIMECMP{H}. The hardware continuously
compares the value in VSTIMECMP against the guest’s view of time
(time + htimedelta). When the condition is met, the hardware asserts the
virtual supervisor timer interrupt pending bit (VSTIP) in the hypervisor’s
HIP register and the guest automatically receives the timer interrupt.
Therefore, there is no real need to intercept accesses to these registers.
It is possible that VS-mode software may continue to use the SBI timer call
instead of directly accessing the SSTC CSRs. In that case, VSTIMECMP would
need to be updated manually by the hypervisor when such an SBI call occurs.
However, this is not the case at the moment, as the SSTC extension is not
currently supported.
Technically, the hypervisor could also clear henvcfg.STCE when SSTC is
available. In that case, the hypervisor would receive an illegal
instruction trap in HS-mode when the guest attempts to access SSTC-related
registers.
However, I do not see a reason to prevent delegation of SSTC register access
to the guest, since SSTC provides VS-* aliases for these registers, so I don't
consider that as a real case.
> you'd also need to synchronize both paths, I suppose.
I didn't get what needs to be synchronized. Could you please explain?
>
>>>> + {
>>>> + stop_timer(&t->timer);
>>>> +
>>>> + return;
>>>> + }
>>>> +
>>>> + set_timer(&t->timer, expires);
>>> See the handling of VCPUOP_set_singleshot_timer for what you may want to
>>> do if the expiry asked for is (perhaps just very slightly) into the past.
>> I got an idea why we want to check if "expires" already expired, but ...
>>
>>> There you'll also find a use of migrate_timer(), which you will want to
>>> at least consider using here as well.
>> ... I don't get why we want to migrate timer before set_timer() here.
>> Could you please explain that?
> Didn't I see you use migrate_timer() in other patches (making me assume
> you understand)? Having the timer tied to the pCPU where the vCPU runs
> means the signalling to that vCPU will (commonly) be cheaper.
I thought that migrate_timer() is needed only when a vCPU changes the pCPU
it is running on, to ensure that the timer is on the correct pCPU after migrations,
hotplug events, or scheduling changes. That is why I placed it in
vtimer_restore(), as there is no guarantee that the vCPU will run on the
same pCPU it was running on previously.
So that is why ...
> Whether
> that actually matters depends on what vtimer_expired() will eventually
> contain. Hence why I said "consider using".
... I didn't get why I might need vtimer_expired() in vtimer_set_timer()
before set_timer().
vtimer_expired() will only notify the vCPU that a timer interrupt has
occurred by setting a bit in the irqs_pending bitmap, which will then be synced
with vcpu->hvip, but I still do not understand whether migrate_timer()
is needed before calling set_timer() here.
Considering that vtimer_set_timer() is called from the vCPU while it is
running on the current pCPU, and assuming no pCPU rescheduling has
occurred for this vCPU, we are already on the correct pCPU.
If pCPU rescheduling for the vCPU did occur, then migrate_timer() would
have been called in context_switch(), and at the point where
vtimer_set_timer() is invoked, we would already be running on the
correct pCPU.
~ Oleksii
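For reference, the pattern under discussion (handle an expiry that is already
in the past, otherwise move the timer to the current pCPU and arm it) can be
modelled in isolation. This is only a sketch under stated assumptions: the
mock_* types and helpers are stand-ins invented here, not Xen's struct timer
API, and delivering the event immediately for a past expiry is one possible
policy rather than what the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a Xen timer: only the fields this model needs. */
struct mock_timer {
    unsigned int cpu;   /* pCPU the timer is bound to */
    bool active;        /* armed or not */
    uint64_t expires;   /* absolute expiry time */
};

/* Stand-in for the per-vCPU virtual timer. */
struct mock_vtimer {
    struct mock_timer timer;
    unsigned int fired; /* number of timer events delivered */
};

static void mock_stop_timer(struct mock_timer *t)
{
    t->active = false;
}

static void mock_migrate_timer(struct mock_timer *t, unsigned int cpu)
{
    t->cpu = cpu;
}

static void mock_set_timer(struct mock_timer *t, uint64_t expires)
{
    t->expires = expires;
    t->active = true;
}

/* Hypothetical shape of a set-timer path with both suggestions applied. */
static void model_vtimer_set_timer(struct mock_vtimer *vt, uint64_t expires,
                                   uint64_t now, unsigned int this_cpu)
{
    if ( expires <= now )
    {
        /* Expiry already passed: deliver right away instead of arming. */
        mock_stop_timer(&vt->timer);
        vt->fired++;
        return;
    }

    /* Keep the timer on the pCPU where the vCPU currently runs. */
    mock_migrate_timer(&vt->timer, this_cpu);
    mock_set_timer(&vt->timer, expires);
}
```

Whether the migrate step pays off depends on how expensive it is to signal the
vCPU from a remote pCPU, which is the point raised in the review.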
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2026-01-14 12:27 ` Oleksii Kurochko
@ 2026-01-14 14:57 ` Jan Beulich
2026-01-14 15:59 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 14:57 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.01.2026 13:27, Oleksii Kurochko wrote:
> On 1/13/26 4:12 PM, Jan Beulich wrote:
>> On 13.01.2026 15:44, Oleksii Kurochko wrote:
>>> On 1/8/26 11:28 AM, Jan Beulich wrote:
>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Therefore, there is no real need to intercept accesses to these registers.
With this ...
>> you'd also need to synchronize both paths, I suppose.
>
> I didn't get you what is needed to be synchronized. Could you please explain?
... there's nothing to synchronize.
>>>>> + {
>>>>> + stop_timer(&t->timer);
>>>>> +
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + set_timer(&t->timer, expires);
>>>> See the handling of VCPUOP_set_singleshot_timer for what you may want to
>>>> do if the expiry asked for is (perhaps just very slightly) into the past.
>>> I got an idea why we want to check if "expires" already expired, but ...
>>>
>>>> There you'll also find a use of migrate_timer(), which you will want to
>>>> at least consider using here as well.
>>> ... I don't get why we want to migrate timer before set_timer() here.
>>> Could you please explain that?
>> Didn't I see you use migrate_timer() in other patches (making me assume
>> you understand)? Having the timer tied to the pCPU where the vCPU runs
>> means the signalling to that vCPU will (commonly) be cheaper.
>
> I thought that migrate_timer() is needed only when a vCPU changes the pCPU
> it is running on to ensure that it is running on correct pCPU after migrations,
> hotplug events, or scheduling changes. That is why I placed it in
> vtimer_restore(), as there is no guarantee that the vCPU will run on the
> same pCPU it was running on previously.
>
> So that is why ...
>
>> Whether
>> that actually matters depends on what vtimer_expired() will eventually
>> contain. Hence why I said "consider using".
>
> ... I didn't get why I might need vtimer_expired() in vtimer_set_timer()
> before set_timer().
>
> vtimer_expired() will only notify the vCPU that a timer interrupt has
> occurred by setting bit in irqs_pending bitmap which then will be synced
> with vcpu->hvip, but I still do not understand whether migrate_timer()
> is needed before calling set_timer() here.
Just to repeat - it's not needed. It may be wanted.
> Considering that vtimer_set_timer() is called from the vCPU while it is
> running on the current pCPU, and assuming no pCPU rescheduling has
> occurred for this vCPU, we are already on the correct pCPU.
> If pCPU rescheduling for the vCPU did occur, then migrate_timer() would
> have been called in context_switch(),
Even if the timer wasn't active?
Jan
> and at the point where
> vtimer_set_timer() is invoked, we would already be running on the
> correct pCPU.
>
> ~ Oleksii
>
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2026-01-14 14:57 ` Jan Beulich
@ 2026-01-14 15:59 ` Oleksii Kurochko
2026-01-15 7:52 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-14 15:59 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/14/26 3:57 PM, Jan Beulich wrote:
> On 14.01.2026 13:27, Oleksii Kurochko wrote:
>> On 1/13/26 4:12 PM, Jan Beulich wrote:
>>> On 13.01.2026 15:44, Oleksii Kurochko wrote:
>>>> On 1/8/26 11:28 AM, Jan Beulich wrote:
>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>> + {
>>>>>> + stop_timer(&t->timer);
>>>>>> +
>>>>>> + return;
>>>>>> + }
>>>>>> +
>>>>>> + set_timer(&t->timer, expires);
>>>>> See the handling of VCPUOP_set_singleshot_timer for what you may want to
>>>>> do if the expiry asked for is (perhaps just very slightly) into the past.
>>>> I got an idea why we want to check if "expires" already expired, but ...
>>>>
>>>>> There you'll also find a use of migrate_timer(), which you will want to
>>>>> at least consider using here as well.
>>>> ... I don't get why we want to migrate timer before set_timer() here.
>>>> Could you please explain that?
>>> Didn't I see you use migrate_timer() in other patches (making me assume
>>> you understand)? Having the timer tied to the pCPU where the vCPU runs
>>> means the signalling to that vCPU will (commonly) be cheaper.
>> I thought that migrate_timer() is needed only when a vCPU changes the pCPU
>> it is running on to ensure that it is running on correct pCPU after migrations,
>> hotplug events, or scheduling changes. That is why I placed it in
>> vtimer_restore(), as there is no guarantee that the vCPU will run on the
>> same pCPU it was running on previously.
>>
>> So that is why ...
>>
>>> Whether
>>> that actually matters depends on what vtimer_expired() will eventually
>>> contain. Hence why I said "consider using".
>> ... I didn't get why I might need vtimer_expired() in vtimer_set_timer()
>> before set_timer().
>>
>> vtimer_expired() will only notify the vCPU that a timer interrupt has
>> occurred by setting bit in irqs_pending bitmap which then will be synced
>> with vcpu->hvip, but I still do not understand whether migrate_timer()
>> is needed before calling set_timer() here.
> Just to repeat - it's not needed. It may be wanted.
>
>> Considering that vtimer_set_timer() is called from the vCPU while it is
>> running on the current pCPU, and assuming no pCPU rescheduling has
>> occurred for this vCPU, we are already on the correct pCPU.
>> If pCPU rescheduling for the vCPU did occur, then migrate_timer() would
>> have been called in context_switch(),
> Even if the timer wasn't active?
Yes, migrate_timer() is called unconditionally in vtimer_restore() called
from context_switch(). migrate_timer() will activate the timer.
~ Oleksii
>> and at the point where
>> vtimer_set_timer() is invoked, we would already be running on the
>> correct pCPU.
>>
>> ~ Oleksii
>>
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2026-01-14 15:59 ` Oleksii Kurochko
@ 2026-01-15 7:52 ` Jan Beulich
2026-01-15 9:30 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-15 7:52 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.01.2026 16:59, Oleksii Kurochko wrote:
>
> On 1/14/26 3:57 PM, Jan Beulich wrote:
>> On 14.01.2026 13:27, Oleksii Kurochko wrote:
>>> On 1/13/26 4:12 PM, Jan Beulich wrote:
>>>> On 13.01.2026 15:44, Oleksii Kurochko wrote:
>>>>> On 1/8/26 11:28 AM, Jan Beulich wrote:
>>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>>> + {
>>>>>>> + stop_timer(&t->timer);
>>>>>>> +
>>>>>>> + return;
>>>>>>> + }
>>>>>>> +
>>>>>>> + set_timer(&t->timer, expires);
>>>>>> See the handling of VCPUOP_set_singleshot_timer for what you may want to
>>>>>> do if the expiry asked for is (perhaps just very slightly) into the past.
>>>>> I got an idea why we want to check if "expires" already expired, but ...
>>>>>
>>>>>> There you'll also find a use of migrate_timer(), which you will want to
>>>>>> at least consider using here as well.
>>>>> ... I don't get why we want to migrate timer before set_timer() here.
>>>>> Could you please explain that?
>>>> Didn't I see you use migrate_timer() in other patches (making me assume
>>>> you understand)? Having the timer tied to the pCPU where the vCPU runs
>>>> means the signalling to that vCPU will (commonly) be cheaper.
>>> I thought that migrate_timer() is needed only when a vCPU changes the pCPU
>>> it is running on to ensure that it is running on correct pCPU after migrations,
>>> hotplug events, or scheduling changes. That is why I placed it in
>>> vtimer_restore(), as there is no guarantee that the vCPU will run on the
>>> same pCPU it was running on previously.
>>>
>>> So that is why ...
>>>
>>>> Whether
>>>> that actually matters depends on what vtimer_expired() will eventually
>>>> contain. Hence why I said "consider using".
>>> ... I didn't get why I might need vtimer_expired() in vtimer_set_timer()
>>> before set_timer().
>>>
>>> vtimer_expired() will only notify the vCPU that a timer interrupt has
>>> occurred by setting bit in irqs_pending bitmap which then will be synced
>>> with vcpu->hvip, but I still do not understand whether migrate_timer()
>>> is needed before calling set_timer() here.
>> Just to repeat - it's not needed. It may be wanted.
>>
>>> Considering that vtimer_set_timer() is called from the vCPU while it is
>>> running on the current pCPU, and assuming no pCPU rescheduling has
>>> occurred for this vCPU, we are already on the correct pCPU.
>>> If pCPU rescheduling for the vCPU did occur, then migrate_timer() would
>>> have been called in context_switch(),
>> Even if the timer wasn't active?
>
> Yes, migrate_timer() is called unconditionally in vtimer_restore() called
> from context_switch(). migrate_timer() will activate the timer.
Which is wrong?
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2026-01-15 7:52 ` Jan Beulich
@ 2026-01-15 9:30 ` Oleksii Kurochko
2026-01-15 9:55 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-15 9:30 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/15/26 8:52 AM, Jan Beulich wrote:
> On 14.01.2026 16:59, Oleksii Kurochko wrote:
>> On 1/14/26 3:57 PM, Jan Beulich wrote:
>>> On 14.01.2026 13:27, Oleksii Kurochko wrote:
>>>> On 1/13/26 4:12 PM, Jan Beulich wrote:
>>>>> On 13.01.2026 15:44, Oleksii Kurochko wrote:
>>>>>> On 1/8/26 11:28 AM, Jan Beulich wrote:
>>>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>>>> + {
>>>>>>>> + stop_timer(&t->timer);
>>>>>>>> +
>>>>>>>> + return;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + set_timer(&t->timer, expires);
>>>>>>> See the handling of VCPUOP_set_singleshot_timer for what you may want to
>>>>>>> do if the expiry asked for is (perhaps just very slightly) into the past.
>>>>>> I got an idea why we want to check if "expires" already expired, but ...
>>>>>>
>>>>>>> There you'll also find a use of migrate_timer(), which you will want to
>>>>>>> at least consider using here as well.
>>>>>> ... I don't get why we want to migrate timer before set_timer() here.
>>>>>> Could you please explain that?
>>>>> Didn't I see you use migrate_timer() in other patches (making me assume
>>>>> you understand)? Having the timer tied to the pCPU where the vCPU runs
>>>>> means the signalling to that vCPU will (commonly) be cheaper.
>>>> I thought that migrate_timer() is needed only when a vCPU changes the pCPU
>>>> it is running on to ensure that it is running on correct pCPU after migrations,
>>>> hotplug events, or scheduling changes. That is why I placed it in
>>>> vtimer_restore(), as there is no guarantee that the vCPU will run on the
>>>> same pCPU it was running on previously.
>>>>
>>>> So that is why ...
>>>>
>>>>> Whether
>>>>> that actually matters depends on what vtimer_expired() will eventually
>>>>> contain. Hence why I said "consider using".
>>>> ... I didn't get why I might need vtimer_expired() in vtimer_set_timer()
>>>> before set_timer().
>>>>
>>>> vtimer_expired() will only notify the vCPU that a timer interrupt has
>>>> occurred by setting bit in irqs_pending bitmap which then will be synced
>>>> with vcpu->hvip, but I still do not understand whether migrate_timer()
>>>> is needed before calling set_timer() here.
>>> Just to repeat - it's not needed. It may be wanted.
>>>
>>>> Considering that vtimer_set_timer() is called from the vCPU while it is
>>>> running on the current pCPU, and assuming no pCPU rescheduling has
>>>> occurred for this vCPU, we are already on the correct pCPU.
>>>> If pCPU rescheduling for the vCPU did occur, then migrate_timer() would
>>>> have been called in context_switch(),
>>> Even if the timer wasn't active?
>> Yes, migrate_timer() is called unconditionally in vtimer_restore() called
>> from context_switch(). migrate_timer() will activate the timer.
> Which is wrong?
I don't know, based on the comment above migrate_timer():
/* Migrate a timer to a different CPU. The timer may be currently active. */
it doesn't mention that it shouldn't be called if the timer wasn't active.
In the other places where migrate_timer() is used, I also don't see
anyone checking whether a timer is active or not.
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired()
2026-01-15 9:30 ` Oleksii Kurochko
@ 2026-01-15 9:55 ` Jan Beulich
0 siblings, 0 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-15 9:55 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 15.01.2026 10:30, Oleksii Kurochko wrote:
>
> On 1/15/26 8:52 AM, Jan Beulich wrote:
>> On 14.01.2026 16:59, Oleksii Kurochko wrote:
>>> On 1/14/26 3:57 PM, Jan Beulich wrote:
>>>> On 14.01.2026 13:27, Oleksii Kurochko wrote:
>>>>> On 1/13/26 4:12 PM, Jan Beulich wrote:
>>>>>> On 13.01.2026 15:44, Oleksii Kurochko wrote:
>>>>>>> On 1/8/26 11:28 AM, Jan Beulich wrote:
>>>>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>>>>> + {
>>>>>>>>> + stop_timer(&t->timer);
>>>>>>>>> +
>>>>>>>>> + return;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + set_timer(&t->timer, expires);
>>>>>>>> See the handling of VCPUOP_set_singleshot_timer for what you may want to
>>>>>>>> do if the expiry asked for is (perhaps just very slightly) into the past.
>>>>>>> I got an idea why we want to check if "expires" already expired, but ...
>>>>>>>
>>>>>>>> There you'll also find a use of migrate_timer(), which you will want to
>>>>>>>> at least consider using here as well.
>>>>>>> ... I don't get why we want to migrate timer before set_timer() here.
>>>>>>> Could you please explain that?
>>>>>> Didn't I see you use migrate_timer() in other patches (making me assume
>>>>>> you understand)? Having the timer tied to the pCPU where the vCPU runs
>>>>>> means the signalling to that vCPU will (commonly) be cheaper.
>>>>> I thought that migrate_timer() is needed only when a vCPU changes the pCPU
>>>>> it is running on to ensure that it is running on correct pCPU after migrations,
>>>>> hotplug events, or scheduling changes. That is why I placed it in
>>>>> vtimer_restore(), as there is no guarantee that the vCPU will run on the
>>>>> same pCPU it was running on previously.
>>>>>
>>>>> So that is why ...
>>>>>
>>>>>> Whether
>>>>>> that actually matters depends on what vtimer_expired() will eventually
>>>>>> contain. Hence why I said "consider using".
>>>>> ... I didn't get why I might need vtimer_expired() in vtimer_set_timer()
>>>>> before set_timer().
>>>>>
>>>>> vtimer_expired() will only notify the vCPU that a timer interrupt has
>>>>> occurred by setting bit in irqs_pending bitmap which then will be synced
>>>>> with vcpu->hvip, but I still do not understand whether migrate_timer()
>>>>> is needed before calling set_timer() here.
>>>> Just to repeat - it's not needed. It may be wanted.
>>>>
>>>>> Considering that vtimer_set_timer() is called from the vCPU while it is
>>>>> running on the current pCPU, and assuming no pCPU rescheduling has
>>>>> occurred for this vCPU, we are already on the correct pCPU.
>>>>> If pCPU rescheduling for the vCPU did occur, then migrate_timer() would
>>>>> have been called in context_switch(),
>>>> Even if the timer wasn't active?
>>> Yes, migrate_timer() is called unconditionally in vtimer_restore() called
>>> from context_switch(). migrate_timer() will activate the timer.
>> Which is wrong?
>
> I don't know, based on the comment above migrate_timer():
> /* Migrate a timer to a different CPU. The timer may be currently active. */
>
> it doesn't mention that it shouldn't be called if the timer wasn't active.
> All around other cases where migrate_timer() is used I don't see also that
> anyone checks if a timer is active or not.
Hmm, I'm sorry, I was mis-remembering. Migrating is indeed fine for inactive
timers.
Jan
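The conclusion above can be pictured with a small standalone model. It is a
sketch only: mock_timer and mock_migrate_timer() are invented here, and the
model assumes that migration changes the owning CPU while preserving whether
the timer was armed, which is worth checking against xen/common/timer.c.

```c
#include <stdbool.h>

/* Stand-in for a Xen timer: owning CPU plus armed state. */
struct mock_timer {
    unsigned int cpu;
    bool active;
};

/*
 * Assumed behaviour: the timer is taken off the old CPU's queue if it was
 * queued, rebound to the new CPU, and re-queued only if it was queued before.
 */
static void mock_migrate_timer(struct mock_timer *t, unsigned int new_cpu)
{
    bool was_active = t->active;

    t->active = false;      /* removed from the old CPU, if it was armed */
    t->cpu = new_cpu;       /* rebind to the target CPU */
    t->active = was_active; /* an inactive timer stays inactive */
}
```

Under that assumption, calling the function unconditionally on context switch
is harmless for a timer that was never armed: only its CPU binding changes.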
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 09/15] xen/riscv: add vtimer_{save,restore}()
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (7 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 08/15] xen/riscv: introduce vtimer_set_timer() and vtimer_expired() Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-08 10:43 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 10/15] xen/riscv: implement SBI legacy SET_TIMER support for guests Oleksii Kurochko
` (5 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Add implementations of vtimer_save() and vtimer_restore().
At the moment, vtimer_save() does nothing as SSTC, which provides a
virtualization-aware timer, isn't supported yet, so emulated (SBI-based)
timer is used.
vtimer uses internal Xen timer: initialize it on the pcpu the vcpu is
running on, rather than the processor that is creating the vcpu.
On vcpu restore migrate (when vtimer_restore() is going to be called)
the vtimer to the pcpu the vcpu is running on.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/vtimer.h | 3 +++
xen/arch/riscv/vtimer.c | 15 +++++++++++++++
2 files changed, 18 insertions(+)
diff --git a/xen/arch/riscv/include/asm/vtimer.h b/xen/arch/riscv/include/asm/vtimer.h
index 2cacaf74b83b..e0f94f7c31c7 100644
--- a/xen/arch/riscv/include/asm/vtimer.h
+++ b/xen/arch/riscv/include/asm/vtimer.h
@@ -24,4 +24,7 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config);
void vtimer_set_timer(struct vtimer *t, const uint64_t ticks);
+void vtimer_save(struct vcpu *v);
+void vtimer_restore(struct vcpu *v);
+
#endif /* ASM__RISCV__VTIMER_H */
diff --git a/xen/arch/riscv/vtimer.c b/xen/arch/riscv/vtimer.c
index 99a0c5986f1d..4256fe9a2bb0 100644
--- a/xen/arch/riscv/vtimer.c
+++ b/xen/arch/riscv/vtimer.c
@@ -1,5 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0-only */
+#include <xen/bug.h>
#include <xen/domain.h>
#include <xen/sched.h>
#include <xen/time.h>
@@ -65,3 +66,17 @@ void vtimer_set_timer(struct vtimer *t, const uint64_t ticks)
set_timer(&t->timer, expires);
}
+
+void vtimer_save(struct vcpu *p)
+{
+ ASSERT(!is_idle_vcpu(p));
+
+ /* Nothing to do at the moment as SSTC isn't supported now. */
+}
+
+void vtimer_restore(struct vcpu *n)
+{
+ ASSERT(!is_idle_vcpu(n));
+
+ migrate_timer(&n->arch.vtimer.timer, n->processor);
+}
--
2.52.0
^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH v1 09/15] xen/riscv: add vtimer_{save,restore}()
2025-12-24 17:03 ` [PATCH v1 09/15] xen/riscv: add vtimer_{save,restore}() Oleksii Kurochko
@ 2026-01-08 10:43 ` Jan Beulich
2026-01-13 15:32 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-08 10:43 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Add implementations of vtimer_save() and vtimer_restore().
And these are going to serve what purpose? Are they for context switch, or
for migration / save / restore? In the former case (supported by the naming
of the function parameters), I think they want naming differently (to avoid
confusion). See how x86 has e.g. ..._ctxt_switch_{from,to}() and then
..._switch_{from,to}() helpers.
> At the moment, vtimer_save() does nothing as SSTC, which provided
> virtualization-aware timer, isn't supported yet, so emulated (SBI-based)
> timer is used.
Is "emulated" really the correct term here? You don't intercept any guest
insns, but rather provide a virtual SBI.
> vtimer uses internal Xen timer: initialize it on the pcpu the vcpu is
> running on, rather than the processor that it's creating the vcpu.
This doesn't look to describe anything this patch does.
> On vcpu restore migrate (when vtimer_restore() is going to be called)
> the vtimer to the pcpu the vcpu is running on.
Why "going to be" when you describe what the function does?
> --- a/xen/arch/riscv/include/asm/vtimer.h
> +++ b/xen/arch/riscv/include/asm/vtimer.h
> @@ -24,4 +24,7 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config);
>
> void vtimer_set_timer(struct vtimer *t, const uint64_t ticks);
>
> +void vtimer_save(struct vcpu *v);
> +void vtimer_restore(struct vcpu *v);
Misra demands that parameter names in declarations match ...
> @@ -65,3 +66,17 @@ void vtimer_set_timer(struct vtimer *t, const uint64_t ticks)
>
> set_timer(&t->timer, expires);
> }
> +
> +void vtimer_save(struct vcpu *p)
> +{
> + ASSERT(!is_idle_vcpu(p));
> +
> + /* Nothing to do at the moment as SSTC isn't supported now. */
> +}
> +
> +void vtimer_restore(struct vcpu *n)
> +{
> + ASSERT(!is_idle_vcpu(n));
> +
> + migrate_timer(&n->arch.vtimer.timer, n->processor);
> +}
... the ones in the definitions. No matter that RISC-V isn't scanned by Eclair,
yet, I think you want to avoid the need to later fix things up.
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 09/15] xen/riscv: add vtimer_{save,restore}()
2026-01-08 10:43 ` Jan Beulich
@ 2026-01-13 15:32 ` Oleksii Kurochko
2026-01-14 9:00 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 15:32 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/8/26 11:43 AM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> Add implementations of vtimer_save() and vtimer_restore().
> And these are going to serve what purpose? Are they for context switch, or
> for migration / save / restore? In the former case (supported by the naming
> of the function parameters), I think they want naming differently (to avoid
> confusion). See how x86 has e.g. ..._ctxt_switch_{from,to}() and then
> ..._switch_{from,to}() helpers.
Based on the name alone, it is clear what ..._ctxt_switch_{from,to}() will be
used for, whereas it isn't clear from the name how ..._switch_{from,to}() will
be used.
Anyway, I am okay with renaming vtimer_{save,restore}() to vtimer_ctx_switch_{from,to}()
and then following the same approach for the other helpers (I used *_save() /
*_restore() for everything).
>> At the moment, vtimer_save() does nothing as SSTC, which provided
>> virtualization-aware timer, isn't supported yet, so emulated (SBI-based)
>> timer is used.
> Is "emulated" really the correct term here? You don't intercept any guest
> insns, but rather provide a virtual SBI.
I wasn't sure that it is the best term.
Probably then just "virtual (SBI-based) timer" is better to use.
>
>> vtimer uses internal Xen timer: initialize it on the pcpu the vcpu is
>> running on, rather than the processor that it's creating the vcpu.
> This doesn't look to describe anything this patch does.
Hm, and why not?
In vcpu_vtimer_init() we're initializing the timer (it was incorrect to use
"internal Xen timer" though) on the CPU stored in vcpu->processor by calling
init_timer().
I will update this part then to:
Initialize the timer contained in struct vtimer by calling init_timer().
>
>> On vcpu restore migrate (when vtimer_restore() is going to be called)
>> the vtimer to the pcpu the vcpu is running on.
> Why "going to be" when you describe what the function does?
Because it isn't called now. The part inside (...) could be dropped.
>
>> --- a/xen/arch/riscv/include/asm/vtimer.h
>> +++ b/xen/arch/riscv/include/asm/vtimer.h
>> @@ -24,4 +24,7 @@ int domain_vtimer_init(struct domain *d, struct xen_arch_domainconfig *config);
>>
>> void vtimer_set_timer(struct vtimer *t, const uint64_t ticks);
>>
>> +void vtimer_save(struct vcpu *v);
>> +void vtimer_restore(struct vcpu *v);
> Misra demands that parameter names in declarations match ...
>
>> @@ -65,3 +66,17 @@ void vtimer_set_timer(struct vtimer *t, const uint64_t ticks)
>>
>> set_timer(&t->timer, expires);
>> }
>> +
>> +void vtimer_save(struct vcpu *p)
>> +{
>> + ASSERT(!is_idle_vcpu(p));
>> +
>> + /* Nothing to do at the moment as SSTC isn't supported now. */
>> +}
>> +
>> +void vtimer_restore(struct vcpu *n)
>> +{
>> + ASSERT(!is_idle_vcpu(n));
>> +
>> + migrate_timer(&n->arch.vtimer.timer, n->processor);
>> +}
> ... the ones in the definitions. No matter that RISC-V isn't scanned by Eclair,
> yet, I think you want to avoid the need to later fix things up.
Sure, I'll fix that.
Thanks.
~ Oleksii
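A minimal sketch of how the two agreed changes could look together, assuming
the vtimer_ctx_switch_{from,to}() names proposed above. The struct vcpu here is
a tiny stand-in invented for this example (its fields are hypothetical), and
assert() stands in for Xen's ASSERT(). The point it illustrates is that the
parameter names in the declarations match the ones in the definitions, which is
what MISRA C:2012 Rule 8.3 asks for.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for Xen's struct vcpu: only what this example needs. */
struct vcpu {
    bool is_idle;           /* hypothetical replacement for is_idle_vcpu() */
    unsigned int processor; /* pCPU the vCPU is scheduled on */
    unsigned int timer_cpu; /* pCPU the vCPU's timer is bound to */
};

/* Declarations, as they would appear in the header: names are p and n. */
void vtimer_ctx_switch_from(struct vcpu *p);
void vtimer_ctx_switch_to(struct vcpu *n);

/* Definitions use the very same parameter names as the declarations. */
void vtimer_ctx_switch_from(struct vcpu *p)
{
    assert(!p->is_idle);

    /* Nothing to save in this model, mirroring the patch. */
}

void vtimer_ctx_switch_to(struct vcpu *n)
{
    assert(!n->is_idle);

    /* Models migrate_timer(): bind the timer to the vCPU's current pCPU. */
    n->timer_cpu = n->processor;
}
```

Keeping p ("previous") and n ("next") in both places also documents that these
helpers belong to the context switch path rather than to save/restore of a
domain.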
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 09/15] xen/riscv: add vtimer_{save,restore}()
2026-01-13 15:32 ` Oleksii Kurochko
@ 2026-01-14 9:00 ` Jan Beulich
0 siblings, 0 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 9:00 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.01.2026 16:32, Oleksii Kurochko wrote:
> On 1/8/26 11:43 AM, Jan Beulich wrote:
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> vtimer uses internal Xen timer: initialize it on the pcpu the vcpu is
>>> running on, rather than the processor that it's creating the vcpu.
>> This doesn't look to describe anything this patch does.
>
> Hm, and why not?
Because this patch doesn't initialize any timer. The only timer-related
call I see is one to migrate_timer().
> In vcpu_vtimer_init() we're initializing the timer (it was incorrect to use
> "internal Xen timer" though) on the CPU stored in vcpu->processor by calling
> init_timer().
>
> I will update this part then to:
> Initialize the timer contained in struct vtimer by calling init_timer().
But you don't call that function. (Nor is this, btw, a useful sentence
to have in a patch description. May I suggest that you read a fair number
of in particular Andrew's or Roger's patch descriptions, to get a feel
for what wants saying and what doesn't need to be said? In the case above:
How else could you plausibly initialize that timer? Hence the latter part
of the sentence is largely meaningless. Plus - is leaving the field
uninitialized a plausible option? IOW you're merely stating the obvious
anyway. Sadly, and I'm sorry to have to say that, this carries through
many of your patch descriptions: You mechanically state what is being
done, when really the thinking behind what you're doing and, often,
further plans would be relevant to call out.)
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 10/15] xen/riscv: implement SBI legacy SET_TIMER support for guests
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (8 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 09/15] xen/riscv: add vtimer_{save,restore}() Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-08 10:45 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 11/15] xen/riscv: introduce ns_to_ticks() Oleksii Kurochko
` (4 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Add handling of the SBI_EXT_0_1_SET_TIMER function ID to the legacy
extension ecall handler. The handler now programs the vCPU’s virtual
timer via vtimer_set_timer() and returns SBI_SUCCESS.
This enables guests using the legacy SBI timer interface to schedule
timer events correctly.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/vsbi/legacy-extension.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/xen/arch/riscv/vsbi/legacy-extension.c b/xen/arch/riscv/vsbi/legacy-extension.c
index ca869942d693..7eb3a1f119d8 100644
--- a/xen/arch/riscv/vsbi/legacy-extension.c
+++ b/xen/arch/riscv/vsbi/legacy-extension.c
@@ -7,6 +7,7 @@
#include <asm/processor.h>
#include <asm/vsbi.h>
+#include <asm/vtimer.h>
static void vsbi_print_char(char c)
{
@@ -44,6 +45,11 @@ static int vsbi_legacy_ecall_handler(unsigned long eid, unsigned long fid,
ret = SBI_ERR_NOT_SUPPORTED;
break;
+ case SBI_EXT_0_1_SET_TIMER:
+ vtimer_set_timer(&current->arch.vtimer, regs->a0);
+ regs->a0 = SBI_SUCCESS;
+ break;
+
default:
/*
* TODO: domain_crash() is acceptable here while things are still under
--
2.52.0
^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH v1 10/15] xen/riscv: implement SBI legacy SET_TIMER support for guests
2025-12-24 17:03 ` [PATCH v1 10/15] xen/riscv: implement SBI legacy SET_TIMER support for guests Oleksii Kurochko
@ 2026-01-08 10:45 ` Jan Beulich
2026-01-13 15:41 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-08 10:45 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Add handling of the SBI_EXT_0_1_SET_TIMER function ID to the legacy
> extension ecall handler. The handler now programs the vCPU’s virtual
> timer via vtimer_set_timer() and returns SBI_SUCCESS.
>
> This enables guests using the legacy SBI timer interface to schedule
> timer events correctly.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
What about the more modern timer extension, though?
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 10/15] xen/riscv: implement SBI legacy SET_TIMER support for guests
2026-01-08 10:45 ` Jan Beulich
@ 2026-01-13 15:41 ` Oleksii Kurochko
0 siblings, 0 replies; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 15:41 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/8/26 11:45 AM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> Add handling of the SBI_EXT_0_1_SET_TIMER function ID to the legacy
>> extension ecall handler. The handler now programs the vCPU’s virtual
>> timer via vtimer_set_timer() and returns SBI_SUCCESS.
>>
>> This enables guests using the legacy SBI timer interface to schedule
>> timer events correctly.
>>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
Thanks.
>
> What about the more modern timer extension, though?
Handling will be the same, as the API is identical:
struct sbiret sbi_set_timer(uint64_t stime_value)
The only additional work needed is to add handling for the new extension
with EID 0x54494D45 (“TIME”), which was introduced in SBI v0.2.
Interestingly, we currently report to the guest that OpenSBI v0.2 is
available, so technically this modern timer extension should be usable.
However, the guest still tries to use the Legacy extension, as it has not
yet received an indication in Xen that the EID 0x54494D45 (“TIME”) isn't
implemented.
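As a rough illustration of "handling will be the same", here is a minimal sketch, assuming both entry points simply forward the requested value to the virtual timer. The struct and the two function names ending in _sketch, as well as handle_set_timer_ecall(), are hypothetical stand-ins and not code from this series; the EID/FID values and SBI_ERR_NOT_SUPPORTED (-2) are the ones from the SBI specification.

```c
#include <stdint.h>

#define SBI_EXT_0_1_SET_TIMER  0x0         /* legacy extension EID */
#define SBI_EXT_TIME           0x54494D45  /* "TIME", introduced in SBI v0.2 */
#define SBI_EXT_TIME_SET_TIMER 0x0         /* FID within the TIME extension */
#define SBI_SUCCESS            0
#define SBI_ERR_NOT_SUPPORTED  (-2)

/* Hypothetical stand-in for the vCPU's virtual timer state. */
struct vtimer_sketch {
    uint64_t deadline_ticks;
};

/* Hypothetical stand-in for vtimer_set_timer(): just records the deadline. */
static void vtimer_set_timer_sketch(struct vtimer_sketch *t, uint64_t ticks)
{
    t->deadline_ticks = ticks;
}

/*
 * Returns the SBI error code the guest would see in a0.
 * The legacy call is identified by EID alone; the TIME extension
 * additionally requires the SET_TIMER function ID.
 */
static long handle_set_timer_ecall(struct vtimer_sketch *t, unsigned long eid,
                                   unsigned long fid, uint64_t stime_value)
{
    if ( eid == SBI_EXT_0_1_SET_TIMER ||
         (eid == SBI_EXT_TIME && fid == SBI_EXT_TIME_SET_TIMER) )
    {
        vtimer_set_timer_sketch(t, stime_value);
        return SBI_SUCCESS;
    }

    return SBI_ERR_NOT_SUPPORTED;
}
```

In a real handler the two extensions would more likely live in separate ecall handlers; the single function above only shows that the timer-programming step can be shared.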
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 11/15] xen/riscv: introduce ns_to_ticks()
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (9 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 10/15] xen/riscv: implement SBI legacy SET_TIMER support for guests Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-12 14:59 ` [Arm] " Jan Beulich
2025-12-24 17:03 ` [PATCH v1 12/15] xen/riscv: introduce sbi_set_timer() Oleksii Kurochko
` (3 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/time.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/xen/arch/riscv/include/asm/time.h b/xen/arch/riscv/include/asm/time.h
index 63bdd471ccac..3d013a3ace0f 100644
--- a/xen/arch/riscv/include/asm/time.h
+++ b/xen/arch/riscv/include/asm/time.h
@@ -29,6 +29,11 @@ static inline s_time_t ticks_to_ns(uint64_t ticks)
return muldiv64(ticks, MILLISECS(1), cpu_khz);
}
+static inline uint64_t ns_to_ticks(s_time_t ns)
+{
+ return muldiv64(ns, cpu_khz, MILLISECS(1));
+}
+
void preinit_xen_time(void);
#endif /* ASM__RISCV__TIME_H */
--
2.52.0
^ permalink raw reply related [flat|nested] 93+ messages in thread
* [Arm] Re: [PATCH v1 11/15] xen/riscv: introduce ns_to_ticks()
2025-12-24 17:03 ` [PATCH v1 11/15] xen/riscv: introduce ns_to_ticks() Oleksii Kurochko
@ 2026-01-12 14:59 ` Jan Beulich
2026-01-21 0:23 ` Volodymyr Babchuk
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 14:59 UTC (permalink / raw)
To: Oleksii Kurochko, Michal Orzel, Julien Grall, Stefano Stabellini
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> ---
> xen/arch/riscv/include/asm/time.h | 5 +++++
> 1 file changed, 5 insertions(+)
Looks okay and ready to go in as is (no dependencies on earlier patches afaics),
but:
> --- a/xen/arch/riscv/include/asm/time.h
> +++ b/xen/arch/riscv/include/asm/time.h
> @@ -29,6 +29,11 @@ static inline s_time_t ticks_to_ns(uint64_t ticks)
> return muldiv64(ticks, MILLISECS(1), cpu_khz);
> }
>
> +static inline uint64_t ns_to_ticks(s_time_t ns)
> +{
> + return muldiv64(ns, cpu_khz, MILLISECS(1));
> +}
It's hard to see what's arch-dependent about this or ticks_to_ns(). They're
similar but not identical to Arm's version, and I actually wonder why that
difference exists. Questions to Arm people:
1) Why are they out-of-line functions there?
2) Why the involvement of the constant 1000 there? 1000 * cpu_khz can
actually overflow in 32 bits. The forms above aren't prone to such an
issue.
If the delta isn't justified, I think we'd better put RISC-V's functions in
common code (xen/time.h). They're not presently needed by x86, but as
inline functions they also shouldn't do any harm.
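To make the overflow remark concrete, here is a minimal sketch, assuming a muldiv64-style helper that keeps the intermediate product wider than 64 bits. muldiv64_sketch(), ns_to_ticks_sketch(), ticks_to_ns_sketch() and hz_from_khz_32bit() are hypothetical names written for this illustration; they are not Xen's or Arm's actual implementations, and NS_PER_MS stands in for MILLISECS(1).

```c
#include <stdint.h>

#define NS_PER_MS 1000000u  /* assumed value of MILLISECS(1), in ns */

/*
 * Compute (a * b) / c without losing the upper bits of a * b.
 * The 96-bit product is held as hi:lo, then divided by c in two steps.
 * The final quotient is truncated to 64 bits.
 */
static uint64_t muldiv64_sketch(uint64_t a, uint32_t b, uint32_t c)
{
    uint64_t lo = (a & 0xffffffffu) * b;          /* low half times b */
    uint64_t hi = (a >> 32) * b + (lo >> 32);     /* high half plus carry */
    uint64_t rem = hi % c;

    lo &= 0xffffffffu;

    return ((hi / c) << 32) | (((rem << 32) | lo) / c);
}

static uint64_t ns_to_ticks_sketch(uint64_t ns, uint32_t khz)
{
    return muldiv64_sketch(ns, khz, NS_PER_MS);
}

static uint64_t ticks_to_ns_sketch(uint64_t ticks, uint32_t khz)
{
    return muldiv64_sketch(ticks, NS_PER_MS, khz);
}

/* The "1000 * cpu_khz" form evaluated in 32 bits wraps above ~4.29 GHz. */
static uint32_t hz_from_khz_32bit(uint32_t khz)
{
    return 1000u * khz;
}
```

With a 1 GHz counter (khz = 1000000) the 32-bit product still fits, while 5 GHz (khz = 5000000) wraps; the muldiv64-style forms never form that product in 32 bits at all.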
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [Arm] Re: [PATCH v1 11/15] xen/riscv: introduce ns_to_ticks()
2026-01-12 14:59 ` [Arm] " Jan Beulich
@ 2026-01-21 0:23 ` Volodymyr Babchuk
2026-01-21 1:19 ` Stefano Stabellini
0 siblings, 1 reply; 93+ messages in thread
From: Volodymyr Babchuk @ 2026-01-21 0:23 UTC (permalink / raw)
To: Jan Beulich
Cc: Oleksii Kurochko, Michal Orzel, Julien Grall, Stefano Stabellini,
Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Roger Pau Monné, xen-devel@lists.xenproject.org
Hi Jan,
Jan Beulich <jbeulich@suse.com> writes:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>> ---
>> xen/arch/riscv/include/asm/time.h | 5 +++++
>> 1 file changed, 5 insertions(+)
>
> Looks okay and ready to go in as is (no dependencies on earlier patches afaics),
> but:
>
>> --- a/xen/arch/riscv/include/asm/time.h
>> +++ b/xen/arch/riscv/include/asm/time.h
>> @@ -29,6 +29,11 @@ static inline s_time_t ticks_to_ns(uint64_t ticks)
>> return muldiv64(ticks, MILLISECS(1), cpu_khz);
>> }
>>
>> +static inline uint64_t ns_to_ticks(s_time_t ns)
>> +{
>> + return muldiv64(ns, cpu_khz, MILLISECS(1));
>> +}
>
> It's hard to see what's arch-dependent about this or ticks_to_ns(). They're
> similar but not identical to Arm's version, and I actually wonder why that
> difference exists. Questions to Arm people:
> 1) Why are they out-of-line functions there?
That's an interesting question. According to git blame, this is how it was
introduced in 2012, and no one has touched this part since. The original
patch had cntfrq defined as `static`, which explains why these functions
were declared out-of-line.
> 2) Why the involvement of the constant 1000 there? 1000 * cpu_khz can
> actually overflow in 32 bits. The forms above aren't prone to such an
> issue.
Patch "xen: move XEN_SYSCTL_physinfo, XEN_SYSCTL_numainfo and
XEN_SYSCTL_topologyinfo to common code" (096578b4e48) changed hz to
khz. This added that 1000 multiplication. Also this patch removed
`static` qualifier from the counter variable.
Anyways, latest ARM ARM suggests that timer frequency should be fixed at
1GHz, which is shy of 32-bit overflow. So most new platforms will be
fine. And older platforms had much lower frequencies.
> If the delta isn't justified, I think we'd better put RISC-V's functions in
> common code (xen/time.h). They're not presently needed by x86, but as
> inline functions they also shouldn't do any harm.
I'm a mere reviewer, but I agree that the proposed approach is better and more
resilient.
--
WBR, Volodymyr
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [Arm] Re: [PATCH v1 11/15] xen/riscv: introduce ns_to_ticks()
2026-01-21 0:23 ` Volodymyr Babchuk
@ 2026-01-21 1:19 ` Stefano Stabellini
0 siblings, 0 replies; 93+ messages in thread
From: Stefano Stabellini @ 2026-01-21 1:19 UTC (permalink / raw)
To: Volodymyr Babchuk
Cc: Jan Beulich, Oleksii Kurochko, Michal Orzel, Julien Grall,
Stefano Stabellini, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Roger Pau Monné,
xen-devel@lists.xenproject.org
On Wed, 21 Jan 2026, Volodymyr Babchuk wrote:
> > On 24.12.2025 18:03, Oleksii Kurochko wrote:
> >> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> >> ---
> >> xen/arch/riscv/include/asm/time.h | 5 +++++
> >> 1 file changed, 5 insertions(+)
> >
> > Looks okay and ready to go in as is (no dependencies on earlier patches afaics),
> > but:
> >
> >> --- a/xen/arch/riscv/include/asm/time.h
> >> +++ b/xen/arch/riscv/include/asm/time.h
> >> @@ -29,6 +29,11 @@ static inline s_time_t ticks_to_ns(uint64_t ticks)
> >> return muldiv64(ticks, MILLISECS(1), cpu_khz);
> >> }
> >>
> >> +static inline uint64_t ns_to_ticks(s_time_t ns)
> >> +{
> >> + return muldiv64(ns, cpu_khz, MILLISECS(1));
> >> +}
> >
> > It's hard to see what's arch-dependent about this or ticks_to_ns(). They're
> > similar but not identical to Arm's version, and I actually wonder why that
> > difference exists. Questions to Arm people:
> > 1) Why are they out-of-line functions there?
>
> That's interesting question. According to git blame this is how it was
> introduced in 2012 and after that no one touched this part. Original
> patch had cntfrq defined as `static`, this explains why these functions
> were declared out-of-line.
>
> > 2) Why the involvement of the constant 1000 there? 1000 * cpu_khz can
> > actually overflow in 32 bits. The forms above aren't prone to such an
> > issue.
>
> Patch "xen: move XEN_SYSCTL_physinfo, XEN_SYSCTL_numainfo and
> XEN_SYSCTL_topologyinfo to common code" (096578b4e48) changed hz to
> khz. This added that 1000 multiplication. Also this patch removed
> `static` qualifier from the counter variable.
>
> Anyways, latest ARM ARM suggests that timer frequency should be fixed at
> 1GHz, which is shy of 32-bit overflow. So most new platforms will be
> fine. And older platforms had much lower frequencies.
>
> > If the delta isn't justified, I think we'd better put RISC-V's functions in
> > common code (xen/time.h). They're not presently needed by x86, but as
> > inline functions they also shouldn't do any harm.
>
> I'm mere reviewer, but I agree that proposed approach is better and more
> resilient.
Yes I agree too.
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 12/15] xen/riscv: introduce sbi_set_timer()
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (10 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 11/15] xen/riscv: introduce ns_to_ticks() Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-12 15:12 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI Oleksii Kurochko
` (2 subsequent siblings)
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce a function pointer which points to a specific sbi_set_timer()
implementation. It is done this way because different OpenSBI versions can
have a different Extension ID and/or function ID for the TIME extension.
sbi_set_timer() programs the clock for the next event after stime_value
time. This function also clears the pending timer interrupt bit.
Introduce the extension ID and SBI function ID for the TIME extension.
Implement only sbi_set_timer_v02(), as there is not much sense in
supporting earlier versions and, at the moment, Xen supports only v02.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/sbi.h | 17 +++++++++++++++++
xen/arch/riscv/sbi.c | 26 ++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
diff --git a/xen/arch/riscv/include/asm/sbi.h b/xen/arch/riscv/include/asm/sbi.h
index a88d3d57127a..c54dc7642ff1 100644
--- a/xen/arch/riscv/include/asm/sbi.h
+++ b/xen/arch/riscv/include/asm/sbi.h
@@ -33,6 +33,7 @@
#define SBI_EXT_BASE 0x10
#define SBI_EXT_RFENCE 0x52464E43
+#define SBI_EXT_TIME 0x54494D45
/* SBI function IDs for BASE extension */
#define SBI_EXT_BASE_GET_SPEC_VERSION 0x0
@@ -65,6 +66,9 @@
#define SBI_SPEC_VERSION_DEFAULT 0x1
+/* SBI function IDs for TIME extension */
+#define SBI_EXT_TIME_SET_TIMER 0x0
+
struct sbiret {
long error;
long value;
@@ -138,6 +142,19 @@ int sbi_remote_hfence_gvma(const cpumask_t *cpu_mask, vaddr_t start,
int sbi_remote_hfence_gvma_vmid(const cpumask_t *cpu_mask, vaddr_t start,
size_t size, unsigned long vmid);
+/*
+ * Programs the clock for next event after stime_value time. This function also
+ * clears the pending timer interrupt bit.
+ * If the supervisor wishes to clear the timer interrupt without scheduling the
+ * next timer event, it can either request a timer interrupt infinitely far
+ * into the future (i.e., (uint64_t)-1), or it can instead mask the timer
+ * interrupt by clearing sie.STIE CSR bit.
+ *
+ * This SBI call returns 0 upon success or an implementation specific negative
+ * error code.
+ */
+extern int (*sbi_set_timer)(uint64_t stime_value);
+
/*
* Initialize SBI library
*
diff --git a/xen/arch/riscv/sbi.c b/xen/arch/riscv/sbi.c
index 425dce44c679..206ea3462c50 100644
--- a/xen/arch/riscv/sbi.c
+++ b/xen/arch/riscv/sbi.c
@@ -249,6 +249,26 @@ static int (* __ro_after_init sbi_rfence)(unsigned long fid,
unsigned long arg4,
unsigned long arg5);
+static int cf_check sbi_set_timer_v02(uint64_t stime_value)
+{
+ struct sbiret ret;
+
+#ifdef CONFIG_RISCV_64
+ ret = sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value, 0,
+ 0, 0, 0, 0);
+#else
+ ret = sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value,
+ stime_value >> 32, 0, 0, 0, 0);
+#endif
+
+ if ( ret.error )
+ return sbi_err_map_xen_errno(ret.error);
+ else
+ return 0;
+}
+
+int (* __ro_after_init sbi_set_timer)(uint64_t stime_value);
+
int sbi_remote_sfence_vma(const cpumask_t *cpu_mask, vaddr_t start,
size_t size)
{
@@ -326,6 +346,12 @@ int __init sbi_init(void)
sbi_rfence = sbi_rfence_v02;
printk("SBI v0.2 RFENCE extension detected\n");
}
+
+ if ( sbi_probe_extension(SBI_EXT_TIME) > 0 )
+ {
+ sbi_set_timer = sbi_set_timer_v02;
+ printk("SBI v0.2 TIME extension detected\n");
+ }
}
else
panic("Ooops. SBI spec version 0.1 detected. Need to add support");
--
2.52.0
^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH v1 12/15] xen/riscv: introduce sbi_set_timer()
2025-12-24 17:03 ` [PATCH v1 12/15] xen/riscv: introduce sbi_set_timer() Oleksii Kurochko
@ 2026-01-12 15:12 ` Jan Beulich
2026-01-13 16:33 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 15:12 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> Introduce pointer to function which points to a specific sbi_set_timer()
> implementation. It is done in this way as different OpenSBI version can
> have different Extenion ID and/or funcion ID for TIME extension.
>
> sbi_set_time() programs the clock for next event after stime_value
> time. This function also clears the pending timer interrupt bit.
>
> Introduce extension ID and SBI function ID for TIME extension.
>
> Implement only sbi_set_timer_v02() as there is not to much sense
> to support earlier version and, at the moment, Xen supports only v02.
Besides this somewhat contradicting the use of a function pointer: What
about the legacy extension's equivalent?
> --- a/xen/arch/riscv/include/asm/sbi.h
> +++ b/xen/arch/riscv/include/asm/sbi.h
> @@ -33,6 +33,7 @@
>
> #define SBI_EXT_BASE 0x10
> #define SBI_EXT_RFENCE 0x52464E43
> +#define SBI_EXT_TIME 0x54494D45
>
> /* SBI function IDs for BASE extension */
> #define SBI_EXT_BASE_GET_SPEC_VERSION 0x0
> @@ -65,6 +66,9 @@
>
> #define SBI_SPEC_VERSION_DEFAULT 0x1
>
> +/* SBI function IDs for TIME extension */
> +#define SBI_EXT_TIME_SET_TIMER 0x0
Move up besides the other SBI_EXT_* and use the same amount of padding?
> @@ -138,6 +142,19 @@ int sbi_remote_hfence_gvma(const cpumask_t *cpu_mask, vaddr_t start,
> int sbi_remote_hfence_gvma_vmid(const cpumask_t *cpu_mask, vaddr_t start,
> size_t size, unsigned long vmid);
>
> +/*
> + * Programs the clock for next event after stime_value time. This function also
> + * clears the pending timer interrupt bit.
> + * If the supervisor wishes to clear the timer interrupt without scheduling the
> + * next timer event, it can either request a timer interrupt infinitely far
> + * into the future (i.e., (uint64_t)-1), or it can instead mask the timer
> + * interrupt by clearing sie.STIE CSR bit.
> + *
> + * This SBI call returns 0 upon success or an implementation specific negative
> + * error code.
> + */
> +extern int (*sbi_set_timer)(uint64_t stime_value);
Despite the pretty extensive comment, the granularity of the value to be passed
isn't mentioned.
> --- a/xen/arch/riscv/sbi.c
> +++ b/xen/arch/riscv/sbi.c
> @@ -249,6 +249,26 @@ static int (* __ro_after_init sbi_rfence)(unsigned long fid,
> unsigned long arg4,
> unsigned long arg5);
>
> +static int cf_check sbi_set_timer_v02(uint64_t stime_value)
> +{
> + struct sbiret ret;
> +
> +#ifdef CONFIG_RISCV_64
> + ret = sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value, 0,
> + 0, 0, 0, 0);
> +#else
> + ret = sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value,
> + stime_value >> 32, 0, 0, 0, 0);
> +#endif
How about
ret = sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value,
#ifdef CONFIG_RISCV_64
0,
#else
stime_value >> 32,
#endif
0, 0, 0, 0);
? Granted some may say this looks a little more clumsy, but it's surely
less redundancy.
Also I'd suggest to use CONFIG_RISCV_32 with the #ifdef, as then the "else"
covers both RV64 and RV128.
> + if ( ret.error )
> + return sbi_err_map_xen_errno(ret.error);
> + else
> + return 0;
> +}
While I understand this is being debated, I continue to think that this
kind of use of if/else isn't very helpful. Function's main return
statements imo benefit from being unconditional.
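The two suggestions above (a single call with only the differing argument under #if, and an unconditional final return) could look roughly like the following minimal sketch. It is not the patch's code: sbi_ecall_stub() and err_map_stub() are hypothetical stand-ins for sbi_ecall() and sbi_err_map_xen_errno(), and the ULONG_MAX test stands in for CONFIG_RISCV_32 so that the example builds on an ordinary host.

```c
#include <limits.h>
#include <stdint.h>

struct sbiret {
    long error;
    long value;
};

/* Last arguments seen by the stub, so the split can be inspected. */
static unsigned long last_a0, last_a1;

/* Hypothetical stand-in for sbi_ecall(): records a0/a1, reports success. */
static struct sbiret sbi_ecall_stub(unsigned long eid, unsigned long fid,
                                    unsigned long a0, unsigned long a1)
{
    struct sbiret ret = { 0, 0 };

    (void)eid;
    (void)fid;
    last_a0 = a0;
    last_a1 = a1;

    return ret;
}

/* Hypothetical stand-in for sbi_err_map_xen_errno(): 0 stays 0. */
static int err_map_stub(long sbi_error)
{
    return sbi_error ? -1 : 0;
}

static int set_timer_sketch(uint64_t stime_value)
{
    struct sbiret ret;

    ret = sbi_ecall_stub(0x54494D45UL, 0x0UL, (unsigned long)stime_value,
#if ULONG_MAX == 0xffffffffUL   /* stand-in for CONFIG_RISCV_32 */
                         (unsigned long)(stime_value >> 32)
#else                           /* covers 64-bit and anything wider */
                         0
#endif
                         );

    /* Success maps to 0, so no if/else is needed around the return. */
    return err_map_stub(ret.error);
}
```

Testing for the 32-bit case and leaving everything else to the #else branch is what lets a single fallback cover both RV64 and RV128.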
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 12/15] xen/riscv: introduce sbi_set_timer()
2026-01-12 15:12 ` Jan Beulich
@ 2026-01-13 16:33 ` Oleksii Kurochko
2026-01-14 9:07 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 16:33 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/12/26 4:12 PM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> Introduce pointer to function which points to a specific sbi_set_timer()
>> implementation. It is done in this way as different OpenSBI version can
>> have different Extenion ID and/or funcion ID for TIME extension.
>>
>> sbi_set_time() programs the clock for next event after stime_value
>> time. This function also clears the pending timer interrupt bit.
>>
>> Introduce extension ID and SBI function ID for TIME extension.
>>
>> Implement only sbi_set_timer_v02() as there is not to much sense
>> to support earlier version and, at the moment, Xen supports only v02.
> Besides this somewhat contradicting the use of a function pointer: What
> about the legacy extension's equivalent?
I think this is not really needed, and the same implementation can be used for
both the Legacy and TIME extensions, since the API is identical and the only
difference is that sbi_set_timer() was moved into a separate extension.
Since Xen reports to the guest that it supports SBI v0.2, it is up to the guest
implementation to decide why it is still using sbi_set_timer() from the
Legacy extension instead of the TIME extension.
I could add a Legacy extension equivalent, but considering that we are
using OpenSBI v0.2, for which the TIME extension is available, it seems to me
that it is enough to set sbi_set_timer to sbi_set_timer_v02() for now.
>
>> --- a/xen/arch/riscv/include/asm/sbi.h
>> +++ b/xen/arch/riscv/include/asm/sbi.h
>> @@ -33,6 +33,7 @@
>>
>> #define SBI_EXT_BASE 0x10
>> #define SBI_EXT_RFENCE 0x52464E43
>> +#define SBI_EXT_TIME 0x54494D45
>>
>> /* SBI function IDs for BASE extension */
>> #define SBI_EXT_BASE_GET_SPEC_VERSION 0x0
>> @@ -65,6 +66,9 @@
>>
>> #define SBI_SPEC_VERSION_DEFAULT 0x1
>>
>> +/* SBI function IDs for TIME extension */
>> +#define SBI_EXT_TIME_SET_TIMER 0x0
> Move up besides the other SBI_EXT_* and use the same amount of padding?
Sure, I will do that.
>
>> @@ -138,6 +142,19 @@ int sbi_remote_hfence_gvma(const cpumask_t *cpu_mask, vaddr_t start,
>> int sbi_remote_hfence_gvma_vmid(const cpumask_t *cpu_mask, vaddr_t start,
>> size_t size, unsigned long vmid);
>>
>> +/*
>> + * Programs the clock for next event after stime_value time. This function also
>> + * clears the pending timer interrupt bit.
>> + * If the supervisor wishes to clear the timer interrupt without scheduling the
>> + * next timer event, it can either request a timer interrupt infinitely far
>> + * into the future (i.e., (uint64_t)-1), or it can instead mask the timer
>> + * interrupt by clearing sie.STIE CSR bit.
>> + *
>> + * This SBI call returns 0 upon success or an implementation specific negative
>> + * error code.
>> + */
>> +extern int (*sbi_set_timer)(uint64_t stime_value);
> Despite the pretty extensive comment, the granularity of the value to be passed
> isn't mentioned.
I will update the comment with the following then:
The stime_value parameter represents absolute time measured in ticks.
>
>> --- a/xen/arch/riscv/sbi.c
>> +++ b/xen/arch/riscv/sbi.c
>> @@ -249,6 +249,26 @@ static int (* __ro_after_init sbi_rfence)(unsigned long fid,
>> unsigned long arg4,
>> unsigned long arg5);
>>
>> +static int cf_check sbi_set_timer_v02(uint64_t stime_value)
>> +{
>> + struct sbiret ret;
>> +
>> +#ifdef CONFIG_RISCV_64
>> + ret = sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value, 0,
>> + 0, 0, 0, 0);
>> +#else
>> + ret = sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value,
>> + stime_value >> 32, 0, 0, 0, 0);
>> +#endif
> How about
>
> ret = sbi_ecall(SBI_EXT_TIME, SBI_EXT_TIME_SET_TIMER, stime_value,
> #ifdef CONFIG_RISCV_64
> 0,
> #else
> stime_value >> 32,
> #endif
> 0, 0, 0, 0);
>
> ? Granted some may say this looks a little more clumsy, but it's surely
> less redundancy.
>
> Also I'd suggest to use CONFIG_RISCV_32 with the #ifdef, as then the "else"
> covers both RV64 and RV128.
Makes sense, I will update the function in mentioned way.
>
>> + if ( ret.error )
>> + return sbi_err_map_xen_errno(ret.error);
>> + else
>> + return 0;
>> +}
> While I understand this is being debated, I continue to think that this
> kind of use of if/else isn't very helpful. Function's main return
> statements imo benefit from being unconditional.
Considering what sbi_err_map_xen_errno() returns, we can just drop the if/else
and have only:
return sbi_err_map_xen_errno(ret.error);
since, if ret.error == SBI_SUCCESS (0), sbi_err_map_xen_errno() will
return 0.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 12/15] xen/riscv: introduce sbi_set_timer()
2026-01-13 16:33 ` Oleksii Kurochko
@ 2026-01-14 9:07 ` Jan Beulich
2026-01-14 9:59 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 9:07 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.01.2026 17:33, Oleksii Kurochko wrote:
> On 1/12/26 4:12 PM, Jan Beulich wrote:
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> Introduce pointer to function which points to a specific sbi_set_timer()
>>> implementation. It is done in this way as different OpenSBI version can
>>> have different Extenion ID and/or funcion ID for TIME extension.
>>>
>>> sbi_set_time() programs the clock for next event after stime_value
>>> time. This function also clears the pending timer interrupt bit.
>>>
>>> Introduce extension ID and SBI function ID for TIME extension.
>>>
>>> Implement only sbi_set_timer_v02() as there is not to much sense
>>> to support earlier version and, at the moment, Xen supports only v02.
>> Besides this somewhat contradicting the use of a function pointer: What
>> about the legacy extension's equivalent?
>
> I think this is not really needed, and the same implementation can be used for
> both the Legacy and TIME extensions, since the API is identical and the only
> difference is that sbi_set_timer() was moved into a separate extension.
>
> Since Xen reports to the guest that it supports SBI v0.2, it is up to the guest
> implementation to decide why it is still using sbi_set_timer() from the
> Legacy extension instead of the TIME extension.
>
> I think that I can add Legacy extension equivalent but considering that we are
> using OpenSBI v0.2 for which Time extension is available it seems for me it is
> enough to define sbi_set_timer to sbi_set_timer_v02() for now.
Feels like here you're negating what just before you wrote in reply to 10/15.
IOW - I'm now sufficiently confused. (Just consider if you ran Xen itself as
a guest of the very same Xen. From what you said for 10/15, it would end up
not seeing the TIME extension as available, hence would need a fallback to
the Legacy one.)
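The fallback being asked for can be sketched as follows, assuming the choice is made once at init time from the result of probing the TIME extension. All names here (the two *_stub functions, set_timer_sketch, select_set_timer(), enum timer_path) are hypothetical and only illustrate the function-pointer selection; no real ecall is issued.

```c
#include <stdbool.h>
#include <stdint.h>

enum timer_path { PATH_NONE, PATH_TIME_EXT, PATH_LEGACY };

/* Records which implementation the last call went through. */
static enum timer_path last_path = PATH_NONE;

/* Stand-in for a TIME-extension (SBI v0.2+) implementation. */
static int set_timer_v02_stub(uint64_t stime_value)
{
    (void)stime_value;
    last_path = PATH_TIME_EXT;
    return 0;
}

/* Stand-in for a legacy-extension implementation. */
static int set_timer_legacy_stub(uint64_t stime_value)
{
    (void)stime_value;
    last_path = PATH_LEGACY;
    return 0;
}

/* Function pointer that callers would use. */
static int (*set_timer_sketch)(uint64_t stime_value);

/* Pick TIME when the probe succeeded, otherwise fall back to legacy. */
static void select_set_timer(bool time_ext_probed)
{
    set_timer_sketch = time_ext_probed ? set_timer_v02_stub
                                       : set_timer_legacy_stub;
}
```

With a selection like this the pointer is never left NULL, which also covers the nested case described above where the TIME probe fails.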
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 12/15] xen/riscv: introduce sbi_set_timer()
2026-01-14 9:07 ` Jan Beulich
@ 2026-01-14 9:59 ` Oleksii Kurochko
0 siblings, 0 replies; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-14 9:59 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/14/26 10:07 AM, Jan Beulich wrote:
> On 13.01.2026 17:33, Oleksii Kurochko wrote:
>> On 1/12/26 4:12 PM, Jan Beulich wrote:
>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>> Introduce pointer to function which points to a specific sbi_set_timer()
>>>> implementation. It is done in this way as different OpenSBI version can
>>>> have different Extenion ID and/or funcion ID for TIME extension.
>>>>
>>>> sbi_set_time() programs the clock for next event after stime_value
>>>> time. This function also clears the pending timer interrupt bit.
>>>>
>>>> Introduce extension ID and SBI function ID for TIME extension.
>>>>
>>>> Implement only sbi_set_timer_v02() as there is not to much sense
>>>> to support earlier version and, at the moment, Xen supports only v02.
>>> Besides this somewhat contradicting the use of a function pointer: What
>>> about the legacy extension's equivalent?
>> I think this is not really needed, and the same implementation can be used for
>> both the Legacy and TIME extensions, since the API is identical and the only
>> difference is that sbi_set_timer() was moved into a separate extension.
>>
>> Since Xen reports to the guest that it supports SBI v0.2, it is up to the guest
>> implementation to decide why it is still using sbi_set_timer() from the
>> Legacy extension instead of the TIME extension.
>>
>> I think that I can add Legacy extension equivalent but considering that we are
>> using OpenSBI v0.2 for which Time extension is available it seems for me it is
>> enough to define sbi_set_timer to sbi_set_timer_v02() for now.
> Feels like here you're negating what just before you wrote in reply to 10/15.
> IOW - I'm now sufficiently confused. (Just consider if you ran Xen itself as
> a guest of the very same Xen. From what you said for 10/15, it would end up
> not seeing the TIME extension as available, hence would need a fallback to
> the Legacy one.)
You are right, a fallback to the Legacy one should be provided for such cases.
Actually, that is why Linux can be run now: it has such a fallback too.
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (11 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 12/15] xen/riscv: introduce sbi_set_timer() Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-12 15:24 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 14/15] xen/riscv: handle hypervisor timer interrupts Oleksii Kurochko
2025-12-24 17:03 ` [PATCH v1 15/15] xen/riscv: init tasklet subsystem Oleksii Kurochko
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Enable Xen to handle timer reprogramming on RISC-V using
standard SBI calls.
Add a RISC-V implementation of reprogram_timer() to replace the stub:
- Re-enable the function previously stubbed in stubs.c.
- Use sbi_set_timer() to program the timer for the given timeout.
- Disable the timer when timeout == 0 by clearing the SIE.STIE bit.
- Calculate the deadline based on the current boot clock cycle count
and timer ticks.
- Ensure correct behavior when the deadline is already passed.
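The deadline logic in the list above can be sketched in isolation as follows. This is a simplified illustration, not the patch itself: deadline_sketch() is a hypothetical helper that takes the frequency, boot cycle count and current cycle count as parameters, omits the CSR and SBI accesses, and uses a plain multiply/divide that (unlike muldiv64()) can overflow for very large timeouts.

```c
#include <stdint.h>

/*
 * Returns 1 if a timer would be programmed (or disabled when timeout is 0)
 * and 0 if the deadline has already passed. On a non-zero timeout the
 * computed absolute deadline, in ticks, is stored in *deadline.
 */
static int deadline_sketch(int64_t timeout_ns, uint64_t khz,
                           uint64_t boot_cycles, uint64_t now_cycles,
                           uint64_t *deadline)
{
    if ( timeout_ns == 0 )
        return 1;               /* the real code masks the timer interrupt */

    /* ns -> ticks, then rebase onto the counter value at boot. */
    *deadline = (uint64_t)timeout_ns * khz / 1000000u + boot_cycles;

    if ( *deadline <= now_cycles )
        return 0;               /* already in the past: nothing to program */

    return 1;                   /* the real code would call sbi_set_timer() */
}
```

For example, with a 10 MHz counter (khz = 10000), a timeout of one second after boot and boot_cycles = 500, the deadline is 10000500 ticks.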
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/stubs.c | 5 -----
xen/arch/riscv/time.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 31 insertions(+), 5 deletions(-)
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index 68ee859ca1a8..d120274af2fe 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -21,11 +21,6 @@ nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
/* time.c */
-int reprogram_timer(s_time_t timeout)
-{
- BUG_ON("unimplemented");
-}
-
void send_timer_event(struct vcpu *v)
{
BUG_ON("unimplemented");
diff --git a/xen/arch/riscv/time.c b/xen/arch/riscv/time.c
index e962f8518d78..53ba1cfb4a99 100644
--- a/xen/arch/riscv/time.c
+++ b/xen/arch/riscv/time.c
@@ -4,8 +4,12 @@
#include <xen/init.h>
#include <xen/lib.h>
#include <xen/sections.h>
+#include <xen/time.h>
#include <xen/types.h>
+#include <asm/csr.h>
+#include <asm/sbi.h>
+
unsigned long __ro_after_init cpu_khz; /* CPU clock frequency in kHz. */
uint64_t __ro_after_init boot_clock_cycles;
@@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
cpu_khz = rate / 1000;
}
+int reprogram_timer(s_time_t timeout)
+{
+ uint64_t deadline, now;
+ int rc;
+
+ if ( timeout == 0 )
+ {
+ /* Disable timers */
+ csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
+
+ return 1;
+ }
+
+ deadline = ns_to_ticks(timeout) + boot_clock_cycles;
+ now = get_cycles();
+ if ( deadline <= now )
+ return 0;
+
+ /* Enable timer */
+ csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
+
+ if ( (rc = sbi_set_timer(deadline)) )
+ panic("%s: timer wasn't set because: %d\n", __func__, rc);
+
+ return 1;
+}
+
void __init preinit_xen_time(void)
{
if ( acpi_disabled )
--
2.52.0
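The deadline arithmetic in the patch above can be sketched in isolation. This is a minimal user-space sketch, not Xen code: the ns_to_ticks() formula, the cpu_khz and boot_clock_cycles values, and the compute_deadline() helper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for Xen's globals; the values are illustrative only. */
static uint64_t cpu_khz = 10000;           /* 10 MHz timebase */
static uint64_t boot_clock_cycles = 5000;  /* counter value sampled at boot */

/* Convert nanoseconds to timer ticks (assumed formula, no overflow handling). */
static uint64_t ns_to_ticks(int64_t ns)
{
    assert(ns >= 0);
    /* ticks = ns * (cpu_khz * 1000) / 1e9 == ns * cpu_khz / 1e6 */
    return (uint64_t)ns * cpu_khz / 1000000u;
}

/*
 * Returns 1 and stores the absolute deadline if it lies in the future,
 * 0 if it has already passed (mirroring the early return in the patch).
 */
static int compute_deadline(int64_t timeout_ns, uint64_t now,
                            uint64_t *deadline)
{
    uint64_t d = ns_to_ticks(timeout_ns) + boot_clock_cycles;

    if ( d <= now )
        return 0;

    *deadline = d;
    return 1;
}
```

With these assumed values, a timeout of 1 ms since boot converts to 10000 ticks, so the absolute deadline is 15000 on the hardware counter.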
^ permalink raw reply related [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2025-12-24 17:03 ` [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI Oleksii Kurochko
@ 2026-01-12 15:24 ` Jan Beulich
2026-01-13 16:50 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 15:24 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> @@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
> cpu_khz = rate / 1000;
> }
>
> +int reprogram_timer(s_time_t timeout)
> +{
> + uint64_t deadline, now;
> + int rc;
> +
> + if ( timeout == 0 )
> + {
> + /* Disable timers */
> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
> +
> + return 1;
> + }
> +
> + deadline = ns_to_ticks(timeout) + boot_clock_cycles;
> + now = get_cycles();
> + if ( deadline <= now )
> + return 0;
> +
> + /* Enable timer */
> + csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
Still learning RISC-V, so question for my understanding: Even if the timeout
is short enough to expire before the one SIE bit will be set, the interrupt
will still occur (effectively immediately)? (Else the bit may need setting
first.)
> + if ( (rc = sbi_set_timer(deadline)) )
> + panic("%s: timer wasn't set because: %d\n", __func__, rc);
Hmm, if this function ends up being used from any guest accessible path (e.g.
a hypercall), such panic()-ing better shouldn't be there.
> + return 1;
> +}
Finally, before we get yet another instance of this de-facto boolean function:
Wouldn't we better globally switch it to be properly "bool" first?
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-12 15:24 ` Jan Beulich
@ 2026-01-13 16:50 ` Oleksii Kurochko
2026-01-14 9:13 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 16:50 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/12/26 4:24 PM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> @@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
>> cpu_khz = rate / 1000;
>> }
>>
>> +int reprogram_timer(s_time_t timeout)
>> +{
>> + uint64_t deadline, now;
>> + int rc;
>> +
>> + if ( timeout == 0 )
>> + {
>> + /* Disable timers */
>> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>> +
>> + return 1;
>> + }
>> +
>> + deadline = ns_to_ticks(timeout) + boot_clock_cycles;
>> + now = get_cycles();
>> + if ( deadline <= now )
>> + return 0;
>> +
>> + /* Enable timer */
>> + csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
> Still learning RISC-V, so question for my understanding: Even if the timeout
> is short enough to expire before the one SIE bit will be set, the interrupt
> will still occur (effectively immediately)? (Else the bit may need setting
> first.)
The interrupt will become pending first (when mtime >= mtimecmp or
mtime >= CSR_STIMECMP in case of SSTC) and then fire immediately once
SIE.STIE (and global SIE) are enabled.
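A minimal sketch of this ordering, assuming a toy model rather than real CSR access: the struct and helper names are hypothetical, and the model simplifies the hardware by treating the pending bit as a pure function of the counter and the compare value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the supervisor timer interrupt state; not real CSR access. */
struct timer_model {
    uint64_t time;     /* free-running counter */
    uint64_t timecmp;  /* compare value programmed by software */
    bool stie;         /* models SIE.STIE, the timer interrupt enable */
    bool global_ie;    /* models sstatus.SIE, the global interrupt enable */
};

/* Let the modelled counter run forward. */
static void advance_time(struct timer_model *t, uint64_t ticks)
{
    assert(t);
    t->time += ticks;
}

/* In this model the pending state simply follows time >= timecmp. */
static bool timer_pending(const struct timer_model *t)
{
    return t->time >= t->timecmp;
}

/* An interrupt is taken only when it is pending and both enables are set. */
static bool timer_interrupt_taken(const struct timer_model *t)
{
    return timer_pending(t) && t->stie && t->global_ie;
}
```

In this model a deadline that expires while stie is still false is not lost: timer_pending() stays true, and timer_interrupt_taken() becomes true as soon as stie is set.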
>
>> + if ( (rc = sbi_set_timer(deadline)) )
>> + panic("%s: timer wasn't set because: %d\n", __func__, rc);
> Hmm, if this function ends up being used from any guest accessible path (e.g.
> a hypercall), such panic()-ing better shouldn't be there.
I don't have such use cases now, and I don't expect a guest to use
this function.
>
>> + return 1;
>> +}
> Finally, before we get yet another instance of this de-facto boolean function:
> Wouldn't we better globally switch it to be properly "bool" first?
We could do that, I will prepare a separate patch in the next version of
this patch series.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-13 16:50 ` Oleksii Kurochko
@ 2026-01-14 9:13 ` Jan Beulich
2026-01-14 9:41 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 9:13 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.01.2026 17:50, Oleksii Kurochko wrote:
> On 1/12/26 4:24 PM, Jan Beulich wrote:
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> @@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
>>> cpu_khz = rate / 1000;
>>> }
>>>
>>> +int reprogram_timer(s_time_t timeout)
>>> +{
>>> + uint64_t deadline, now;
>>> + int rc;
>>> +
>>> + if ( timeout == 0 )
>>> + {
>>> + /* Disable timers */
>>> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>> +
>>> + return 1;
>>> + }
>>> +
>>> + deadline = ns_to_ticks(timeout) + boot_clock_cycles;
>>> + now = get_cycles();
>>> + if ( deadline <= now )
>>> + return 0;
>>> +
>>> + /* Enable timer */
>>> + csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>> Still learning RISC-V, so question for my understanding: Even if the timeout
>> is short enough to expire before the one SIE bit will be set, the interrupt
>> will still occur (effectively immediately)? (Else the bit may need setting
>> first.)
>
> The interrupt will become pending first (when mtime >= mtimecmp or
> mtime >= CSR_STIMECMP in case of SSTC) and then fire immediately once
> SIE.STIE (and global SIE) are enabled.
>
>>
>>> + if ( (rc = sbi_set_timer(deadline)) )
>>> + panic("%s: timer wasn't set because: %d\n", __func__, rc);
>> Hmm, if this function ends up being used from any guest accessible path (e.g.
>> a hypercall), such panic()-ing better shouldn't be there.
>
> I don't have such use cases now and I don't expect that guest should use
> this function.
How do you envision supporting e.g. VCPUOP_set_singleshot_timer without
involving this function?
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-14 9:13 ` Jan Beulich
@ 2026-01-14 9:41 ` Oleksii Kurochko
2026-01-14 9:53 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-14 9:41 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/14/26 10:13 AM, Jan Beulich wrote:
> On 13.01.2026 17:50, Oleksii Kurochko wrote:
>> On 1/12/26 4:24 PM, Jan Beulich wrote:
>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>> @@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
>>>> cpu_khz = rate / 1000;
>>>> }
>>>>
>>>> +int reprogram_timer(s_time_t timeout)
>>>> +{
>>>> + uint64_t deadline, now;
>>>> + int rc;
>>>> +
>>>> + if ( timeout == 0 )
>>>> + {
>>>> + /* Disable timers */
>>>> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>> +
>>>> + return 1;
>>>> + }
>>>> +
>>>> + deadline = ns_to_ticks(timeout) + boot_clock_cycles;
>>>> + now = get_cycles();
>>>> + if ( deadline <= now )
>>>> + return 0;
>>>> +
>>>> + /* Enable timer */
>>>> + csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>> Still learning RISC-V, so question for my understanding: Even if the timeout
>>> is short enough to expire before the one SIE bit will be set, the interrupt
>>> will still occur (effectively immediately)? (Else the bit may need setting
>>> first.)
>> The interrupt will become pending first (when mtime >= mtimecmp or
>> mtime >= CSR_STIMECMP in case of SSTC) and then fire immediately once
>> SIE.STIE (and global SIE) are enabled.
>>
>>>> + if ( (rc = sbi_set_timer(deadline)) )
>>>> + panic("%s: timer wasn't set because: %d\n", __func__, rc);
>>> Hmm, if this function ends up being used from any guest accessible path (e.g.
>>> a hypercall), such panic()-ing better shouldn't be there.
>> I don't have such use cases now and I don't expect that guest should use
>> this function.
> How do you envision supporting e.g. VCPUOP_set_singleshot_timer without
> involving this function?
Looking at the common code for VCPUOP_set_singleshot_timer, it doesn't
use reprogram_timer(); it just activates/deactivates the timer.
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-14 9:41 ` Oleksii Kurochko
@ 2026-01-14 9:53 ` Jan Beulich
2026-01-14 10:33 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 9:53 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.01.2026 10:41, Oleksii Kurochko wrote:
>
> On 1/14/26 10:13 AM, Jan Beulich wrote:
>> On 13.01.2026 17:50, Oleksii Kurochko wrote:
>>> On 1/12/26 4:24 PM, Jan Beulich wrote:
>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>> @@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
>>>>> cpu_khz = rate / 1000;
>>>>> }
>>>>>
>>>>> +int reprogram_timer(s_time_t timeout)
>>>>> +{
>>>>> + uint64_t deadline, now;
>>>>> + int rc;
>>>>> +
>>>>> + if ( timeout == 0 )
>>>>> + {
>>>>> + /* Disable timers */
>>>>> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>>> +
>>>>> + return 1;
>>>>> + }
>>>>> +
>>>>> + deadline = ns_to_ticks(timeout) + boot_clock_cycles;
>>>>> + now = get_cycles();
>>>>> + if ( deadline <= now )
>>>>> + return 0;
>>>>> +
>>>>> + /* Enable timer */
>>>>> + csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>> Still learning RISC-V, so question for my understanding: Even if the timeout
>>>> is short enough to expire before the one SIE bit will be set, the interrupt
>>>> will still occur (effectively immediately)? (Else the bit may need setting
>>>> first.)
>>> The interrupt will become pending first (when mtime >= mtimecmp or
>>> mtime >= CSR_STIMECMP in case of SSTC) and then fire immediately once
>>> SIE.STIE (and global SIE) are enabled.
>>>
>>>>> + if ( (rc = sbi_set_timer(deadline)) )
>>>>> + panic("%s: timer wasn't set because: %d\n", __func__, rc);
>>>> Hmm, if this function ends up being used from any guest accessible path (e.g.
>>>> a hypercall), such panic()-ing better shouldn't be there.
>>> I don't have such use cases now and I don't expect that guest should use
>>> this function.
>> How do you envision supporting e.g. VCPUOP_set_singleshot_timer without
>> involving this function?
>
> Looking at what is in common code for VCPUOP_set_singleshot_timer, it doesn't
> use reprogram_timer(), it is just activate/deactivate timer.
And how would that work without, eventually, using reprogram_timer()? While not
directly on a hypercall path, the use can still be guest-induced.
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-14 9:53 ` Jan Beulich
@ 2026-01-14 10:33 ` Oleksii Kurochko
2026-01-14 11:17 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-14 10:33 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/14/26 10:53 AM, Jan Beulich wrote:
> On 14.01.2026 10:41, Oleksii Kurochko wrote:
>> On 1/14/26 10:13 AM, Jan Beulich wrote:
>>> On 13.01.2026 17:50, Oleksii Kurochko wrote:
>>>> On 1/12/26 4:24 PM, Jan Beulich wrote:
>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>> @@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
>>>>>> cpu_khz = rate / 1000;
>>>>>> }
>>>>>>
>>>>>> +int reprogram_timer(s_time_t timeout)
>>>>>> +{
>>>>>> + uint64_t deadline, now;
>>>>>> + int rc;
>>>>>> +
>>>>>> + if ( timeout == 0 )
>>>>>> + {
>>>>>> + /* Disable timers */
>>>>>> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>>>> +
>>>>>> + return 1;
>>>>>> + }
>>>>>> +
>>>>>> + deadline = ns_to_ticks(timeout) + boot_clock_cycles;
>>>>>> + now = get_cycles();
>>>>>> + if ( deadline <= now )
>>>>>> + return 0;
>>>>>> +
>>>>>> + /* Enable timer */
>>>>>> + csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>>> Still learning RISC-V, so question for my understanding: Even if the timeout
>>>>> is short enough to expire before the one SIE bit will be set, the interrupt
>>>>> will still occur (effectively immediately)? (Else the bit may need setting
>>>>> first.)
>>>> The interrupt will become pending first (when mtime >= mtimecmp or
>>>> mtime >= CSR_STIMECMP in case of SSTC) and then fire immediately once
>>>> SIE.STIE (and global SIE) are enabled.
>>>>
>>>>>> + if ( (rc = sbi_set_timer(deadline)) )
>>>>>> + panic("%s: timer wasn't set because: %d\n", __func__, rc);
>>>>> Hmm, if this function ends up being used from any guest accessible path (e.g.
>>>>> a hypercall), such panic()-ing better shouldn't be there.
>>>> I don't have such use cases now and I don't expect that guest should use
>>>> this function.
>>> How do you envision supporting e.g. VCPUOP_set_singleshot_timer without
>>> involving this function?
>> Looking at what is in common code for VCPUOP_set_singleshot_timer, it doesn't
>> use reprogram_timer(), it is just activate/deactivate timer.
> And how would that work without, eventually, using reprogram_timer()? While not
> directly on a hypercall path, the use can still be guest-induced.
Of course, eventually reprogram_timer() will be used. I incorrectly thought
that we were talking about its direct use on the hypercall path.
I am not really sure what we should do in the case when rc != 0. Looking at the
OpenSBI call, it always returns 0, except when sbi_set_timer() is not supported,
in which case it returns -SBI_ENOTSUPP. With such a return value, I think it would
be acceptable to call domain_crash(current->domain). On the other hand, if some
other negative error code is returned, it might be better to return 0 and simply
allow the timer programming to be retried later.
However, if we look at the comments for other architectures, the meaning of a
return value of 0 from this function is:
Returns 1 on success; 0 if the timeout is too soon or is in the past.
In that case, it becomes difficult to distinguish whether 0 was returned due to
an error or because the timeout was too soon or already in the past.
It seems that, for now, it is better to call domain_crash() and change that later
if it becomes necessary, as I expect that the only negative code returned by
sbi_set_timer() will be -SBI_ENOTSUPP, and if this SBI call isn't supported then
we need a different way to set a timer anyway.
~ Oleksii
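The ambiguity described above, where 0 can mean either "too soon" or "the lower-level call failed", can be illustrated with a three-way result. This is only a sketch under stated assumptions: enum reprogram_result, try_reprogram(), and the two fake back ends are hypothetical names, and reprogram_timer() in Xen returns a plain int today.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type, used only for this illustration. */
enum reprogram_result {
    REPROGRAM_OK,        /* deadline programmed */
    REPROGRAM_TOO_SOON,  /* deadline already passed: handle expiry now */
    REPROGRAM_FAILED,    /* lower-level call failed: caller may retry later */
};

/* Stand-in for the SBI call: returns 0 on success, negative on error. */
typedef int (*set_timer_fn)(uint64_t deadline);

static enum reprogram_result try_reprogram(uint64_t deadline, uint64_t now,
                                           set_timer_fn set_timer)
{
    assert(set_timer);

    if ( deadline <= now )
        return REPROGRAM_TOO_SOON;

    return set_timer(deadline) ? REPROGRAM_FAILED : REPROGRAM_OK;
}

/* Two fake back ends, present only so the sketch can be exercised. */
static int fake_set_timer_ok(uint64_t deadline)
{
    (void)deadline;
    return 0;
}

static int fake_set_timer_fail(uint64_t deadline)
{
    (void)deadline;
    return -2;  /* arbitrary negative value standing in for an SBI error */
}
```

A caller that receives REPROGRAM_FAILED could then choose its own policy (retry, warn, or give up) instead of having that choice made inside the arch-specific function.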
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-14 10:33 ` Oleksii Kurochko
@ 2026-01-14 11:17 ` Jan Beulich
2026-01-14 12:41 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 11:17 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.01.2026 11:33, Oleksii Kurochko wrote:
>
> On 1/14/26 10:53 AM, Jan Beulich wrote:
>> On 14.01.2026 10:41, Oleksii Kurochko wrote:
>>> On 1/14/26 10:13 AM, Jan Beulich wrote:
>>>> On 13.01.2026 17:50, Oleksii Kurochko wrote:
>>>>> On 1/12/26 4:24 PM, Jan Beulich wrote:
>>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>>> @@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
>>>>>>> cpu_khz = rate / 1000;
>>>>>>> }
>>>>>>>
>>>>>>> +int reprogram_timer(s_time_t timeout)
>>>>>>> +{
>>>>>>> + uint64_t deadline, now;
>>>>>>> + int rc;
>>>>>>> +
>>>>>>> + if ( timeout == 0 )
>>>>>>> + {
>>>>>>> + /* Disable timers */
>>>>>>> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>>>>> +
>>>>>>> + return 1;
>>>>>>> + }
>>>>>>> +
>>>>>>> + deadline = ns_to_ticks(timeout) + boot_clock_cycles;
>>>>>>> + now = get_cycles();
>>>>>>> + if ( deadline <= now )
>>>>>>> + return 0;
>>>>>>> +
>>>>>>> + /* Enable timer */
>>>>>>> + csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>>>> Still learning RISC-V, so question for my understanding: Even if the timeout
>>>>>> is short enough to expire before the one SIE bit will be set, the interrupt
>>>>>> will still occur (effectively immediately)? (Else the bit may need setting
>>>>>> first.)
>>>>> The interrupt will become pending first (when mtime >= mtimecmp or
>>>>> mtime >= CSR_STIMECMP in case of SSTC) and then fire immediately once
>>>>> SIE.STIE (and global SIE) are enabled.
>>>>>
>>>>>>> + if ( (rc = sbi_set_timer(deadline)) )
>>>>>>> + panic("%s: timer wasn't set because: %d\n", __func__, rc);
>>>>>> Hmm, if this function ends up being used from any guest accessible path (e.g.
>>>>>> a hypercall), such panic()-ing better shouldn't be there.
>>>>> I don't have such use cases now and I don't expect that guest should use
>>>>> this function.
>>>> How do you envision supporting e.g. VCPUOP_set_singleshot_timer without
>>>> involving this function?
>>> Looking at what is in common code for VCPUOP_set_singleshot_timer, it doesn't
>>> use reprogram_timer(), it is just activate/deactivate timer.
>> And how would that work without, eventually, using reprogram_timer()? While not
>> directly on a hypercall path, the use can still be guest-induced.
>
> Of course, eventually reprogram_timer() will be used. I incorrectly thought
> that we were talking about its direct use on the hypercall path.
>
> I am not really sure what we should do in the case when rc != 0. Looking at the
> OpenSBI call, it always returns 0, except when sbi_set_timer() is not supported,
> in which case it returns -SBI_ENOTSUPP. With such a return value, I think it would
> be acceptable to call domain_crash(current->domain).
How is current->domain related to a failure in reprogram_timer()?
> On the other hand, if some
> other negative error code is returned, it might be better to return 0 and simply
> allow the timer programming to be retried later.
> However, if we look at the comments for other architectures, the meaning of a
> return value of 0 from this function is:
> Returns 1 on success; 0 if the timeout is too soon or is in the past.
> In that case, it becomes difficult to distinguish whether 0 was returned due to
> an error or because the timeout was too soon or already in the past.
Well, your problem is that neither Arm nor x86 can actually fail. Hence
calling code isn't presently prepared for that. With panic() (and hence
also BUG()) and domain_crash() ruled out, maybe generic infrastructure
needs touching first (in a different way than making the function's return
type "bool")?
Jan
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-14 11:17 ` Jan Beulich
@ 2026-01-14 12:41 ` Oleksii Kurochko
2026-01-14 15:04 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-14 12:41 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/14/26 12:17 PM, Jan Beulich wrote:
> On 14.01.2026 11:33, Oleksii Kurochko wrote:
>> On 1/14/26 10:53 AM, Jan Beulich wrote:
>>> On 14.01.2026 10:41, Oleksii Kurochko wrote:
>>>> On 1/14/26 10:13 AM, Jan Beulich wrote:
>>>>> On 13.01.2026 17:50, Oleksii Kurochko wrote:
>>>>>> On 1/12/26 4:24 PM, Jan Beulich wrote:
>>>>>>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>>>>>>> @@ -39,6 +43,33 @@ static void __init preinit_dt_xen_time(void)
>>>>>>>> cpu_khz = rate / 1000;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +int reprogram_timer(s_time_t timeout)
>>>>>>>> +{
>>>>>>>> + uint64_t deadline, now;
>>>>>>>> + int rc;
>>>>>>>> +
>>>>>>>> + if ( timeout == 0 )
>>>>>>>> + {
>>>>>>>> + /* Disable timers */
>>>>>>>> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>>>>>> +
>>>>>>>> + return 1;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + deadline = ns_to_ticks(timeout) + boot_clock_cycles;
>>>>>>>> + now = get_cycles();
>>>>>>>> + if ( deadline <= now )
>>>>>>>> + return 0;
>>>>>>>> +
>>>>>>>> + /* Enable timer */
>>>>>>>> + csr_set(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>>>>>>> Still learning RISC-V, so question for my understanding: Even if the timeout
>>>>>>> is short enough to expire before the one SIE bit will be set, the interrupt
>>>>>>> will still occur (effectively immediately)? (Else the bit may need setting
>>>>>>> first.)
>>>>>> The interrupt will become pending first (when mtime >= mtimecmp or
>>>>>> mtime >= CSR_STIMECMP in case of SSTC) and then fire immediately once
>>>>>> SIE.STIE (and global SIE) are enabled.
>>>>>>
>>>>>>>> + if ( (rc = sbi_set_timer(deadline)) )
>>>>>>>> + panic("%s: timer wasn't set because: %d\n", __func__, rc);
>>>>>>> Hmm, if this function ends up being used from any guest accessible path (e.g.
>>>>>>> a hypercall), such panic()-ing better shouldn't be there.
>>>>>> I don't have such use cases now and I don't expect that guest should use
>>>>>> this function.
>>>>> How do you envision supporting e.g. VCPUOP_set_singleshot_timer without
>>>>> involving this function?
>>>> Looking at what is in common code for VCPUOP_set_singleshot_timer, it doesn't
>>>> use reprogram_timer(), it is just activate/deactivate timer.
>>> And how would that work without, eventually, using reprogram_timer()? While not
>>> directly on a hypercall path, the use can still be guest-induced.
>> Of course, eventually reprogram_timer() will be used. I incorrectly thought
>> that we were talking about its direct use on the hypercall path.
>>
>> I am not really sure what we should do in the case when rc != 0. Looking at the
>> OpenSBI call, it always returns 0, except when sbi_set_timer() is not supported,
>> in which case it returns -SBI_ENOTSUPP. With such a return value, I think it would
>> be acceptable to call domain_crash(current->domain).
> How is current->domain related to a failure in reprogram_timer()?
Agreed, it isn't; a failure could happen while any domain is running.
>
>> On the other hand, if some
>> other negative error code is returned, it might be better to return 0 and simply
>> allow the timer programming to be retried later.
>> However, if we look at the comments for other architectures, the meaning of a
>> return value of 0 from this function is:
>> Returns 1 on success; 0 if the timeout is too soon or is in the past.
>> In that case, it becomes difficult to distinguish whether 0 was returned due to
>> an error or because the timeout was too soon or already in the past.
> Well, your problem is that neither Arm nor x86 can actually fail. Hence
> calling code isn't presently prepared for that. With panic() (and hence
> also BUG()) and domain_crash() ruled out, maybe generic infrastructure
> needs touching first (in a different way than making the function's return
> type "bool")?
I think making the function's return type bool is still fine; it is only a question
for the arch-specific reprogram_timer() what to do when an error happens.
It still isn't clear to me what the reaction to a failure of reprogram_timer()
should be.
Considering that the SBI spec doesn't specify a list of possible errors and that,
for now, the only possible error is -SBI_ENOTSUPP, it seems fine to me
to have panic(), as we don't have any other mechanism to set a timer
except the SBI call (unless SSTC is supported, in which case we can just use the
supervisor timer register directly, without an SBI call).
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-14 12:41 ` Oleksii Kurochko
@ 2026-01-14 15:04 ` Jan Beulich
2026-01-14 15:53 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 15:04 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.01.2026 13:41, Oleksii Kurochko wrote:
> On 1/14/26 12:17 PM, Jan Beulich wrote:
>> On 14.01.2026 11:33, Oleksii Kurochko wrote:
>>> On the other hand, if some
>>> other negative error code is returned, it might be better to return 0 and simply
>>> allow the timer programming to be retried later.
>>> However, if we look at the comments for other architectures, the meaning of a
>>> return value of 0 from this function is:
>>> Returns 1 on success; 0 if the timeout is too soon or is in the past.
>>> In that case, it becomes difficult to distinguish whether 0 was returned due to
>>> an error or because the timeout was too soon or already in the past.
>> Well, your problem is that neither Arm nor x86 can actually fail. Hence
>> calling code isn't presently prepared for that. With panic() (and hence
>> also BUG()) and domain_crash() ruled out, maybe generic infrastructure
>> needs touching first (in a different way than making the function's return
>> type "bool")?
>
> I think making the function's return type bool is still fine; it is only a question
> for the arch-specific reprogram_timer() what to do when an error happens.
>
> It still isn't clear to me what the reaction to a failure of reprogram_timer()
> should be.
> Considering that the SBI spec doesn't specify a list of possible errors and that,
> for now, the only possible error is -SBI_ENOTSUPP, it seems fine to me
> to have panic(), as we don't have any other mechanism to set a timer
> except the SBI call
panic() (or BUG_ON()) is pretty drastic a measure when possibly the system
could be kept alive. If it is pretty certain that future SBI timer calls also
aren't going to work, then I'd agree that panic()ing might be appropriate.
If otoh a subsequent call might work, a less heavyweight action would seem
preferable. (Welcome to the funs of relying on lower-level software.)
> (except the case SSTC is supported then we can use just
> supervisor timer register directly without SBI call).
So maybe a good first step would be to use that extension if available?
Might even think about requiring it for the time being ...
Jan
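The idea of preferring the extension when it is available can be sketched as a back end chosen once at boot. This is a user-space sketch under assumptions: the have_sstc probe result, timer_backend_init(), program_deadline(), and both stub back ends are hypothetical and only record which path was taken; real code would write the stimecmp CSR or issue the SBI call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Records which path was taken so the example can be checked. */
enum timer_backend { BACKEND_NONE, BACKEND_SSTC, BACKEND_SBI };

static enum timer_backend last_backend = BACKEND_NONE;
static uint64_t last_deadline;

/* Stand-in for a direct write to the stimecmp CSR (Sstc extension). */
static int sstc_set_timer(uint64_t deadline)
{
    last_backend = BACKEND_SSTC;
    last_deadline = deadline;
    return 0;  /* a CSR write has no error return */
}

/* Stand-in for the sbi_set_timer() firmware call. */
static int sbi_set_timer_stub(uint64_t deadline)
{
    last_backend = BACKEND_SBI;
    last_deadline = deadline;
    return 0;
}

/* Chosen once at boot, based on a (hypothetical) extension probe. */
static int (*set_timer)(uint64_t deadline);

static void timer_backend_init(bool have_sstc)
{
    set_timer = have_sstc ? sstc_set_timer : sbi_set_timer_stub;
}

static int program_deadline(uint64_t deadline)
{
    assert(set_timer);  /* timer_backend_init() must have run first */
    return set_timer(deadline);
}
```

With such a split, the failure handling debated in this thread would only matter on the SBI path, since the direct register write cannot report an error.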
^ permalink raw reply [flat|nested] 93+ messages in thread
* Re: [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI
2026-01-14 15:04 ` Jan Beulich
@ 2026-01-14 15:53 ` Oleksii Kurochko
0 siblings, 0 replies; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-14 15:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/14/26 4:04 PM, Jan Beulich wrote:
> On 14.01.2026 13:41, Oleksii Kurochko wrote:
>> On 1/14/26 12:17 PM, Jan Beulich wrote:
>>> On 14.01.2026 11:33, Oleksii Kurochko wrote:
>>>> On the other hand, if some
>>>> other negative error code is returned, it might be better to return 0 and simply
>>>> allow the timer programming to be retried later.
>>>> However, if we look at the comments for other architectures, the meaning of a
>>>> return value of 0 from this function is:
>>>> Returns 1 on success; 0 if the timeout is too soon or is in the past.
>>>> In that case, it becomes difficult to distinguish whether 0 was returned due to
>>>> an error or because the timeout was too soon or already in the past.
>>> Well, your problem is that neither Arm nor x86 can actually fail. Hence
>>> calling code isn't presently prepared for that. With panic() (and hence
>>> also BUG()) and domain_crash() ruled out, maybe generic infrastructure
>>> needs touching first (in a different way than making the function's return
>>> type "bool")?
>> I think making the function's return type bool is still fine; it is only a question
>> for the arch-specific reprogram_timer() what to do when an error happens.
>>
>> It still isn't clear to me what the reaction to a failure of reprogram_timer()
>> should be.
>> Considering that the SBI spec doesn't specify a list of possible errors and that,
>> for now, the only possible error is -SBI_ENOTSUPP, it seems fine to me
>> to have panic(), as we don't have any other mechanism to set a timer
>> except the SBI call
> panic() (or BUG_ON()) is pretty drastic a measure when possibly the system
> could be kept alive. If it is pretty certain that future SBI timer calls also
> aren't going to work, then I'd agree that panic()ing might be appropriate.
> If otoh a subsequent call might work, a less heavyweight action would seem
> preferable. (Welcome to the funs of relying on lower-level software.)
I don’t know how OpenSBI will be updated in the future, but looking at the current
situation, since SBI timer calls have existed from very early OpenSBI versions up
to the latest ones, repeatedly issuing the SBI timer call will always return the
same negative error code indicating that the timer call is not supported.
I can add a comment above panic(), or include an explanation in the commit message.
>
>> (except the case SSTC is supported then we can use just
>> supervisor timer register directly without SBI call).
> So maybe a good first step would be to use that extension if available?
> Might even think about requiring it for the time being ...
I initially started working on SSTC support, but then stopped because an almost
production-ready board to which I do have indirect access does not support this
extension. As a result, I decided to proceed with the SBI approach, as it covers both
cases: when SSTC is available and when it is not.
~ Oleksii
^ permalink raw reply [flat|nested] 93+ messages in thread
* [PATCH v1 14/15] xen/riscv: handle hypervisor timer interrupts
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (12 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 13/15] xen/riscv: implement reprogram_timer() using SBI Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-12 16:15 ` Jan Beulich
2025-12-24 17:03 ` [PATCH v1 15/15] xen/riscv: init tasklet subsystem Oleksii Kurochko
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce timer_interrupt() to process IRQ_S_TIMER interrupts.
The handler disables further timer interrupts by clearing
SIE.STIE and raises TIMER_SOFTIRQ so the generic timer subsystem
can perform its processing.
Update do_trap() to dispatch IRQ_S_TIMER to this new handler.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/traps.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/xen/arch/riscv/traps.c b/xen/arch/riscv/traps.c
index e9c967786312..5fd12b1b21c3 100644
--- a/xen/arch/riscv/traps.c
+++ b/xen/arch/riscv/traps.c
@@ -10,6 +10,7 @@
#include <xen/lib.h>
#include <xen/nospec.h>
#include <xen/sched.h>
+#include <xen/softirq.h>
#include <asm/intc.h>
#include <asm/processor.h>
@@ -108,6 +109,15 @@ static void do_unexpected_trap(const struct cpu_user_regs *regs)
die();
}
+static void timer_interrupt(unsigned long cause)
+{
+ /* Disable the timer to avoid more interrupts */
+ csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
+
+ /* Signal the generic timer code to do its work */
+ raise_softirq(TIMER_SOFTIRQ);
+}
+
void do_trap(struct cpu_user_regs *cpu_regs)
{
register_t pc = cpu_regs->sepc;
@@ -148,6 +158,10 @@ void do_trap(struct cpu_user_regs *cpu_regs)
intc_handle_external_irqs(cpu_regs);
break;
+ case IRQ_S_TIMER:
+ timer_interrupt(cause);
+ break;
+
default:
break;
}
--
2.52.0
* Re: [PATCH v1 14/15] xen/riscv: handle hypervisor timer interrupts
2025-12-24 17:03 ` [PATCH v1 14/15] xen/riscv: handle hypervisor timer interrupts Oleksii Kurochko
@ 2026-01-12 16:15 ` Jan Beulich
2026-01-13 16:53 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 16:15 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> @@ -108,6 +109,15 @@ static void do_unexpected_trap(const struct cpu_user_regs *regs)
> die();
> }
>
> +static void timer_interrupt(unsigned long cause)
> +{
> + /* Disable the timer to avoid more interrupts */
> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
> +
> + /* Signal the generic timer code to do its work */
> + raise_softirq(TIMER_SOFTIRQ);
> +}
Why is "cause" being passed when it's not used?
Jan
* Re: [PATCH v1 14/15] xen/riscv: handle hypervisor timer interrupts
2026-01-12 16:15 ` Jan Beulich
@ 2026-01-13 16:53 ` Oleksii Kurochko
0 siblings, 0 replies; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 16:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Connor Davis, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/12/26 5:15 PM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> @@ -108,6 +109,15 @@ static void do_unexpected_trap(const struct cpu_user_regs *regs)
>> die();
>> }
>>
>> +static void timer_interrupt(unsigned long cause)
>> +{
>> + /* Disable the timer to avoid more interrupts */
>> + csr_clear(CSR_SIE, BIT(IRQ_S_TIMER, UL));
>> +
>> + /* Signal the generic timer code to do its work */
>> + raise_softirq(TIMER_SOFTIRQ);
>> +}
> Why is "cause" being passed when it's not used?
Good point. There is no sense in passing it. Moreover, the cause is
already known in such a handler: it is always IRQ_S_TIMER.
I will drop the argument from timer_interrupt().
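A minimal sketch of what the handler and its dispatch could look like without the argument. This is only a user-space model, not the Xen code: `model_sie`, `model_softirq_pending`, `model_csr_clear_sie()`, `model_raise_softirq()`, `model_dispatch_irq()` and `MODEL_BIT()` are hypothetical stand-ins for the SIE CSR, the pending softirq mask, csr_clear(), raise_softirq(), the interrupt part of do_trap() and BIT(), and the numeric values of IRQ_S_TIMER and TIMER_SOFTIRQ are assumptions made for the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for Xen definitions; the values are assumptions of this model. */
#define IRQ_S_TIMER    5                 /* supervisor timer bit in sie */
#define TIMER_SOFTIRQ  0                 /* hypothetical softirq number */
#define MODEL_BIT(nr)  (1UL << (nr))     /* simplified form of BIT(nr, UL) */

static unsigned long model_sie = MODEL_BIT(IRQ_S_TIMER); /* models the SIE CSR */
static unsigned long model_softirq_pending;              /* models pending softirqs */

/* Models csr_clear(CSR_SIE, mask). */
static void model_csr_clear_sie(unsigned long mask)
{
    model_sie &= ~mask;
}

/* Models raise_softirq(nr). */
static void model_raise_softirq(unsigned int nr)
{
    assert(nr < 8 * sizeof(unsigned long));
    model_softirq_pending |= MODEL_BIT(nr);
}

/* No "cause" parameter: the handler is only reached for IRQ_S_TIMER. */
static void timer_interrupt(void)
{
    /* Disable the timer to avoid more interrupts */
    model_csr_clear_sie(MODEL_BIT(IRQ_S_TIMER));

    /* Signal the generic timer code to do its work */
    model_raise_softirq(TIMER_SOFTIRQ);
}

/* Models the interrupt dispatch in do_trap(); returns true if handled. */
static bool model_dispatch_irq(unsigned long cause)
{
    switch ( cause )
    {
    case IRQ_S_TIMER:
        timer_interrupt();
        return true;

    default:
        return false;
    }
}
```

In this model, dispatching IRQ_S_TIMER clears the timer enable bit and leaves TIMER_SOFTIRQ pending, while any other cause is left untouched.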
Thanks.
~ Oleksii
* [PATCH v1 15/15] xen/riscv: init tasklet subsystem
2025-12-24 17:03 [PATCH v1 00/15] xen/riscv: introduce vtimer related things Oleksii Kurochko
` (13 preceding siblings ...)
2025-12-24 17:03 ` [PATCH v1 14/15] xen/riscv: handle hypervisor timer interrupts Oleksii Kurochko
@ 2025-12-24 17:03 ` Oleksii Kurochko
2026-01-12 16:20 ` Jan Beulich
14 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2025-12-24 17:03 UTC (permalink / raw)
To: xen-devel
Cc: Oleksii Kurochko, Alistair Francis, Bob Eshleman, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
As the tasklet subsystem is now initialized, it is necessary to implement
sync_local_execstate(), since it is invoked when something calls
tasklet_softirq_action(), which is registered in tasklet_subsys_init().
After introducing sync_local_execstate(), the following linker issue occurs:
riscv64-linux-gnu-ld: prelink.o: in function `bitmap_and':
/build/xen/./include/xen/bitmap.h:147: undefined reference to
`sync_vcpu_execstate'
riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol
`sync_vcpu_execstate' isn't defined
riscv64-linux-gnu-ld: final link failed: bad value
Therefore, an implementation of sync_vcpu_execstate() is provided, based on
the Arm code.
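
The ordering that sync_vcpu_execstate() relies on (the saved context may be observed once v->is_running is false, followed by a full barrier) can be modelled in user space roughly as follows. This is only a sketch using C11 atomics, not the Xen implementation: `struct model_vcpu`, `model_context_saved()`, `model_sync_vcpu_execstate()` and `model_read_context()` are hypothetical names, and `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a vCPU: one saved register plus the flag. */
struct model_vcpu {
    unsigned long saved_sp;   /* stands in for the saved context */
    atomic_bool is_running;   /* models v->is_running */
};

/* Remote pCPU side: save the context, then publish "not running". */
static void model_context_saved(struct model_vcpu *v, unsigned long sp)
{
    v->saved_sp = sp;
    /* Release ordering keeps the context store before the flag store. */
    atomic_store_explicit(&v->is_running, false, memory_order_release);
}

/* Models sync_vcpu_execstate(): no lazy state to flush, only a full barrier. */
static void model_sync_vcpu_execstate(struct model_vcpu *v)
{
    (void)v;
    atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb() */
}

/* Caller side: check the flag first, then sync, then touch the context. */
static bool model_read_context(struct model_vcpu *v, unsigned long *sp)
{
    assert(sp != NULL);

    if ( atomic_load_explicit(&v->is_running, memory_order_relaxed) )
        return false;         /* still running: context not valid yet */

    model_sync_vcpu_execstate(v);
    *sp = v->saved_sp;
    return true;
}
```

The caller checks the flag before the barrier, matching the comment's requirement that v->is_running is checked before calling the function.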
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/domain.c | 23 +++++++++++++++++++++++
xen/arch/riscv/setup.c | 3 +++
xen/arch/riscv/stubs.c | 10 ----------
3 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index 8a010ae5b47e..574a5a34a389 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -141,6 +141,29 @@ void vcpu_kick(struct vcpu *v)
}
}
+void sync_local_execstate(void)
+{
+ /* Nothing to do -- no lazy switching */
+}
+
+void sync_vcpu_execstate(struct vcpu *v)
+{
+ /*
+ * We don't support lazy switching.
+ *
+ * However the context may have been saved from a remote pCPU so we
+ * need a barrier to ensure it is observed before continuing.
+ *
+ * Per vcpu_context_saved(), the context can be observed when
+ * v->is_running is false (the caller should check it before calling
+ * this function).
+ *
+ * Note this is a full barrier to also prevent update of the context
+ * to happen before it was observed.
+ */
+ smp_mb();
+}
+
int vcpu_set_interrupt(struct vcpu *v, const unsigned int irq)
{
/*
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 9b4835960d20..e8dbd55ce79e 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -12,6 +12,7 @@
#include <xen/serial.h>
#include <xen/shutdown.h>
#include <xen/smp.h>
+#include <xen/tasklet.h>
#include <xen/timer.h>
#include <xen/vmap.h>
#include <xen/xvmalloc.h>
@@ -133,6 +134,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
panic("Booting using ACPI isn't supported\n");
}
+ tasklet_subsys_init();
+
init_IRQ();
riscv_fill_hwcap();
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index d120274af2fe..2b3eb01bf03c 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -91,16 +91,6 @@ void continue_running(struct vcpu *same)
BUG_ON("unimplemented");
}
-void sync_local_execstate(void)
-{
- BUG_ON("unimplemented");
-}
-
-void sync_vcpu_execstate(struct vcpu *v)
-{
- BUG_ON("unimplemented");
-}
-
void startup_cpu_idle_loop(void)
{
BUG_ON("unimplemented");
--
2.52.0
* Re: [PATCH v1 15/15] xen/riscv: init tasklet subsystem
2025-12-24 17:03 ` [PATCH v1 15/15] xen/riscv: init tasklet subsystem Oleksii Kurochko
@ 2026-01-12 16:20 ` Jan Beulich
2026-01-13 17:03 ` Oleksii Kurochko
0 siblings, 1 reply; 93+ messages in thread
From: Jan Beulich @ 2026-01-12 16:20 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 24.12.2025 18:03, Oleksii Kurochko wrote:
> As the tasklet subsystem is now initialized, it is necessary to implement
> sync_local_execstate(), since it is invoked when something calls
> tasklet_softirq_action(), which is registered in tasklet_subsys_init().
>
> After introducing sync_local_execstate(), the following linker issue occurs:
> riscv64-linux-gnu-ld: prelink.o: in function `bitmap_and':
> /build/xen/./include/xen/bitmap.h:147: undefined reference to
> `sync_vcpu_execstate'
> riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol
> `sync_vcpu_execstate' isn't defined
> riscv64-linux-gnu-ld: final link failed: bad value
How that when ...
> --- a/xen/arch/riscv/stubs.c
> +++ b/xen/arch/riscv/stubs.c
> @@ -91,16 +91,6 @@ void continue_running(struct vcpu *same)
> BUG_ON("unimplemented");
> }
>
> -void sync_local_execstate(void)
> -{
> - BUG_ON("unimplemented");
> -}
> -
> -void sync_vcpu_execstate(struct vcpu *v)
> -{
> - BUG_ON("unimplemented");
> -}
... there was a (stub) implementation? (The code changes look okay, it's just
that I can't make sense of that part of the description.)
Jan
* Re: [PATCH v1 15/15] xen/riscv: init tasklet subsystem
2026-01-12 16:20 ` Jan Beulich
@ 2026-01-13 17:03 ` Oleksii Kurochko
2026-01-14 9:15 ` Jan Beulich
0 siblings, 1 reply; 93+ messages in thread
From: Oleksii Kurochko @ 2026-01-13 17:03 UTC (permalink / raw)
To: Jan Beulich
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 1/12/26 5:20 PM, Jan Beulich wrote:
> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>> As the tasklet subsystem is now initialized, it is necessary to implement
>> sync_local_execstate(), since it is invoked when something calls
>> tasklet_softirq_action(), which is registered in tasklet_subsys_init().
>>
>> After introducing sync_local_execstate(), the following linker issue occurs:
>> riscv64-linux-gnu-ld: prelink.o: in function `bitmap_and':
>> /build/xen/./include/xen/bitmap.h:147: undefined reference to
>> `sync_vcpu_execstate'
>> riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol
>> `sync_vcpu_execstate' isn't defined
>> riscv64-linux-gnu-ld: final link failed: bad value
> How that when ...
>
>> --- a/xen/arch/riscv/stubs.c
>> +++ b/xen/arch/riscv/stubs.c
>> @@ -91,16 +91,6 @@ void continue_running(struct vcpu *same)
>> BUG_ON("unimplemented");
>> }
>>
>> -void sync_local_execstate(void)
>> -{
>> - BUG_ON("unimplemented");
>> -}
>> -
>> -void sync_vcpu_execstate(struct vcpu *v)
>> -{
>> - BUG_ON("unimplemented");
>> -}
> ... there was a (stub) implementation? (The code changes look okay, it's just
> that I can't make sense of that part of the description.)
I haven’t investigated this further. I wanted to look into it now, but I can’t
reproduce the issue anymore. I reverted sync_vcpu_execstate() to a stub and no
longer see the problem.
I will move the introduction of sync_vcpu_execstate(). It doesn’t seem to be
really needed at the moment, but since it is already introduced and there are no
specific comments against it, I think it can be added as a separate patch in this
series.
Thanks.
~ Oleksii
* Re: [PATCH v1 15/15] xen/riscv: init tasklet subsystem
2026-01-13 17:03 ` Oleksii Kurochko
@ 2026-01-14 9:15 ` Jan Beulich
0 siblings, 0 replies; 93+ messages in thread
From: Jan Beulich @ 2026-01-14 9:15 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Alistair Francis, Bob Eshleman, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.01.2026 18:03, Oleksii Kurochko wrote:
>
> On 1/12/26 5:20 PM, Jan Beulich wrote:
>> On 24.12.2025 18:03, Oleksii Kurochko wrote:
>>> As the tasklet subsystem is now initialized, it is necessary to implement
>>> sync_local_execstate(), since it is invoked when something calls
>>> tasklet_softirq_action(), which is registered in tasklet_subsys_init().
>>>
>>> After introducing sync_local_execstate(), the following linker issue occurs:
>>> riscv64-linux-gnu-ld: prelink.o: in function `bitmap_and':
>>> /build/xen/./include/xen/bitmap.h:147: undefined reference to
>>> `sync_vcpu_execstate'
>>> riscv64-linux-gnu-ld: ./.xen-syms.0: hidden symbol
>>> `sync_vcpu_execstate' isn't defined
>>> riscv64-linux-gnu-ld: final link failed: bad value
>> How that when ...
>>
>>> --- a/xen/arch/riscv/stubs.c
>>> +++ b/xen/arch/riscv/stubs.c
>>> @@ -91,16 +91,6 @@ void continue_running(struct vcpu *same)
>>> BUG_ON("unimplemented");
>>> }
>>>
>>> -void sync_local_execstate(void)
>>> -{
>>> - BUG_ON("unimplemented");
>>> -}
>>> -
>>> -void sync_vcpu_execstate(struct vcpu *v)
>>> -{
>>> - BUG_ON("unimplemented");
>>> -}
>> ... there was a (stub) implementation? (The code changes look okay, it's just
>> that I can't make sense of that part of the description.)
>
> I haven’t investigated this further. I wanted to look into it now, but I can’t
> reproduce the issue anymore. I reverted sync_vcpu_execstate() to a stub and no
> longer see the problem.
>
> I will move the introduction of sync_vcpu_execstate(). It doesn’t seem to be
> really needed at the moment, but since it is already introduced and there are no
> specific comments against it, I think it can be added as a separate patch in this
> series.
Just to mention: Moving it right here looks to make sense to me. It's just that
the description of the change was confusing.
Jan