From: Leonardo Bras <leo.bras@arm.com>
To: Tian Zheng <zhengtian10@huawei.com>
Cc: Leonardo Bras <leo.bras@arm.com>,
maz@kernel.org, oupton@kernel.org, catalin.marinas@arm.com,
corbet@lwn.net, pbonzini@redhat.com, will@kernel.org,
yuzenghui@huawei.com, wangzhou1@hisilicon.com,
liuyonglong@huawei.com, Jonathan.Cameron@huawei.com,
yezhenyu2@huawei.com, linuxarm@huawei.com, joey.gouly@arm.com,
kvmarm@lists.linux.dev, kvm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, skhan@linuxfoundation.org,
suzuki.poulose@arm.com
Subject: Re: [PATCH v3 4/5] KVM: arm64: Enable HDBSS support and handle HDBSSF events
Date: Thu, 12 Mar 2026 14:58:28 +0000 [thread overview]
Message-ID: <abLUlL2B2KbifB-4@devkitleo> (raw)
In-Reply-To: <46657927-5608-46f0-9b4c-159407f08880@huawei.com>
On Thu, Mar 12, 2026 at 09:13:33PM +0800, Tian Zheng wrote:
>
> On 3/12/2026 8:06 PM, Leonardo Bras wrote:
> > On Thu, Mar 12, 2026 at 02:17:41PM +0800, Tian Zheng wrote:
> > > On 3/6/2026 11:01 PM, Leonardo Bras wrote:
> > > > On Fri, Mar 06, 2026 at 05:27:58PM +0800, Tian Zheng wrote:
> > > > > Hi Leo,
> > > > >
> > > > > On 3/4/2026 11:40 PM, Leonardo Bras wrote:
> > > > > > Hi Tian,
> > > > > >
> > > > > > Few extra notes/questions below
> > > > > >
> > > > > > On Wed, Feb 25, 2026 at 12:04:20PM +0800, Tian Zheng wrote:
> > > > > > > From: eillon<yezhenyu2@huawei.com>
> > > > > > >
> > > > > > > HDBSS is enabled via an ioctl from userspace (e.g. QEMU) at the start of
> > > > > > > migration. This feature is only supported in VHE mode.
> > > > > > >
> > > > > > > Initially, S2 PTEs doesn't contain the DBM attribute. During migration,
> > > > > > > write faults are handled by user_mem_abort, which relaxes permissions
> > > > > > > and adds the DBM bit when HDBSS is active. Once DBM is set, subsequent
> > > > > > > writes no longer trap, as the hardware automatically transitions the page
> > > > > > > from writable-clean to writable-dirty.
> > > > > > >
> > > > > > > KVM does not scan S2 page tables to consume DBM. Instead, when HDBSS is
> > > > > > > enabled, the hardware observes the clean->dirty transition and records
> > > > > > > the corresponding page into the HDBSS buffer.
> > > > > > >
> > > > > > > During sync_dirty_log, KVM kicks all vCPUs to force VM-Exit, ensuring
> > > > > > > that check_vcpu_requests flushes the HDBSS buffer and propagates the
> > > > > > > accumulated dirty information into the userspace-visible dirty bitmap.
> > > > > > >
> > > > > > > Add fault handling for HDBSS including buffer full, external abort, and
> > > > > > > general protection fault (GPF).
> > > > > > >
> > > > > > > Signed-off-by: eillon<yezhenyu2@huawei.com>
> > > > > > > Signed-off-by: Tian Zheng<zhengtian10@huawei.com>
> > > > > > > ---
> > > > > > > arch/arm64/include/asm/esr.h | 5 ++
> > > > > > > arch/arm64/include/asm/kvm_host.h | 17 +++++
> > > > > > > arch/arm64/include/asm/kvm_mmu.h | 1 +
> > > > > > > arch/arm64/include/asm/sysreg.h | 11 ++++
> > > > > > > arch/arm64/kvm/arm.c | 102 ++++++++++++++++++++++++++++++
> > > > > > > arch/arm64/kvm/hyp/vhe/switch.c | 19 ++++++
> > > > > > > arch/arm64/kvm/mmu.c | 70 ++++++++++++++++++++
> > > > > > > arch/arm64/kvm/reset.c | 3 +
> > > > > > > 8 files changed, 228 insertions(+)
> > > > > > >
> > > > > > > diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> > > > > > > index 81c17320a588..2e6b679b5908 100644
> > > > > > > --- a/arch/arm64/include/asm/esr.h
> > > > > > > +++ b/arch/arm64/include/asm/esr.h
> > > > > > > @@ -437,6 +437,11 @@
> > > > > > > #ifndef __ASSEMBLER__
> > > > > > > #include <asm/types.h>
> > > > > > >
> > > > > > > +static inline bool esr_iss2_is_hdbssf(unsigned long esr)
> > > > > > > +{
> > > > > > > + return ESR_ELx_ISS2(esr) & ESR_ELx_HDBSSF;
> > > > > > > +}
> > > > > > > +
> > > > > > > static inline unsigned long esr_brk_comment(unsigned long esr)
> > > > > > > {
> > > > > > > return esr & ESR_ELx_BRK64_ISS_COMMENT_MASK;
> > > > > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > > > > index 5d5a3bbdb95e..57ee6b53e061 100644
> > > > > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > > > > @@ -55,12 +55,17 @@
> > > > > > > #define KVM_REQ_GUEST_HYP_IRQ_PENDING KVM_ARCH_REQ(9)
> > > > > > > #define KVM_REQ_MAP_L1_VNCR_EL2 KVM_ARCH_REQ(10)
> > > > > > > #define KVM_REQ_VGIC_PROCESS_UPDATE KVM_ARCH_REQ(11)
> > > > > > > +#define KVM_REQ_FLUSH_HDBSS KVM_ARCH_REQ(12)
> > > > > > >
> > > > > > > #define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
> > > > > > > KVM_DIRTY_LOG_INITIALLY_SET)
> > > > > > >
> > > > > > > #define KVM_HAVE_MMU_RWLOCK
> > > > > > >
> > > > > > > +/* HDBSS entry field definitions */
> > > > > > > +#define HDBSS_ENTRY_VALID BIT(0)
> > > > > > > +#define HDBSS_ENTRY_IPA GENMASK_ULL(55, 12)
> > > > > > > +
> > > > > > > /*
> > > > > > > * Mode of operation configurable with kvm-arm.mode early param.
> > > > > > > * See Documentation/admin-guide/kernel-parameters.txt for more information.
> > > > > > > @@ -84,6 +89,7 @@ int __init kvm_arm_init_sve(void);
> > > > > > > u32 __attribute_const__ kvm_target_cpu(void);
> > > > > > > void kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > > > > > > void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
> > > > > > > +void kvm_arm_vcpu_free_hdbss(struct kvm_vcpu *vcpu);
> > > > > > >
> > > > > > > struct kvm_hyp_memcache {
> > > > > > > phys_addr_t head;
> > > > > > > @@ -405,6 +411,8 @@ struct kvm_arch {
> > > > > > > * the associated pKVM instance in the hypervisor.
> > > > > > > */
> > > > > > > struct kvm_protected_vm pkvm;
> > > > > > > +
> > > > > > > + bool enable_hdbss;
> > > > > > > };
> > > > > > >
> > > > > > > struct kvm_vcpu_fault_info {
> > > > > > > @@ -816,6 +824,12 @@ struct vcpu_reset_state {
> > > > > > > bool reset;
> > > > > > > };
> > > > > > >
> > > > > > > +struct vcpu_hdbss_state {
> > > > > > > + phys_addr_t base_phys;
> > > > > > > + u32 size;
> > > > > > > + u32 next_index;
> > > > > > > +};
> > > > > > > +
> > > > > > IIUC this is used once both on enable/disable and massively on
> > > > > > vcpu_put/get.
> > > > > >
> > > > > > What if we actually save just HDBSSBR_EL2 and HDBSSPROD_EL2 instead?
> > > > > > That way we avoid having masking operations in put/get as well as any
> > > > > > possible error we may have formatting those.
> > > > > >
> > > > > > The cost is doing those operations once for enable and once for disable,
> > > > > > which should be fine.
> > > > Hi Tian,
> > > >
> > > > > Thanks for the suggestion. I actually started with storing the raw system
> > > > > register
> > > > >
> > > > > values, as you proposed.
> > > > >
> > > > >
> > > > > However, after discussing it with Oliver Upton in v1, we felt that keeping
> > > > > the base address,
> > > > >
> > > > > size, and index as separate fields makes the state easier to understand.
> > > > >
> > > > >
> > > > > Discussion
> > > > > link:https://lore.kernel.org/linux-arm-kernel/Z8_usklidqnerurc@linux.dev/
> > > > > <https://lore.kernel.org/linux-arm-kernel/Z8_usklidqnerurc@linux.dev/>
> > > > >
> > > > >
> > > > > That's why I ended up changing the storage approach in the end.
> > > > >
> > > > >
> > > > Humm, FWIW I disagree with the above argument.
> > > > I would argue that vcpu_put should save the registers, and not
> > > > actually know what they are about or how are they formatted at this point.
> > > >
> > > > The responsibility of understanding it's fields and usage value should be
> > > > in the code that actually uses it.
> > > >
> > > > IIUC on kvm_vcpu_put_vhe and kvm_vcpu_load_vhe there are calls to other
> > > > functions than only save the register as it is.
> > >
> > > ok, thx! I'll update the struct to store only the raw register values.
> > Awesome :)
> >
> > >
> > > > > > > struct vncr_tlb;
> > > > > > >
> > > > > > > struct kvm_vcpu_arch {
> > > > > > > @@ -920,6 +934,9 @@ struct kvm_vcpu_arch {
> > > > > > >
> > > > > > > /* Per-vcpu TLB for VNCR_EL2 -- NULL when !NV */
> > > > > > > struct vncr_tlb *vncr_tlb;
> > > > > > > +
> > > > > > > + /* HDBSS registers info */
> > > > > > > + struct vcpu_hdbss_state hdbss;
> > > > > > > };
> > > > > > >
> > > > > > > /*
> > > > > > > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > > > > > > index d968aca0461a..3fea8cfe8869 100644
> > > > > > > --- a/arch/arm64/include/asm/kvm_mmu.h
> > > > > > > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > > > > > > @@ -183,6 +183,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> > > > > > >
> > > > > > > int kvm_handle_guest_sea(struct kvm_vcpu *vcpu);
> > > > > > > int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
> > > > > > > +void kvm_flush_hdbss_buffer(struct kvm_vcpu *vcpu);
> > > > > > >
> > > > > > > phys_addr_t kvm_mmu_get_httbr(void);
> > > > > > > phys_addr_t kvm_get_idmap_vector(void);
> > > > > > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > > > > > > index f4436ecc630c..d11f4d0dd4e7 100644
> > > > > > > --- a/arch/arm64/include/asm/sysreg.h
> > > > > > > +++ b/arch/arm64/include/asm/sysreg.h
> > > > > > > @@ -1039,6 +1039,17 @@
> > > > > > >
> > > > > > > #define GCS_CAP(x) ((((unsigned long)x) & GCS_CAP_ADDR_MASK) | \
> > > > > > > GCS_CAP_VALID_TOKEN)
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Definitions for the HDBSS feature
> > > > > > > + */
> > > > > > > +#define HDBSS_MAX_SIZE HDBSSBR_EL2_SZ_2MB
> > > > > > > +
> > > > > > > +#define HDBSSBR_EL2(baddr, sz) (((baddr) & GENMASK(55, 12 + sz)) | \
> > > > > > > + FIELD_PREP(HDBSSBR_EL2_SZ_MASK, sz))
> > > > > > > +
> > > > > > > +#define HDBSSPROD_IDX(prod) FIELD_GET(HDBSSPROD_EL2_INDEX_MASK, prod)
> > > > > > > +
> > > > > > > /*
> > > > > > > * Definitions for GICv5 instructions
> > > > > > > */
> > > > > > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > > > > > index 29f0326f7e00..d64da05e25c4 100644
> > > > > > > --- a/arch/arm64/kvm/arm.c
> > > > > > > +++ b/arch/arm64/kvm/arm.c
> > > > > > > @@ -125,6 +125,87 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> > > > > > > return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> > > > > > > }
> > > > > > >
> > > > > > > +void kvm_arm_vcpu_free_hdbss(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > + struct page *hdbss_pg;
> > > > > > > +
> > > > > > > + hdbss_pg = phys_to_page(vcpu->arch.hdbss.base_phys);
> > > > > > > + if (hdbss_pg)
> > > > > > > + __free_pages(hdbss_pg, vcpu->arch.hdbss.size);
> > > > > > > +
> > > > > > > + vcpu->arch.hdbss.size = 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int kvm_cap_arm_enable_hdbss(struct kvm *kvm,
> > > > > > > + struct kvm_enable_cap *cap)
> > > > > > > +{
> > > > > > > + unsigned long i;
> > > > > > > + struct kvm_vcpu *vcpu;
> > > > > > > + struct page *hdbss_pg = NULL;
> > > > > > > + __u64 size = cap->args[0];
> > > > > > > + bool enable = cap->args[1] ? true : false;
> > > > > > > +
> > > > > > > + if (!system_supports_hdbss())
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + if (size > HDBSS_MAX_SIZE)
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + if (!enable && !kvm->arch.enable_hdbss) /* Already Off */
> > > > > > > + return 0;
> > > > > > > +
> > > > > > > + if (enable && kvm->arch.enable_hdbss) /* Already On, can't set size */
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + if (!enable) { /* Turn it off */
> > > > > > > + kvm->arch.mmu.vtcr &= ~(VTCR_EL2_HD | VTCR_EL2_HDBSS | VTCR_EL2_HA);
> > > > > > > +
> > > > > > > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > > > > > > + /* Kick vcpus to flush hdbss buffer. */
> > > > > > > + kvm_vcpu_kick(vcpu);
> > > > > > > +
> > > > > > > + kvm_arm_vcpu_free_hdbss(vcpu);
> > > > > > > + }
> > > > > > > +
> > > > > > > + kvm->arch.enable_hdbss = false;
> > > > > > > +
> > > > > > > + return 0;
> > > > > > > + }
> > > > > > > +
> > > > > > > + /* Turn it on */
> > > > > > > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > > > > > > + hdbss_pg = alloc_pages(GFP_KERNEL_ACCOUNT, size);
> > > > > > > + if (!hdbss_pg)
> > > > > > > + goto error_alloc;
> > > > > > > +
> > > > > > > + vcpu->arch.hdbss = (struct vcpu_hdbss_state) {
> > > > > > > + .base_phys = page_to_phys(hdbss_pg),
> > > > > > > + .size = size,
> > > > > > > + .next_index = 0,
> > > > > > > + };
> > > > > > > + }
> > > > > > > +
> > > > > > > + kvm->arch.enable_hdbss = true;
> > > > > > > + kvm->arch.mmu.vtcr |= VTCR_EL2_HD | VTCR_EL2_HDBSS | VTCR_EL2_HA;
> > > > > > > +
> > > > > > > + /*
> > > > > > > + * We should kick vcpus out of guest mode here to load new
> > > > > > > + * vtcr value to vtcr_el2 register when re-enter guest mode.
> > > > > > > + */
> > > > > > > + kvm_for_each_vcpu(i, vcpu, kvm)
> > > > > > > + kvm_vcpu_kick(vcpu);
> > > > > > > +
> > > > > > > + return 0;
> > > > > > > +
> > > > > > > +error_alloc:
> > > > > > > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > > > > > > + if (vcpu->arch.hdbss.base_phys)
> > > > > > > + kvm_arm_vcpu_free_hdbss(vcpu);
> > > > > > > + }
> > > > > > > +
> > > > > > > + return -ENOMEM;
> > > > > > > +}
> > > > > > > +
> > > > > > > int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > > > > > struct kvm_enable_cap *cap)
> > > > > > > {
> > > > > > > @@ -182,6 +263,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > > > > > > r = 0;
> > > > > > > set_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags);
> > > > > > > break;
> > > > > > > + case KVM_CAP_ARM_HW_DIRTY_STATE_TRACK:
> > > > > > > + mutex_lock(&kvm->lock);
> > > > > > > + r = kvm_cap_arm_enable_hdbss(kvm, cap);
> > > > > > > + mutex_unlock(&kvm->lock);
> > > > > > > + break;
> > > > > > > default:
> > > > > > > break;
> > > > > > > }
> > > > > > > @@ -471,6 +557,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > > > > > > r = kvm_supports_cacheable_pfnmap();
> > > > > > > break;
> > > > > > >
> > > > > > > + case KVM_CAP_ARM_HW_DIRTY_STATE_TRACK:
> > > > > > > + r = system_supports_hdbss();
> > > > > > > + break;
> > > > > > > default:
> > > > > > > r = 0;
> > > > > > > }
> > > > > > > @@ -1120,6 +1209,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
> > > > > > > if (kvm_dirty_ring_check_request(vcpu))
> > > > > > > return 0;
> > > > > > >
> > > > > > > + if (kvm_check_request(KVM_REQ_FLUSH_HDBSS, vcpu))
> > > > > > > + kvm_flush_hdbss_buffer(vcpu);
> > > > > > I am curious on why we need a flush-hdbss request,
> > > > > > Don't we have the flush function happening every time we run vcpu_put?
> > > > > >
> > > > > > Oh, I see, you want to check if there is anything needed inside the inner
> > > > > > loop of vcpu_run, without having to vcpu_put. I think it makes sense.
> > > > > >
> > > > > > But instead of having this on guest entry, does not it make more sense to
> > > > > > have it in guest exit? This way we flush every time (if needed) we exit the
> > > > > > guest, and instead of having a vcpu request, we just require a vcpu kick
> > > > > > and it should flush if needed.
> > > > > >
> > > > > > Maybe have vcpu_put just save the registers, and add a the flush before
> > > > > > handle_exit.
> > > > > >
> > > > > > What do you think?
> > > > > Thank you for the feedback.
> > > > >
> > > > >
> > > > > Indeed, in the initial version (v1), I placed the flush operation inside
> > > > > handle_exit and
> > > > >
> > > > > used a vcpu_kick in kvm_arch_sync_dirty_log to trigger the flush of the
> > > > > HDBSS buffer.
> > > > >
> > > > >
> > > > > However, during the review, Marc pointed out that calling this function on
> > > > > every exit
> > > > >
> > > > > event is too frequent if it's not always needed.
> > > > >
> > > > >
> > > > > Discussion link:
> > > > > _https://lore.kernel.org/linux-arm-kernel/86senjony9.wl-maz@kernel.org/_
> > > > >
> > > > >
> > > > > I agreed with his assessment. Therefore, in the current version, I've
> > > > > separated the flush
> > > > >
> > > > > operation into more specific and less frequent points:
> > > > >
> > > > >
> > > > > 1. In vcpu_put
> > > > >
> > > > > 2. During dirty log synchronization, by kicking the vCPU to trigger a
> > > > > request that flushes
> > > > >
> > > > > on its next exit.
> > > > >
> > > > >
> > > > > 3. When handling a specific HDBSSF event.
> > > > >
> > > > >
> > > > > This ensures the flush happens only when necessary, avoiding the overhead of
> > > > > doing it
> > > > >
> > > > > on every guest exit.
> > > > >
> > > > Fair enough, calling it every time you go in the inner loop may be too
> > > > much, even with a check to make sure it needs to run.
> > > >
> > > > Having it as a request means you may do that sometimes without
> > > > leaving the inner loop. That could be useful if you want to use it with the
> > > > IRQ handler to deal with full buffer, or any error, as well as dealing with
> > > > a regular request in the 2nd case.
> > > >
> > > > While I agree it's needed to run before leaving guest context (i.e leaving
> > > > the inner loop), I am not really sure vcpu_put is the best place to put the
> > > > flushing. I may be wrong, but for me it looks more like of a place to save
> > > > registers and context, as well as dropping refcounts or something like
> > > > that. I would not expect a flush happening in vcpu_put, if I was reading
> > > > the code.
> > > >
> > > > Would it be too bad if we had it into a call before vcpu_put, at
> > > > kvm_arch_vcpu_ioctl_run()?
> > > >
> > > Thanks for the clarification. After looking again at the code paths, I agree
> > > that
> > >
> > > kvm_vcpu_put_vhe() and kvm_arch_vcpu_put() are really meant to be pure
> > > save/restore
> > >
> > > paths, so embedding HDBSS flushing there isn't ideal.
> > >
> > >
> > > My remaining concern is that kvm_arch_vcpu_ioctl_run() doesn't cover the
> > > case where
> > >
> > > the vCPU is scheduled out. In that case we still leave guest context, but we
> > > don't return
> > >
> > > through the run loop, so a flush placed only in ioctl_run() wouldn't run.
> > >
> > >
> > > Any suggestions on where this should hook in?
> > You mention that on vcpu_put it works, right? maybe it's worth to track
> > down which vcpu_put users would make sense to flush before it's calling.
> >
> > I found that vcpu_put is called only in the vcpu_run, but it's
> > arch-specific version is called in:
> >
> > kvm_debug_handle_oslar : put and load in sequence, flush shouldnt be needed
> > kvm_emulate_nested_eret : same as above, not sure if nested will be supported
> > kvm_inject_nested : same as above
> > kvm_reset_vcpu : put and load in sequence, not needed [1]
> > kvm_sched_out : that's ran on sched-out, that makes sense for us
> > vcpu_put : called only by vcpu_run, where we already planned to use
> >
> > Which brings us other benefit of not having that in vcpu_put: the flush
> > could be happening on functions that should originally ran fast, and having
> > it outside vcpu_put allows us to decide if we need it.
> >
> > So, having the flush happening in kvm_arch_vcpu_ioctl_run() and
> > kvm_sched_out() should be enough, on top of the per-request you
> > mentioned before.
> >
> > [1]: the vcpu_reset: not sure how often this does happen, and if it would
> > be interesting flushing here as well. It seems to be called on init, so not
> > the case where would be something to flush, and from a case where it's
> > restoring the registers. So I think it should be safe to not flush here,
> > so it should be a question of 'maybe being interesting', which I am not
> > sure.
>
>
> Got it, makes sense.
>
>
> HDBSS doesn't apply to nested or other complex cases, so we only need to
> flush in
>
> kvm_arch_vcpu_ioctl_run() and kvm_sched_out().
>
>
> I'll update the code accordingly and test to make sure we haven't missed any
> edge cases.
>
>
> Thanks!
>
> Tian
>
Awesome! Thanks!
Leo
> > > Would introducing a small
> > > arch‑specific
> > >
> > > "guest exit" helper, invoked fromm kvm_arch_vcpu_put(), be acceptable?
> > >
> > IIUC that would be the same as the previous one: we would have the flush
> > happening inside a function that is supposed to save registers.
> >
> > Thanks!
> > Leo
> >
> >
> > > Thanks!
> > >
> > > Tian
> > >
> > >
> > > > > > > +
> > > > > > > check_nested_vcpu_requests(vcpu);
> > > > > > > }
> > > > > > >
> > > > > > > @@ -1898,7 +1990,17 @@ long kvm_arch_vcpu_unlocked_ioctl(struct file *filp, unsigned int ioctl,
> > > > > > >
> > > > > > > void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
> > > > > > > {
> > > > > > > + /*
> > > > > > > + * Flush all CPUs' dirty log buffers to the dirty_bitmap. Called
> > > > > > > + * before reporting dirty_bitmap to userspace. Send a request with
> > > > > > > + * KVM_REQUEST_WAIT to flush buffer synchronously.
> > > > > > > + */
> > > > > > > + struct kvm_vcpu *vcpu;
> > > > > > > +
> > > > > > > + if (!kvm->arch.enable_hdbss)
> > > > > > > + return;
> > > > > > >
> > > > > > > + kvm_make_all_cpus_request(kvm, KVM_REQ_FLUSH_HDBSS);
> > > > > > > }
> > > > > > >
> > > > > > > static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
> > > > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > > > index 9db3f11a4754..600cbc4f8ae9 100644
> > > > > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > > > @@ -213,6 +213,23 @@ static void __vcpu_put_deactivate_traps(struct kvm_vcpu *vcpu)
> > > > > > > local_irq_restore(flags);
> > > > > > > }
> > > > > > >
> > > > > > > +static void __load_hdbss(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > + struct kvm *kvm = vcpu->kvm;
> > > > > > > + u64 br_el2, prod_el2;
> > > > > > > +
> > > > > > > + if (!kvm->arch.enable_hdbss)
> > > > > > > + return;
> > > > > > > +
> > > > > > > + br_el2 = HDBSSBR_EL2(vcpu->arch.hdbss.base_phys, vcpu->arch.hdbss.size);
> > > > > > > + prod_el2 = vcpu->arch.hdbss.next_index;
> > > > > > > +
> > > > > > > + write_sysreg_s(br_el2, SYS_HDBSSBR_EL2);
> > > > > > > + write_sysreg_s(prod_el2, SYS_HDBSSPROD_EL2);
> > > > > > > +
> > > > > > > + isb();
> > > > > > > +}
> > > > > > > +
> > > > > > > void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
> > > > > > > {
> > > > > > > host_data_ptr(host_ctxt)->__hyp_running_vcpu = vcpu;
> > > > > > > @@ -220,10 +237,12 @@ void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu)
> > > > > > > __vcpu_load_switch_sysregs(vcpu);
> > > > > > > __vcpu_load_activate_traps(vcpu);
> > > > > > > __load_stage2(vcpu->arch.hw_mmu, vcpu->arch.hw_mmu->arch);
> > > > > > > + __load_hdbss(vcpu);
> > > > > > > }
> > > > > > >
> > > > > > > void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu)
> > > > > > > {
> > > > > > > + kvm_flush_hdbss_buffer(vcpu);
> > > > > > > __vcpu_put_deactivate_traps(vcpu);
> > > > > > > __vcpu_put_switch_sysregs(vcpu);
> > > > > > >
> > > > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > > > > > index 070a01e53fcb..42b0710a16ce 100644
> > > > > > > --- a/arch/arm64/kvm/mmu.c
> > > > > > > +++ b/arch/arm64/kvm/mmu.c
> > > > > > > @@ -1896,6 +1896,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > > > > > if (writable)
> > > > > > > prot |= KVM_PGTABLE_PROT_W;
> > > > > > >
> > > > > > > + if (writable && kvm->arch.enable_hdbss && logging_active)
> > > > > > > + prot |= KVM_PGTABLE_PROT_DBM;
> > > > > > > +
> > > > > > > if (exec_fault)
> > > > > > > prot |= KVM_PGTABLE_PROT_X;
> > > > > > >
> > > > > > > @@ -2033,6 +2036,70 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > > > > > return 0;
> > > > > > > }
> > > > > > >
> > > > > > > +void kvm_flush_hdbss_buffer(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > + int idx, curr_idx;
> > > > > > > + u64 br_el2;
> > > > > > > + u64 *hdbss_buf;
> > > > > > > + struct kvm *kvm = vcpu->kvm;
> > > > > > > +
> > > > > > > + if (!kvm->arch.enable_hdbss)
> > > > > > > + return;
> > > > > > > +
> > > > > > > + curr_idx = HDBSSPROD_IDX(read_sysreg_s(SYS_HDBSSPROD_EL2));
> > > > > > > + br_el2 = HDBSSBR_EL2(vcpu->arch.hdbss.base_phys, vcpu->arch.hdbss.size);
> > > > > > > +
> > > > > > > + /* Do nothing if HDBSS buffer is empty or br_el2 is NULL */
> > > > > > > + if (curr_idx == 0 || br_el2 == 0)
> > > > > > > + return;
> > > > > > > +
> > > > > > > + hdbss_buf = page_address(phys_to_page(vcpu->arch.hdbss.base_phys));
> > > > > > > + if (!hdbss_buf)
> > > > > > > + return;
> > > > > > > +
> > > > > > > + guard(write_lock_irqsave)(&vcpu->kvm->mmu_lock);
> > > > > > > + for (idx = 0; idx < curr_idx; idx++) {
> > > > > > > + u64 gpa;
> > > > > > > +
> > > > > > > + gpa = hdbss_buf[idx];
> > > > > > > + if (!(gpa & HDBSS_ENTRY_VALID))
> > > > > > > + continue;
> > > > > > > +
> > > > > > > + gpa &= HDBSS_ENTRY_IPA;
> > > > > > > + kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT);
> > > > > > > + }
> > > > > > This will mark a page dirty for both dirty_bitmap or dirty_ring, depending
> > > > > > on what is in use.
> > > > > >
> > > > > > Out of plain curiosity, have you planned / tested for the dirty-ring as
> > > > > > well, or just for dirty-bitmap?
> > > > > Currently, I have only tested this with dirty-bitmap mode.
> > > > >
> > > > >
> > > > > I will test and ensure the HDBSS feature works correctly with dirty-ring in
> > > > > the next version.
> > > > >
> > > > >
> > > > Thanks!
> > > >
> > > > > > > +
> > > > > > > + /* reset HDBSS index */
> > > > > > > + write_sysreg_s(0, SYS_HDBSSPROD_EL2);
> > > > > > > + vcpu->arch.hdbss.next_index = 0;
> > > > > > > + isb();
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int kvm_handle_hdbss_fault(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > + u64 prod;
> > > > > > > + u64 fsc;
> > > > > > > +
> > > > > > > + prod = read_sysreg_s(SYS_HDBSSPROD_EL2);
> > > > > > > + fsc = FIELD_GET(HDBSSPROD_EL2_FSC_MASK, prod);
> > > > > > > +
> > > > > > > + switch (fsc) {
> > > > > > > + case HDBSSPROD_EL2_FSC_OK:
> > > > > > > + /* Buffer full, which is reported as permission fault. */
> > > > > > > + kvm_flush_hdbss_buffer(vcpu);
> > > > > > > + return 1;
> > > > > > Humm, flushing in a fault handler means hanging there, in IRQ context, for
> > > > > > a while.
> > > > > >
> > > > > > Since we already deal with this on guest_exit (vcpu_put IIUC), why not just
> > > > > > return in a way the vcpu has to exit the inner loop and let it flush there
> > > > > > instead?
> > > > > >
> > > > > > Thanks!
> > > > > > Leo
> > > > > Thanks for the feedback.
> > > > >
> > > > >
> > > > > If we flush on every guest exit (by moving the flush before handle_exit,
> > > > > then we can
> > > > >
> > > > > indeed drop the flush from the fault handler and from vcpu_put.
> > > > >
> > > > >
> > > > > However, given Marc's earlier concern about not imposing this overhead on
> > > > > all vCPUs,
> > > > >
> > > > > I'd rather avoid flushing on every exit.
> > > > >
> > > > >
> > > > > My current plan is to set a request bit in kvm_handle_hdbss_fault (via
> > > > > kvm_make_request),
> > > > >
> > > > > and move the actual flush to the normal exit path, where it can execute in a
> > > > > safe context.
> > > > >
> > > > > This also allows us to remove the flush from the fault handler entirely.
> > > > >
> > > > >
> > > > > Does that approach sound reasonable to you?
> > > > >
> > > > >
> > > > Yes, I think it looks much better, as the fault will cause guest to exit,
> > > > and it can run the flush on it's way back in.
> > > >
> > > > Thanks!
> > > > Leo
> > > >
> > > > > > > + case HDBSSPROD_EL2_FSC_ExternalAbort:
> > > > > > > + case HDBSSPROD_EL2_FSC_GPF:
> > > > > > > + return -EFAULT;
> > > > > > > + default:
> > > > > > > + /* Unknown fault. */
> > > > > > > + WARN_ONCE(1,
> > > > > > > + "Unexpected HDBSS fault type, FSC: 0x%llx (prod=0x%llx, vcpu=%d)\n",
> > > > > > > + fsc, prod, vcpu->vcpu_id);
> > > > > > > + return -EFAULT;
> > > > > > > + }
> > > > > > > +}
> > > > > > > +
> > > > > > > /**
> > > > > > > * kvm_handle_guest_abort - handles all 2nd stage aborts
> > > > > > > * @vcpu: the VCPU pointer
> > > > > > > @@ -2071,6 +2138,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > > > > > >
> > > > > > > is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> > > > > > >
> > > > > > > + if (esr_iss2_is_hdbssf(esr))
> > > > > > > + return kvm_handle_hdbss_fault(vcpu);
> > > > > > > +
> > > > > > > if (esr_fsc_is_translation_fault(esr)) {
> > > > > > > /* Beyond sanitised PARange (which is the IPA limit) */
> > > > > > > if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
> > > > > > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > > > > > index 959532422d3a..c03a4b310b53 100644
> > > > > > > --- a/arch/arm64/kvm/reset.c
> > > > > > > +++ b/arch/arm64/kvm/reset.c
> > > > > > > @@ -161,6 +161,9 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
> > > > > > > free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
> > > > > > > kfree(vcpu->arch.vncr_tlb);
> > > > > > > kfree(vcpu->arch.ccsidr);
> > > > > > > +
> > > > > > > + if (vcpu->kvm->arch.enable_hdbss)
> > > > > > > + kvm_arm_vcpu_free_hdbss(vcpu);
> > > > > > > }
> > > > > > >
> > > > > > > static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
> > > > > > > --
> > > > > > > 2.33.0
> > > > > Thanks!
> > > > >
> > > > > Tian
> > > > >
next prev parent reply other threads:[~2026-03-12 14:58 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-25 4:04 [PATCH v3 0/5] Support the FEAT_HDBSS introduced in Armv9.5 Tian Zheng
2026-02-25 4:04 ` [PATCH v3 1/5] arm64/sysreg: Add HDBSS related register information Tian Zheng
2026-02-25 4:04 ` [PATCH v3 2/5] KVM: arm64: Add support to set the DBM attr during memory abort Tian Zheng
2026-02-25 4:04 ` [PATCH v3 3/5] KVM: arm64: Add support for FEAT_HDBSS Tian Zheng
2026-02-25 4:04 ` [PATCH v3 4/5] KVM: arm64: Enable HDBSS support and handle HDBSSF events Tian Zheng
2026-02-25 17:46 ` Leonardo Bras
2026-02-27 10:47 ` Tian Zheng
2026-02-27 14:10 ` Leonardo Bras
2026-03-04 3:06 ` Tian Zheng
2026-03-04 12:08 ` Leonardo Bras
2026-03-05 7:37 ` Tian Zheng
2026-03-04 15:40 ` Leonardo Bras
2026-03-06 9:27 ` Tian Zheng
2026-03-06 15:01 ` Leonardo Bras
2026-03-12 6:17 ` Tian Zheng
2026-03-12 12:06 ` Leonardo Bras
2026-03-12 13:13 ` Tian Zheng
2026-03-12 14:58 ` Leonardo Bras [this message]
2026-03-25 18:05 ` Leonardo Bras
2026-03-25 18:20 ` [PATCH] arm64/kvm: Enable eager hugepage splitting if HDBSS is available Leonardo Bras
2026-03-27 7:40 ` Tian Zheng
2026-03-26 14:31 ` [PATCH v3 4/5] KVM: arm64: Enable HDBSS support and handle HDBSSF events Leonardo Bras
2026-03-27 7:35 ` Tian Zheng
2026-02-25 4:04 ` [PATCH v3 5/5] KVM: arm64: Document HDBSS ioctl Tian Zheng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abLUlL2B2KbifB-4@devkitleo \
--to=leo.bras@arm.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=catalin.marinas@arm.com \
--cc=corbet@lwn.net \
--cc=joey.gouly@arm.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxarm@huawei.com \
--cc=liuyonglong@huawei.com \
--cc=maz@kernel.org \
--cc=oupton@kernel.org \
--cc=pbonzini@redhat.com \
--cc=skhan@linuxfoundation.org \
--cc=suzuki.poulose@arm.com \
--cc=wangzhou1@hisilicon.com \
--cc=will@kernel.org \
--cc=yezhenyu2@huawei.com \
--cc=yuzenghui@huawei.com \
--cc=zhengtian10@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox