From: Sebastian Ene <sebastianene@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: alexandru.elisei@arm.com, kvmarm@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, android-kvm@google.com,
catalin.marinas@arm.com, dbrazdil@google.com, joey.gouly@arm.com,
kees@kernel.org, mark.rutland@arm.com, maz@kernel.org,
oupton@kernel.org, perlarsen@google.com, qperret@google.com,
rananta@google.com, smostafa@google.com, suzuki.poulose@arm.com,
tglx@kernel.org, vdonnefort@google.com, bgrzesik@google.com,
will@kernel.org, yuzenghui@huawei.com
Subject: Re: [PATCH 05/14] irqchip/gic-v3-its: Prepare shadow structures for KVM host deprivilege
Date: Fri, 20 Mar 2026 15:11:24 +0000 [thread overview]
Message-ID: <ab1jnHvmIeh0PExw@google.com> (raw)
In-Reply-To: <CA+EHjTyG-rUp1M_4a=adBwqhAJ=JuHL5GaLmMBSi5o1ycaSu3A@mail.gmail.com>
On Fri, Mar 13, 2026 at 11:26:04AM +0000, Fuad Tabba wrote:
Hi Fuad,
> Hi Sebastian,
>
> On Tue, 10 Mar 2026 at 12:49, Sebastian Ene <sebastianene@google.com> wrote:
> >
> > Expose two helper functions to support emulated ITS in the hypervisor.
> > These allow the KVM layer to notify the driver when hypervisor
> > initialization is complete.
> > The caller is expected to use the functions as follows:
> > 1. its_start_deprivilege(): Acquire the ITS locks.
> > 2. on_each_cpu(_kvm_host_prot_finalize, ...): Finalizes pKVM init
> > 3. its_end_deprivilege(): Shadow the ITS structures, invoke the KVM
> > callback, and release locks.
> > Specifically, this shadows the ITS command queue and the 1st level
> > indirect tables. These shadow buffers will be used by the driver after
> > host deprivilege, while the hypervisor unmaps and takes ownership of the
> > original structures.
>
> Just a note again on preferring not to use the "shadow" terminology. I
> thought about it a bit more, since these are not at the host, perhaps
> "proxy" is a better term, to convey that the host is writing to a
> middle-man buffer.
>
> Another term is "staging," which is common in DMA: the host "stages"
> the commands here, and EL2 "commits" them to the hardware.
Sure, happy to use one of the two indicated ones.
>
> >
> > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > ---
> > drivers/irqchip/irq-gic-v3-its.c | 165 +++++++++++++++++++++++++++--
> > include/linux/irqchip/arm-gic-v3.h | 24 +++++
> > 2 files changed, 178 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> > index 291d7668cc8d..278dbc56f962 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -78,17 +78,6 @@ struct its_collection {
> > u16 col_id;
> > };
> >
> > -/*
> > - * The ITS_BASER structure - contains memory information, cached
> > - * value of BASER register configuration and ITS page size.
> > - */
> > -struct its_baser {
> > - void *base;
> > - u64 val;
> > - u32 order;
> > - u32 psz;
> > -};
> > -
> > struct its_device;
> >
> > /*
> > @@ -5232,6 +5221,160 @@ static int __init its_compute_its_list_map(struct its_node *its)
> > return its_number;
> > }
> >
> > +static void its_free_shadow_tables(struct its_shadow_tables *shadow)
> > +{
> > + int i;
> > +
> > + if (shadow->cmd_shadow)
> > + its_free_pages(shadow->cmd_shadow, get_order(ITS_CMD_QUEUE_SZ));
> > +
> > + for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> > + if (!shadow->tables[i].shadow)
> > + continue;
> > +
> > + its_free_pages(shadow->tables[i].shadow, 0);
> > + }
> > +
> > + its_free_pages(shadow, 0);
> > +}
> > +
> > +static struct its_shadow_tables *its_get_shadow_tables(struct its_node *its)
> > +{
> > + void *page;
> > + struct its_shadow_tables *shadow;
> > + int i;
>
> Prefer RCT declarations.
>
> > +
> > + page = its_alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO, 0);
>
> This is called with the raw_spin_lock_irqsave held, and GFP_KERNEL can
> sleep. You have one of two options, either use GFP_ATOMIC, but that's
> more likely to fail. The alternative is to move this to
> its_start_deprivilege(), before any lock is held.
>
Thanks, I will try to move the allocation before the lock.
> > + if (!page)
> > + return NULL;
> > +
> > + shadow = (void *)page_address(page);
> > + page = its_alloc_pages_node(its->numa_node,
> > + GFP_KERNEL | __GFP_ZERO,
> > + get_order(ITS_CMD_QUEUE_SZ));
> > + if (!page)
> > + goto err_alloc_shadow;
> > +
> > + shadow->cmd_shadow = page_address(page);
> > + shadow->cmdq_len = ITS_CMD_QUEUE_SZ;
> > + shadow->cmd_original = its->cmd_base;
> > +
> > + memcpy(shadow->tables, its->tables, sizeof(struct its_baser) * GITS_BASER_NR_REGS);
> > +
> > + for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> > + if (!(shadow->tables[i].val & GITS_BASER_VALID))
> > + continue;
> > +
> > + if (!(shadow->tables[i].val & GITS_BASER_INDIRECT))
> > + continue;
> > +
> > + page = its_alloc_pages_node(its->numa_node,
> > + GFP_KERNEL | __GFP_ZERO,
> > + shadow->tables[i].order);
> > + if (!page)
> > + goto err_alloc_shadow;
> > +
> > + shadow->tables[i].shadow = page_address(page);
> > +
> > + memcpy(shadow->tables[i].shadow, shadow->tables[i].base,
> > + PAGE_ORDER_TO_SIZE(shadow->tables[i].order));
> > + }
> > +
> > + return shadow;
> > +
> > +err_alloc_shadow:
> > + its_free_shadow_tables(shadow);
> > + return NULL;
> > +}
> > +
> > +void *its_start_depriviledge(void)
>
> Typo here and elsewhere in this patch:
>
> s/depriviledge/deprivilege/g
>
> This is particularly important because it also appears in exported
> symbols as well (later in this patch).
>
Ack, will fix this.
> > +{
> > + struct its_node *its;
> > + int num_nodes = 0, i = 0;
> > + unsigned long *flags;
>
> RCT declaration order, and please untagle them, i.e., don't declare
> the num_nodes and the iterator in the same line.
>
Ack,
> > +
> > + raw_spin_lock(&its_lock);
> > + list_for_each_entry(its, &its_nodes, entry) {
> > + num_nodes++;
> > + }
> > +
> > + flags = kzalloc(num_nodes * sizeof(unsigned long), GFP_KERNEL_ACCOUNT);
>
> Same as the other allocation. This can sleep. I think that for this as
> well, it's better to move it before lock acquisition. Even if you use
> a different allocator, it's still better to keep the critical section
> short.
>
> > + if (!flags) {
> > + raw_spin_unlock(&its_lock);
> > + return NULL;
> > + }
> > +
> > + list_for_each_entry(its, &its_nodes, entry) {
> > + raw_spin_lock_irqsave(&its->lock, flags[i++]);
> > + }
> > +
> > + return flags;
> > +}
> > +EXPORT_SYMBOL_GPL(its_start_depriviledge);
> > +
> > +static int its_switch_to_shadow_locked(struct its_node *its, its_init_emulate init_emulate_cb)
> > +{
> > + struct its_shadow_tables *hyp_shadow, shadow;
> > + int i, ret;
> > + u64 baser, baser_phys;
> > +
> > + hyp_shadow = its_get_shadow_tables(its);
> > + if (!hyp_shadow)
> > + return -ENOMEM;
> > +
> > + memcpy(&shadow, hyp_shadow, sizeof(shadow));
> > + ret = init_emulate_cb(its->phys_base, hyp_shadow);
>
> You are performing this callback with the lock held and local
> interrupts disabled. The hvc call is byitself expensive, especially
> since it's going to do stage-2 manipulations.
>
> You should decouple the synchronous pointer swapping (which must be
> locked) from the hypervisor notification (which can be done outside
> the lock). Instead of executing the callback inside the critical
> section, its_end_deprivilege should:
> - Lock everything.
> - Perform the pointer swaps in the host driver structures.
> - Save the hyp_shadow pointers to a temporary array.
> - Unlock everything.
I am afraid you can't do that because you can have dropped commands &
timeouts between these two steps. The driver might put commands in the
swapped queue and they will timeout.
> - Loop through the temporary array and call the KVM cb to notify EL2.
>
> You should probably split this patch into two. The first patch would
> implement the freeze/unfreeze locking mechanism, and the second would
> swap the driver's internal memory pointers to the shadow structures,
> and invoke the KVM callback to lock down the real hardware.
>
> Cheers,
> /fuad
>
Thanks,
Sebastian
> > + if (ret) {
> > + its_free_shadow_tables(hyp_shadow);
> > + return ret;
> > + }
> > +
> > + /* Switch the driver command queue to use the shadow and save the original */
> > + its->cmd_write = (its->cmd_write - its->cmd_base) +
> > + (struct its_cmd_block *)shadow.cmd_shadow;
> > + its->cmd_base = shadow.cmd_shadow;
> > +
> > + /* Shadow the first level of the indirect tables */
> > + for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> > + baser = shadow.tables[i].val;
> > +
> > + if (!shadow.tables[i].shadow)
> > + continue;
> > +
> > + baser_phys = virt_to_phys(shadow.tables[i].shadow);
> > + if (IS_ENABLED(CONFIG_ARM64_64K_PAGES) && (baser_phys >> 48))
> > + baser_phys = GITS_BASER_PHYS_52_to_48(baser_phys);
> > +
> > + its->tables[i].val &= ~GENMASK(47, 12);
> > + its->tables[i].val |= baser_phys;
> > + its->tables[i].base = shadow.tables[i].shadow;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +int its_end_depriviledge(int ret_pkvm_finalize, unsigned long *flags, its_init_emulate cb)
> > +{
> > + struct its_node *its;
> > + int i = 0, ret = 0;
> > +
> > + if (!flags || !cb)
> > + return -EINVAL;
> > +
> > + list_for_each_entry(its, &its_nodes, entry) {
> > + if (!ret_pkvm_finalize && !ret)
> > + ret = its_switch_to_shadow_locked(its, cb);
> > +
> > + raw_spin_unlock_irqrestore(&its->lock, flags[i++]);
> > + }
> > +
> > + kfree(flags);
> > + raw_spin_unlock(&its_lock);
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(its_end_depriviledge);
> > +
> > static int __init its_probe_one(struct its_node *its)
> > {
> > u64 baser, tmp;
> > diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> > index 0225121f3013..40457a4375d4 100644
> > --- a/include/linux/irqchip/arm-gic-v3.h
> > +++ b/include/linux/irqchip/arm-gic-v3.h
> > @@ -657,6 +657,30 @@ static inline bool gic_enable_sre(void)
> > return !!(val & ICC_SRE_EL1_SRE);
> > }
> >
> > +/*
> > + * The ITS_BASER structure - contains memory information, cached
> > + * value of BASER register configuration and ITS page size.
> > + */
> > +struct its_baser {
> > + void *base;
> > + void *shadow;
> > + u64 val;
> > + u32 order;
> > + u32 psz;
> > +};
> > +
> > +struct its_shadow_tables {
> > + struct its_baser tables[GITS_BASER_NR_REGS];
> > + void *cmd_shadow;
> > + void *cmd_original;
> > + size_t cmdq_len;
> > +};
> > +
> > +typedef int (*its_init_emulate)(phys_addr_t its_phys_base, struct its_shadow_tables *shadow);
> > +
> > +void *its_start_depriviledge(void);
> > +int its_end_depriviledge(int ret, unsigned long *flags, its_init_emulate cb);
> > +
> > #endif
> >
> > #endif
> > --
> > 2.53.0.473.g4a7958ca14-goog
> >
next prev parent reply other threads:[~2026-03-20 15:11 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-10 12:49 [RFC PATCH 00/14] KVM: ITS hardening for pKVM Sebastian Ene
2026-03-10 12:49 ` [PATCH 01/14] KVM: arm64: Donate MMIO to the hypervisor Sebastian Ene
2026-03-12 17:57 ` Fuad Tabba
2026-03-13 10:40 ` Suzuki K Poulose
2026-03-24 10:39 ` Vincent Donnefort
2026-03-10 12:49 ` [PATCH 02/14] KVM: arm64: Track host-unmapped MMIO regions in a static array Sebastian Ene
2026-03-12 19:05 ` Fuad Tabba
2026-03-24 10:46 ` Vincent Donnefort
2026-03-10 12:49 ` [PATCH 03/14] KVM: arm64: Support host MMIO trap handlers for unmapped devices Sebastian Ene
2026-03-13 9:31 ` Fuad Tabba
2026-03-24 10:59 ` Vincent Donnefort
2026-03-10 12:49 ` [PATCH 04/14] KVM: arm64: Mediate host access to GIC/ITS MMIO via unmapping Sebastian Ene
2026-03-13 9:58 ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 05/14] irqchip/gic-v3-its: Prepare shadow structures for KVM host deprivilege Sebastian Ene
2026-03-13 11:26 ` Fuad Tabba
2026-03-13 13:10 ` Fuad Tabba
2026-03-20 15:11 ` Sebastian Ene [this message]
2026-03-24 14:36 ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 06/14] KVM: arm64: Add infrastructure for ITS emulation setup Sebastian Ene
2026-03-16 10:46 ` Fuad Tabba
2026-03-17 9:40 ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 07/14] KVM: arm64: Restrict host access to the ITS tables Sebastian Ene
2026-03-16 16:13 ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 08/14] KVM: arm64: Trap & emulate the ITS MAPD command Sebastian Ene
2026-03-17 10:20 ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 09/14] KVM: arm64: Trap & emulate the ITS VMAPP command Sebastian Ene
2026-03-10 12:49 ` [PATCH 10/14] KVM: arm64: Trap & emulate the ITS MAPC command Sebastian Ene
2026-03-10 12:49 ` [PATCH 11/14] KVM: arm64: Restrict host updates to GITS_CTLR Sebastian Ene
2026-03-10 12:49 ` [PATCH 12/14] KVM: arm64: Restrict host updates to GITS_CBASER Sebastian Ene
2026-03-10 12:49 ` [PATCH 13/14] KVM: arm64: Restrict host updates to GITS_BASER Sebastian Ene
2026-03-10 12:49 ` [PATCH 14/14] KVM: arm64: Implement HVC interface for ITS emulation setup Sebastian Ene
2026-03-12 17:56 ` [RFC PATCH 00/14] KVM: ITS hardening for pKVM Fuad Tabba
2026-03-20 14:42 ` Sebastian Ene
2026-03-13 15:18 ` Mostafa Saleh
2026-03-15 13:24 ` Fuad Tabba
2026-03-25 16:26 ` Sebastian Ene
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ab1jnHvmIeh0PExw@google.com \
--to=sebastianene@google.com \
--cc=alexandru.elisei@arm.com \
--cc=android-kvm@google.com \
--cc=bgrzesik@google.com \
--cc=catalin.marinas@arm.com \
--cc=dbrazdil@google.com \
--cc=joey.gouly@arm.com \
--cc=kees@kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=maz@kernel.org \
--cc=oupton@kernel.org \
--cc=perlarsen@google.com \
--cc=qperret@google.com \
--cc=rananta@google.com \
--cc=smostafa@google.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=tglx@kernel.org \
--cc=vdonnefort@google.com \
--cc=will@kernel.org \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox