public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Ene <sebastianene@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: alexandru.elisei@arm.com, kvmarm@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, android-kvm@google.com,
	catalin.marinas@arm.com, dbrazdil@google.com, joey.gouly@arm.com,
	kees@kernel.org, mark.rutland@arm.com, maz@kernel.org,
	oupton@kernel.org, perlarsen@google.com, qperret@google.com,
	rananta@google.com, smostafa@google.com, suzuki.poulose@arm.com,
	tglx@kernel.org, vdonnefort@google.com, bgrzesik@google.com,
	will@kernel.org, yuzenghui@huawei.com
Subject: Re: [PATCH 05/14] irqchip/gic-v3-its: Prepare shadow structures for KVM host deprivilege
Date: Fri, 20 Mar 2026 15:11:24 +0000	[thread overview]
Message-ID: <ab1jnHvmIeh0PExw@google.com> (raw)
In-Reply-To: <CA+EHjTyG-rUp1M_4a=adBwqhAJ=JuHL5GaLmMBSi5o1ycaSu3A@mail.gmail.com>

On Fri, Mar 13, 2026 at 11:26:04AM +0000, Fuad Tabba wrote:

Hi Fuad,

> Hi Sebastian,
> 
> On Tue, 10 Mar 2026 at 12:49, Sebastian Ene <sebastianene@google.com> wrote:
> >
> > Expose two helper functions to support emulated ITS in the hypervisor.
> > These allow the KVM layer to notify the driver when hypervisor
> > initialization is complete.
> > The caller is expected to use the functions as follows:
> > 1. its_start_deprivilege(): Acquire the ITS locks.
> > 2. on_each_cpu(_kvm_host_prot_finalize, ...): Finalizes pKVM init
> > 3. its_end_deprivilege(): Shadow the ITS structures, invoke the KVM
> >    callback, and release locks.
> > Specifically, this shadows the ITS command queue and the 1st level
> > indirect tables. These shadow buffers will be used by the driver after
> > host deprivilege, while the hypervisor unmaps and takes ownership of the
> > original structures.
> 
> Just a note again on preferring not to use the "shadow" terminology. I
> thought about it a bit more, since these are not at the host, perhaps
> "proxy" is a better term, to convey that the host is writing to a
> middle-man buffer.
> 
> Another term is "staging," which is common in DMA: the host "stages"
> the commands here, and EL2 "commits" them to the hardware.

Sure, happy to use one of the two indicated ones.

> 
> >
> > Signed-off-by: Sebastian Ene <sebastianene@google.com>
> > ---
> >  drivers/irqchip/irq-gic-v3-its.c   | 165 +++++++++++++++++++++++++++--
> >  include/linux/irqchip/arm-gic-v3.h |  24 +++++
> >  2 files changed, 178 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> > index 291d7668cc8d..278dbc56f962 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -78,17 +78,6 @@ struct its_collection {
> >         u16                     col_id;
> >  };
> >
> > -/*
> > - * The ITS_BASER structure - contains memory information, cached
> > - * value of BASER register configuration and ITS page size.
> > - */
> > -struct its_baser {
> > -       void            *base;
> > -       u64             val;
> > -       u32             order;
> > -       u32             psz;
> > -};
> > -
> >  struct its_device;
> >
> >  /*
> > @@ -5232,6 +5221,160 @@ static int __init its_compute_its_list_map(struct its_node *its)
> >         return its_number;
> >  }
> >
> > +static void its_free_shadow_tables(struct its_shadow_tables *shadow)
> > +{
> > +       int i;
> > +
> > +       if (shadow->cmd_shadow)
> > +               its_free_pages(shadow->cmd_shadow, get_order(ITS_CMD_QUEUE_SZ));
> > +
> > +       for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> > +               if (!shadow->tables[i].shadow)
> > +                       continue;
> > +
> > +               its_free_pages(shadow->tables[i].shadow, 0);
> > +       }
> > +
> > +       its_free_pages(shadow, 0);
> > +}
> > +
> > +static struct its_shadow_tables *its_get_shadow_tables(struct its_node *its)
> > +{
> > +       void *page;
> > +       struct its_shadow_tables *shadow;
> > +       int i;
> 
> Prefer RCT declarations.
> 
> > +
> > +       page = its_alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO, 0);
> 
> This is called with the raw_spin_lock_irqsave held, and GFP_KERNEL can
> sleep. You have one of two options, either use GFP_ATOMIC, but that's
> more likely to fail. The alternative is to move this to
> its_start_deprivilege(), before any lock is held.
> 

Thanks, I will try to move the allocation before the lock.

> > +       if (!page)
> > +               return NULL;
> > +
> > +       shadow = (void *)page_address(page);
> > +       page = its_alloc_pages_node(its->numa_node,
> > +                                   GFP_KERNEL | __GFP_ZERO,
> > +                                   get_order(ITS_CMD_QUEUE_SZ));
> > +       if (!page)
> > +               goto err_alloc_shadow;
> > +
> > +       shadow->cmd_shadow = page_address(page);
> > +       shadow->cmdq_len = ITS_CMD_QUEUE_SZ;
> > +       shadow->cmd_original = its->cmd_base;
> > +
> > +       memcpy(shadow->tables, its->tables, sizeof(struct its_baser) * GITS_BASER_NR_REGS);
> > +
> > +       for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> > +               if (!(shadow->tables[i].val & GITS_BASER_VALID))
> > +                       continue;
> > +
> > +               if (!(shadow->tables[i].val & GITS_BASER_INDIRECT))
> > +                       continue;
> > +
> > +               page = its_alloc_pages_node(its->numa_node,
> > +                                           GFP_KERNEL | __GFP_ZERO,
> > +                                           shadow->tables[i].order);
> > +               if (!page)
> > +                       goto err_alloc_shadow;
> > +
> > +               shadow->tables[i].shadow = page_address(page);
> > +
> > +               memcpy(shadow->tables[i].shadow, shadow->tables[i].base,
> > +                      PAGE_ORDER_TO_SIZE(shadow->tables[i].order));
> > +       }
> > +
> > +       return shadow;
> > +
> > +err_alloc_shadow:
> > +       its_free_shadow_tables(shadow);
> > +       return NULL;
> > +}
> > +
> > +void *its_start_depriviledge(void)
> 
> Typo here and elsewhere in this patch:
> 
> s/depriviledge/deprivilege/g
> 
> This is particularly important because it also appears in exported
> symbols as well (later in this patch).
>

Ack, will fix this.

> > +{
> > +       struct its_node *its;
> > +       int num_nodes = 0, i = 0;
> > +       unsigned long *flags;
> 
> RCT declaration order, and please untagle them, i.e., don't declare
> the num_nodes and the iterator in the same line.
>

Ack,

> > +
> > +       raw_spin_lock(&its_lock);
> > +       list_for_each_entry(its, &its_nodes, entry) {
> > +               num_nodes++;
> > +       }
> > +
> > +       flags = kzalloc(num_nodes * sizeof(unsigned long), GFP_KERNEL_ACCOUNT);
> 
> Same as the other allocation. This can sleep. I think that for this as
> well, it's better to move it before lock acquisition. Even if you use
> a different allocator, it's still better to keep the critical section
> short.
> 
> > +       if (!flags) {
> > +               raw_spin_unlock(&its_lock);
> > +               return NULL;
> > +       }
> > +
> > +       list_for_each_entry(its, &its_nodes, entry) {
> > +               raw_spin_lock_irqsave(&its->lock, flags[i++]);
> > +       }
> > +
> > +       return flags;
> > +}
> > +EXPORT_SYMBOL_GPL(its_start_depriviledge);
> > +
> > +static int its_switch_to_shadow_locked(struct its_node *its, its_init_emulate init_emulate_cb)
> > +{
> > +       struct its_shadow_tables *hyp_shadow, shadow;
> > +       int i, ret;
> > +       u64 baser, baser_phys;
> > +
> > +       hyp_shadow = its_get_shadow_tables(its);
> > +       if (!hyp_shadow)
> > +               return -ENOMEM;
> > +
> > +       memcpy(&shadow, hyp_shadow, sizeof(shadow));
> > +       ret = init_emulate_cb(its->phys_base, hyp_shadow);
> 
> You are performing this callback with the lock held and local
> interrupts disabled. The hvc call is byitself expensive, especially
> since it's going to do stage-2 manipulations.
> 
> You should decouple the synchronous pointer swapping (which must be
> locked) from the hypervisor notification (which can be done outside
> the lock). Instead of executing the callback inside the critical
> section, its_end_deprivilege should:
> - Lock everything.
> - Perform the pointer swaps in the host driver structures.
> - Save the hyp_shadow pointers to a temporary array.
> - Unlock everything.

I am afraid you can't do that because you can have dropped commands &
timeouts between these two steps. The driver might put commands in the
swapped queue and they will timeout.

> - Loop through the temporary array and call the KVM cb to notify EL2.
> 
> You should probably split this patch into two. The first patch would
> implement the freeze/unfreeze locking mechanism, and the second would
> swap the driver's internal memory pointers to the shadow structures,
> and invoke the KVM callback to lock down the real hardware.
> 
> Cheers,
> /fuad
> 

Thanks,
Sebastian

> > +       if (ret) {
> > +               its_free_shadow_tables(hyp_shadow);
> > +               return ret;
> > +       }
> > +
> > +       /* Switch the driver command queue to use the shadow and save the original */
> > +       its->cmd_write = (its->cmd_write - its->cmd_base) +
> > +               (struct its_cmd_block *)shadow.cmd_shadow;
> > +       its->cmd_base = shadow.cmd_shadow;
> > +
> > +       /* Shadow the first level of the indirect tables */
> > +       for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> > +               baser = shadow.tables[i].val;
> > +
> > +               if (!shadow.tables[i].shadow)
> > +                       continue;
> > +
> > +               baser_phys = virt_to_phys(shadow.tables[i].shadow);
> > +               if (IS_ENABLED(CONFIG_ARM64_64K_PAGES) && (baser_phys >> 48))
> > +                       baser_phys = GITS_BASER_PHYS_52_to_48(baser_phys);
> > +
> > +               its->tables[i].val &= ~GENMASK(47, 12);
> > +               its->tables[i].val |= baser_phys;
> > +               its->tables[i].base = shadow.tables[i].shadow;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +int its_end_depriviledge(int ret_pkvm_finalize, unsigned long *flags, its_init_emulate cb)
> > +{
> > +       struct its_node *its;
> > +       int i = 0, ret = 0;
> > +
> > +       if (!flags || !cb)
> > +               return -EINVAL;
> > +
> > +       list_for_each_entry(its, &its_nodes, entry) {
> > +               if (!ret_pkvm_finalize && !ret)
> > +                       ret = its_switch_to_shadow_locked(its, cb);
> > +
> > +               raw_spin_unlock_irqrestore(&its->lock, flags[i++]);
> > +       }
> > +
> > +       kfree(flags);
> > +       raw_spin_unlock(&its_lock);
> > +
> > +       return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(its_end_depriviledge);
> > +
> >  static int __init its_probe_one(struct its_node *its)
> >  {
> >         u64 baser, tmp;
> > diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> > index 0225121f3013..40457a4375d4 100644
> > --- a/include/linux/irqchip/arm-gic-v3.h
> > +++ b/include/linux/irqchip/arm-gic-v3.h
> > @@ -657,6 +657,30 @@ static inline bool gic_enable_sre(void)
> >         return !!(val & ICC_SRE_EL1_SRE);
> >  }
> >
> > +/*
> > + * The ITS_BASER structure - contains memory information, cached
> > + * value of BASER register configuration and ITS page size.
> > + */
> > +struct its_baser {
> > +       void            *base;
> > +       void            *shadow;
> > +       u64             val;
> > +       u32             order;
> > +       u32             psz;
> > +};
> > +
> > +struct its_shadow_tables {
> > +       struct its_baser        tables[GITS_BASER_NR_REGS];
> > +       void                    *cmd_shadow;
> > +       void                    *cmd_original;
> > +       size_t                  cmdq_len;
> > +};
> > +
> > +typedef int (*its_init_emulate)(phys_addr_t its_phys_base, struct its_shadow_tables *shadow);
> > +
> > +void *its_start_depriviledge(void);
> > +int its_end_depriviledge(int ret, unsigned long *flags, its_init_emulate cb);
> > +
> >  #endif
> >
> >  #endif
> > --
> > 2.53.0.473.g4a7958ca14-goog
> >

  parent reply	other threads:[~2026-03-20 15:11 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-10 12:49 [RFC PATCH 00/14] KVM: ITS hardening for pKVM Sebastian Ene
2026-03-10 12:49 ` [PATCH 01/14] KVM: arm64: Donate MMIO to the hypervisor Sebastian Ene
2026-03-12 17:57   ` Fuad Tabba
2026-03-13 10:40   ` Suzuki K Poulose
2026-03-24 10:39   ` Vincent Donnefort
2026-03-10 12:49 ` [PATCH 02/14] KVM: arm64: Track host-unmapped MMIO regions in a static array Sebastian Ene
2026-03-12 19:05   ` Fuad Tabba
2026-03-24 10:46   ` Vincent Donnefort
2026-03-10 12:49 ` [PATCH 03/14] KVM: arm64: Support host MMIO trap handlers for unmapped devices Sebastian Ene
2026-03-13  9:31   ` Fuad Tabba
2026-03-24 10:59   ` Vincent Donnefort
2026-03-10 12:49 ` [PATCH 04/14] KVM: arm64: Mediate host access to GIC/ITS MMIO via unmapping Sebastian Ene
2026-03-13  9:58   ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 05/14] irqchip/gic-v3-its: Prepare shadow structures for KVM host deprivilege Sebastian Ene
2026-03-13 11:26   ` Fuad Tabba
2026-03-13 13:10     ` Fuad Tabba
2026-03-20 15:11     ` Sebastian Ene [this message]
2026-03-24 14:36       ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 06/14] KVM: arm64: Add infrastructure for ITS emulation setup Sebastian Ene
2026-03-16 10:46   ` Fuad Tabba
2026-03-17  9:40     ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 07/14] KVM: arm64: Restrict host access to the ITS tables Sebastian Ene
2026-03-16 16:13   ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 08/14] KVM: arm64: Trap & emulate the ITS MAPD command Sebastian Ene
2026-03-17 10:20   ` Fuad Tabba
2026-03-10 12:49 ` [PATCH 09/14] KVM: arm64: Trap & emulate the ITS VMAPP command Sebastian Ene
2026-03-10 12:49 ` [PATCH 10/14] KVM: arm64: Trap & emulate the ITS MAPC command Sebastian Ene
2026-03-10 12:49 ` [PATCH 11/14] KVM: arm64: Restrict host updates to GITS_CTLR Sebastian Ene
2026-03-10 12:49 ` [PATCH 12/14] KVM: arm64: Restrict host updates to GITS_CBASER Sebastian Ene
2026-03-10 12:49 ` [PATCH 13/14] KVM: arm64: Restrict host updates to GITS_BASER Sebastian Ene
2026-03-10 12:49 ` [PATCH 14/14] KVM: arm64: Implement HVC interface for ITS emulation setup Sebastian Ene
2026-03-12 17:56 ` [RFC PATCH 00/14] KVM: ITS hardening for pKVM Fuad Tabba
2026-03-20 14:42   ` Sebastian Ene
2026-03-13 15:18 ` Mostafa Saleh
2026-03-15 13:24   ` Fuad Tabba
2026-03-25 16:26   ` Sebastian Ene

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ab1jnHvmIeh0PExw@google.com \
    --to=sebastianene@google.com \
    --cc=alexandru.elisei@arm.com \
    --cc=android-kvm@google.com \
    --cc=bgrzesik@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dbrazdil@google.com \
    --cc=joey.gouly@arm.com \
    --cc=kees@kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=oupton@kernel.org \
    --cc=perlarsen@google.com \
    --cc=qperret@google.com \
    --cc=rananta@google.com \
    --cc=smostafa@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=tglx@kernel.org \
    --cc=vdonnefort@google.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox