From: Mostafa Saleh <smostafa@google.com>
To: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: qemu-arm@nongnu.org, eric.auger@redhat.com,
peter.maydell@linaro.org, qemu-devel@nongnu.org,
alex.bennee@linaro.org, maz@kernel.org, nicolinc@nvidia.com,
julien@xen.org, richard.henderson@linaro.org,
marcin.juszkiewicz@linaro.org
Subject: Re: [PATCH v4 09/19] hw/arm/smmu-common: Rework TLB lookup for nesting
Date: Wed, 10 Jul 2024 08:39:33 +0000 [thread overview]
Message-ID: <Zo5Ixb3Li81fZPvQ@google.com> (raw)
In-Reply-To: <20240709171345.GC2189727@myrica>
Hi Jean,
On Tue, Jul 09, 2024 at 06:13:45PM +0100, Jean-Philippe Brucker wrote:
> On Tue, Jul 09, 2024 at 07:14:19AM +0000, Mostafa Saleh wrote:
> > Hi Jean,
> >
> > On Thu, Jul 04, 2024 at 07:12:35PM +0100, Jean-Philippe Brucker wrote:
> > > On Mon, Jul 01, 2024 at 11:02:31AM +0000, Mostafa Saleh wrote:
> > > > In the next patch, combine_tlb() will be added which combines 2 TLB
> > > > entries into one for nested translations, which chooses the granule
> > > > and level from the smallest entry.
> > > >
> > > > This means that with nested translation, an entry can be cached with
> > > > the granule of stage-2 and not stage-1.
> > > >
> > > > However, currently, the lookup for an IOVA is done with input stage
> > > > granule, which is stage-1 for nested configuration, which will not
> > > > work with the above logic.
> > > > This patch reworks lookup in that case, so it falls back to stage-2
> > > > granule if no entry is found using stage-1 granule.
> > >
> > > Why not initialize tt_combined to the minimum granule of stages 1 and 2?
> > > It looks like you introduced it for this. I'm wondering if we lookup the
> > > wrong IOVA if changing the granule size after the address is masked in
> > > smmu_translate()
> >
> > I am not sure I fully understand, but I don’t think that would work as it is
> > not guaranteed that the minimum granule is the one that would be cached,
> > as we might hit block mappings.
> >
> > The IOVA at first is masked with the first stage mask for the expected page
> > address, and the lookup logic would mask the address for each level look up,
> > so It should match the alignment of the cached page of that granule and level,
> > and as the combine logic is done with the aligned_addr it is guaranteed by
> > construction that it has to be aligned with stage-1.
>
> I missed something, this is what I had in mind initially:
>
> * s1 granule is 64k, s2 granule is 4k
> * the tlb already contains a translations for IOVA 0x30000, tg=4k
> * now we lookup IOVA 0x31000. Masked with the s1 granule, aligned_addr is
> 0x30000. Not found at first because lookup is with tg=64k, but then we
> call smmu_iotlb_lookup_all_levels() again with the s2 granule and the
> same IOVA, which returns the wrong translation
If the granules are s1=64k, s2=4k, the only way we get a cached entry as
(IOVA 0x30000, tg=4k) would be for s2 and level-3 as for level-2 it has
to be aligned with 0x200000
So when we look up for 0x31000, there is no entry for it anyway.
But I can see some problems here:
In case also s1 granule is 64k, s2 granule is 4k
- Translation A: 0x31000
- TLB is empty => PTW, entry s1 = 64k 0x30000, s2 = 4k, 0x30000 and
the cached entry would be 0x30000,tg=4k as the combine logic also
uses the aligned address
- Translation B: 0x31000 => also misses as the only cached entry
is 0x30000, 4k
I think this is actually a bug and not just a TLB inefficiency, I need
to think more about it, but my initial thought is not to align the
iova until it’s used by a stage so it can use its granule.
>
> But it's not actually possible, because if cfg->stage == SMMU_NESTED, then
> in smmu_translate() we end up with
>
> } else {
> /* Stage2. */
> tt_combined.granule_sz = cfg->s2cfg.granule_sz;
>
> So I think the condition
>
> (cfg->stage == SMMU_NESTED) && (cfg->s2cfg.granule_sz != tt->granule_sz)
>
> in this patch is never true?
>
Ah, that’s a bug, I will fix it, NESTED should use stage-1 granule.
>
> Then the following scenario:
>
> * s1 granule is 4k, s2 granule is 64k
> * we lookup IOVA A, miss. The translation gets cached with granule 4k
> * we lookup IOVA A again, but with tt->granule_sz = 64k so we'll
> never find the entry?
>
>
> I guess we want to start the lookup with the smallest granule, and then if
> the s1 and s2 granules differ, retry with the other one. Or with
> SMMU_NESTED, start with the s1 granule and keep this patch to fallback to
> s2 granule, but without masking the IOVA in smmu_translate() (it will be
> masked correctly by smmu_iotlb_lookup_all_levels()).
Thanks for pointing that out, I will think more about it but I sense
that we would need to modify where we align the iova, for translation
and lookup.
Thanks,
Mostafa
>
> Thanks,
> Jean
>
> >
> > Thanks,
> > Mostafa
> >
> > >
> > > Thanks,
> > > Jean
> > >
> > > >
> > > > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > > > ---
> > > > hw/arm/smmu-common.c | 36 ++++++++++++++++++++++++++++++++++--
> > > > 1 file changed, 34 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> > > > index 21982621c0..0840b5cffd 100644
> > > > --- a/hw/arm/smmu-common.c
> > > > +++ b/hw/arm/smmu-common.c
> > > > @@ -66,8 +66,10 @@ SMMUIOTLBKey smmu_get_iotlb_key(int asid, int vmid, uint64_t iova,
> > > > return key;
> > > > }
> > > >
> > > > -SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
> > > > - SMMUTransTableInfo *tt, hwaddr iova)
> > > > +static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs,
> > > > + SMMUTransCfg *cfg,
> > > > + SMMUTransTableInfo *tt,
> > > > + hwaddr iova)
> > > > {
> > > > uint8_t tg = (tt->granule_sz - 10) / 2;
> > > > uint8_t inputsize = 64 - tt->tsz;
> > > > @@ -88,6 +90,36 @@ SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
> > > > }
> > > > level++;
> > > > }
> > > > + return entry;
> > > > +}
> > > > +
> > > > +/**
> > > > + * smmu_iotlb_lookup - Look up for a TLB entry.
> > > > + * @bs: SMMU state which includes the TLB instance
> > > > + * @cfg: Configuration of the translation
> > > > + * @tt: Translation table info (granule and tsz)
> > > > + * @iova: IOVA address to lookup
> > > > + *
> > > > + * returns a valid entry on success, otherwise NULL.
> > > > + * In case of nested translation, tt can be updated to include
> > > > + * the granule of the found entry as it might different from
> > > > + * the IOVA granule.
> > > > + */
> > > > +SMMUTLBEntry *smmu_iotlb_lookup(SMMUState *bs, SMMUTransCfg *cfg,
> > > > + SMMUTransTableInfo *tt, hwaddr iova)
> > > > +{
> > > > + SMMUTLBEntry *entry = NULL;
> > > > +
> > > > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova);
> > > > + /*
> > > > + * For nested translation also try the s2 granule, as the TLB will insert
> > > > + * it if the size of s2 tlb entry was smaller.
> > > > + */
> > > > + if (!entry && (cfg->stage == SMMU_NESTED) &&
> > > > + (cfg->s2cfg.granule_sz != tt->granule_sz)) {
> > > > + tt->granule_sz = cfg->s2cfg.granule_sz;
> > > > + entry = smmu_iotlb_lookup_all_levels(bs, cfg, tt, iova);
> > > > + }
> > > >
> > > > if (entry) {
> > > > cfg->iotlb_hits++;
> > > > --
> > > > 2.45.2.803.g4e1b14247a-goog
> > > >
next prev parent reply other threads:[~2024-07-10 8:40 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-01 11:02 [PATCH v4 00/19] SMMUv3 nested translation support Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 01/19] hw/arm/smmu-common: Add missing size check for stage-1 Mostafa Saleh
2024-07-04 17:55 ` Jean-Philippe Brucker
2024-07-01 11:02 ` [PATCH v4 02/19] hw/arm/smmu: Fix IPA for stage-2 events Mostafa Saleh
2024-07-04 17:56 ` Jean-Philippe Brucker
2024-07-01 11:02 ` [PATCH v4 03/19] hw/arm/smmuv3: Fix encoding of CLASS in events Mostafa Saleh
2024-07-04 18:02 ` Jean-Philippe Brucker
2024-07-09 7:10 ` Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 04/19] hw/arm/smmu: Use enum for SMMU stage Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 05/19] hw/arm/smmu: Split smmuv3_translate() Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 06/19] hw/arm/smmu: Consolidate ASID and VMID types Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 07/19] hw/arm/smmu: Introduce CACHED_ENTRY_TO_ADDR Mostafa Saleh
2024-07-04 18:04 ` Jean-Philippe Brucker
2024-07-08 13:27 ` Eric Auger
2024-07-01 11:02 ` [PATCH v4 08/19] hw/arm/smmuv3: Translate CD and TT using stage-2 table Mostafa Saleh
2024-07-04 18:08 ` Jean-Philippe Brucker
2024-07-08 14:54 ` Eric Auger
2024-07-09 7:12 ` Mostafa Saleh
2024-07-09 13:00 ` Jean-Philippe Brucker
2024-07-11 13:03 ` Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 09/19] hw/arm/smmu-common: Rework TLB lookup for nesting Mostafa Saleh
2024-07-04 18:12 ` Jean-Philippe Brucker
2024-07-09 7:14 ` Mostafa Saleh
2024-07-09 17:13 ` Jean-Philippe Brucker
2024-07-10 8:39 ` Mostafa Saleh [this message]
2024-07-01 11:02 ` [PATCH v4 10/19] hw/arm/smmu-common: Add support for nested TLB Mostafa Saleh
2024-07-04 18:13 ` Jean-Philippe Brucker
2024-07-08 15:06 ` Eric Auger
2024-07-01 11:02 ` [PATCH v4 11/19] hw/arm/smmu-common: Support nested translation Mostafa Saleh
2024-07-04 18:31 ` Jean-Philippe Brucker
2024-07-09 7:15 ` Mostafa Saleh
2024-07-08 15:19 ` Eric Auger
2024-07-09 7:18 ` Mostafa Saleh
2024-07-09 7:57 ` Eric Auger
2024-07-01 11:02 ` [PATCH v4 12/19] hw/arm/smmu: Support nesting in smmuv3_range_inval() Mostafa Saleh
2024-07-04 18:32 ` Jean-Philippe Brucker
2024-07-09 7:19 ` Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 13/19] hw/arm/smmu: Introduce smmu_iotlb_inv_asid_vmid Mostafa Saleh
2024-07-04 18:33 ` Jean-Philippe Brucker
2024-07-08 16:08 ` Eric Auger
2024-07-01 11:02 ` [PATCH v4 14/19] hw/arm/smmu: Support nesting in the rest of commands Mostafa Saleh
2024-07-04 18:34 ` Jean-Philippe Brucker
2024-07-08 16:15 ` Eric Auger
2024-07-01 11:02 ` [PATCH v4 15/19] hw/arm/smmuv3: Support nested SMMUs in smmuv3_notify_iova() Mostafa Saleh
2024-07-04 18:35 ` Jean-Philippe Brucker
2024-07-09 7:20 ` Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 16/19] hw/arm/smmuv3: Handle translation faults according to SMMUPTWEventInfo Mostafa Saleh
2024-07-04 18:36 ` Jean-Philippe Brucker
2024-07-08 16:30 ` Eric Auger
2024-07-01 11:02 ` [PATCH v4 17/19] hw/arm/smmuv3: Support and advertise nesting Mostafa Saleh
2024-07-04 18:36 ` Jean-Philippe Brucker
2024-07-08 16:51 ` Eric Auger
2024-07-01 11:02 ` [PATCH v4 18/19] hw/arm/smmuv3: Advertise S2FWB Mostafa Saleh
2024-07-04 18:36 ` Jean-Philippe Brucker
2024-07-08 17:09 ` Eric Auger
2024-07-09 7:22 ` Mostafa Saleh
2024-07-09 7:21 ` Mostafa Saleh
2024-07-01 11:02 ` [PATCH v4 19/19] hw/arm/smmu: Refactor SMMU OAS Mostafa Saleh
2024-07-04 18:37 ` Jean-Philippe Brucker
2024-07-08 17:30 ` [PATCH v4 00/19] SMMUv3 nested translation support Eric Auger
2024-07-08 18:13 ` Mostafa Saleh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zo5Ixb3Li81fZPvQ@google.com \
--to=smostafa@google.com \
--cc=alex.bennee@linaro.org \
--cc=eric.auger@redhat.com \
--cc=jean-philippe@linaro.org \
--cc=julien@xen.org \
--cc=marcin.juszkiewicz@linaro.org \
--cc=maz@kernel.org \
--cc=nicolinc@nvidia.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).