Re: [PATCH 3/3] iommu/arm-smmu-v3: Add a max_tlbi_ops for __arm_smmu_tlb_inv_range()

From: Nicolin Chen <nicolinc@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: <will@kernel.org>, <jgg@nvidia.com>, <joro@8bytes.org>,
	<jean-philippe@linaro.org>, <apopple@nvidia.com>,
	<linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>
Subject: Re: [PATCH 3/3] iommu/arm-smmu-v3: Add a max_tlbi_ops for __arm_smmu_tlb_inv_range()
Date: Tue, 29 Aug 2023 16:14:47 -0700	[thread overview]
Message-ID: <ZO5755PjMx8mofa3@Asurada-Nvidia> (raw)
In-Reply-To: <dc99bc7b-b6bc-1b82-3d8e-8e385596070b@arm.com>

On Tue, Aug 29, 2023 at 11:40:29PM +0100, Robin Murphy wrote:

> > > > > -     if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
> > > > > +     if (!(smmu->features & ARM_SMMU_FEAT_RANGE_INV)) {
> > > > > +             /*
> > > > > +              * When the size reaches a threshold, replace per-granule TLBI
> > > > > +              * commands with one single per-asid or per-vmid TLBI command.
> > > > > +              */
> > > > > +             if (size >= granule * smmu_domain->max_tlbi_ops)
> > > > > +                     return arm_smmu_tlb_inv_domain(smmu_domain);
> > > > 
> > > > This looks like it's at the wrong level - we should have figured this
> > > > out before we got as far as low-level command-building. I'd have thought
> > > > it would be a case of short-circuiting directly from
> > > > arm_smmu_tlb_inv_range_domain() to arm_smmu_tlb_inv_context().
> > > 
> > > OK, I could do that. We would have copies of this same routine
> > > though. Also, the shortcut applies to !ARM_SMMU_FEAT_RANGE_INV
> > > cases only, so this function feels convenient to me.
> > 
> > I was trying to say that we would need the same piece in both
> > arm_smmu_tlb_inv_range_domain() and arm_smmu_tlb_inv_range_asid(),
> > though the latter one only needs to call arm_smmu_tlb_inv_asid().
> 
> Its not like arm_smmu_tlb_inv_range_asid() doesn't already massively
> overlap with arm_smmu_tlb_inv_range_domain() anyway, so a little further
> duplication hardly seems like it would hurt. Checking
> ARM_SMMU_FEAT_RANGE_INV should be cheap (otherwise we'd really want to
> split __arm_smmu_tlb_inv_range() into separate RIL vs. non-RIL versions
> to avoid having it in the loop), and it makes the intent clear. What I
> just really don't like is a flow where we construct a specific command,
> then call the low-level function to issue it, only that function then
> actually jumps back out into another high-level function which
> constructs a *different* command. This code is already a maze of twisty
> little passages...

OK. That sounds convincing to me. We can do at the higher level.

> > Also, arm_smmu_tlb_inv_context() does a full range ATC invalidation
> > instead of a given range like what arm_smmu_tlb_inv_range_domain()
> > currently does. So, it might be a bit overkill.
> > 
> > Combining all your comments, we'd have something like this:
> 
> TBH I'd be inclined to refactor a bit harder, maybe break out some
> VMID-based helpers for orthogonality, and aim for a flow like:
> 
>        if (over threshold)
>                tlb_inv_domain()
>        else if (stage 1)
>                tlb_inv_range_asid()
>        else
>                tlb_inv_range_vmid()
>        atc_inv_range()
> 
> or possibly if you prefer:
> 
>        if (stage 1) {
>                if (over threshold)
>                        tlb_inv_asid()
>                else
>                        tlb_inv_range_asid()
>        } else {
>                if (over threshold)
>                        tlb_inv_vmid()
>                else
>                        tlb_inv_range_vmid()
>        }
>        atc_inv_range()
> 
> where the latter maybe trades more verbosity for less duplication
> overall - I'd probably have to try both to see which looks nicer in the
> end. And obviously if there's any chance of inventing a clear and
> consistent naming scheme in the process, that would be lovely.

Oh, I just replied you in another email, asking for a refactor,
though that didn't include the over-threshold part yet.

Anyway, I think I got your point now and will bear in mind that
we want something clean overall with less duplication among the
"inv" functions.

Let me try some rewriting. And we can see how it looks after.

Thanks
Nicolin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel