From: Nicolin Chen <nicolinc@nvidia.com>
To: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>, <jgg@nvidia.com>,
<joro@8bytes.org>, <jean-philippe@linaro.org>,
<apopple@nvidia.com>, <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>
Subject: Re: [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in struct io_pgtable_cfg
Date: Thu, 31 Aug 2023 10:39:10 -0700 [thread overview]
Message-ID: <ZPDQPs9UL2sksblM@Asurada-Nvidia> (raw)
In-Reply-To: <20230830214958.GA30121@willie-the-truck>
On Wed, Aug 30, 2023 at 10:49:59PM +0100, Will Deacon wrote:
> On Tue, Aug 29, 2023 at 03:15:52PM -0700, Nicolin Chen wrote:
> > Meanwhile, by re-looking at Will's commit log:
> > arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE
> >
> > In order to reduce the possibility of soft lock-ups, we bound the
> > maximum number of TLBI operations performed by a single call to
> > flush_tlb_range() to an arbitrary constant of 1024.
> >
> > Whilst this does the job of avoiding lock-ups, we can actually be a bit
> > smarter by defining this as PTRS_PER_PTE. Due to the structure of our
> > page tables, using PTRS_PER_PTE means that an outer loop calling
> > flush_tlb_range() for entire table entries will end up performing just a
> > single TLBI operation for each entry. As an example, mremap()ing a 1GB
> > range mapped using 4k pages now requires only 512 TLBI operations when
> > moving the page tables as opposed to 262144 operations (512*512) when
> > using the current threshold of 1024.
> >
> > I found that I am actually not quite getting the calculation at the
> > end for the comparison between 512 and 262144.
> >
> > For a 4K pgsize setup, MAX_TLBI_OPS is set to 512, calculated from
> > 4096 / 8. Then, any VA range >= 2MB will trigger a flush_tlb_all().
> > By setting the threshold to 1024, the 2MB size bumps up to 4MB, i.e.
> > the condition becomes range >= 4MB.
> >
> > So, it seems to me that requesting a 1GB invalidation will trigger
> > a flush_tlb_all() in either case of having a 2MB or a 4MB threshold?
> >
> > I can get that the 262144 is the number of pages in a 1GB size, so
> > the number of per-page invalidations will be 262144 operations if
> > there was no threshold to replace with a full-as invalidation. Yet,
> > that wasn't the case since we had a 4MB threshold with an arbitrary
> > 1024 for MAX_TLBI_OPS?
>
> I think this is because you can't always batch up the entire range as
> you'd like due to things like locking concerns. For example,
> move_page_tables() can end up invalidating 2MiB at a time, which is
> too low to trigger the old threshold and so you end up doing every
> single pte individually.
Thanks for elaborating! So, I see now that it was about the worst
case, where a 1GB invalidation breaks down into 512 separate 2MB
range invalidations, i.e. 512 flush_tlb_all ops vs. 262144
flush_tlb_range ops.

Though I have not dug into it enough, I assume this worst case could
happen to SVA too, since the IOTLB invalidation is driven from the
MMU code. But the same worst case might not happen on the non-SVA
pathway, i.e. TLBI ops for the IO page table don't really benefit
from this?
With that question aside, I take Robin's remark that it wouldn't be
easy to decide on an arbitrary number, so we could just take the
worst case from the SVA pathway as the common threshold.

Then, since SVA uses the CPU page table, perhaps we should highlight
that SMMU sets the threshold by directly reusing the MAX_TLBI_OPS of
the CPU page table, rather than calculating it from the IO page
table, even though both are in the same format?
Thank you
Nicolin