From: Jason Gunthorpe <jgg@nvidia.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: "will@kernel.org" <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
"joro@8bytes.org" <joro@8bytes.org>,
"jean-philippe@linaro.org" <jean-philippe@linaro.org>,
Alistair Popple <apopple@nvidia.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>
Subject: Re: [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in struct io_pgtable_cfg
Date: Thu, 25 Jan 2024 09:55:37 -0400
Message-ID: <20240125135537.GP1455070@nvidia.com>
In-Reply-To: <Za63HOMZE2fuJKQ4@Asurada-Nvidia>

On Tue, Jan 23, 2024 at 04:11:09PM -0800, Nicolin Chen wrote:
> > prevented strongly. Broadly speaking if SVA is pushing too high an
> > invalidation workload then we need to aggressively trim it, and do so
> > dynamically. Certainly we should not have a tunable that has to be set
> > right to avoid soft lockup.
> >
> > A tunable to improve performance, perhaps, but not to achieve basic
> > correctness.
>
> So, should we make an optional tunable only for those who care
> about performance? Though I think having a tunable would just
> fix both issues.

When the soft lockup issue is solved you can consider if a tunable is
still interesting.

> > Maybe it is really just a simple thing - compute how many invalidation
> > commands are needed, if they don't all fit in the current queue space,
> > then do an invalidate all instead?
>
> The queue could actually have a large space. But one large-size
> invalidation would be divided into batches that have to execute
> back-to-back. And the batch size is 64 commands in 64-bit case,
> which might be too small as a cap.

Yes, some notable code reorganizing would be needed to implement
something like this.

Broadly I'd sketch it sort of like:

 - Figure out how fast the HW can execute a lot of commands
 - The above should drive some XX maximum number of commands, maybe we
   need to measure at boot, IDK
 - Strongly time bound SVA invalidation:
   * No more than XX commands, if more needed then push invalidate
     all
   * All commands must fit in the available queue space, if more
     needed then push invalidate all
 - The total queue depth must not be larger than YY based on the
   retire rate so that even a full queue will complete invalidation
   below the target time.

A tunable indicating what the SVA time bound target should be might be
appropriate.

Jason
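
The time-bound policy outlined in the list above could be sketched
roughly as follows. This is only an illustration and is not taken from
the patch series or from the arm-smmu-v3 driver: struct inv_budget,
max_cmds(), max_queue_depth() and use_invalidate_all() are hypothetical
names, and the model assumes a constant command retire rate, so XX and
YY both reduce to "retire rate times target time".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs; none of these names exist in the real driver. */
struct inv_budget {
	uint64_t cmds_per_usec;	/* assumed HW retire rate, e.g. measured at boot */
	uint64_t target_usec;	/* assumed SVA invalidation time bound target */
};

/* "XX": the most commands one invalidation may issue within the bound. */
static uint64_t max_cmds(const struct inv_budget *b)
{
	return b->cmds_per_usec * b->target_usec;
}

/* "YY": cap the usable queue depth so a full queue drains within the bound. */
static uint64_t max_queue_depth(const struct inv_budget *b, uint64_t hw_depth)
{
	uint64_t cap = max_cmds(b);

	return hw_depth < cap ? hw_depth : cap;
}

/*
 * Return true when a range invalidation should be replaced by a single
 * invalidate-all: either it needs more than XX commands, or the commands
 * do not all fit in the currently free queue space.
 */
static bool use_invalidate_all(const struct inv_budget *b,
			       uint64_t cmds_needed, uint64_t queue_free)
{
	return cmds_needed > max_cmds(b) || cmds_needed > queue_free;
}
```

With an assumed rate of 4 commands per microsecond and a 100 us target,
the cap is 400 commands: a request for 401 commands, or one for 300
commands with only 200 free slots, would both fall back to invalidate
all.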