From: Mostafa Saleh <smostafa@google.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>,
linux-arm-kernel@lists.infradead.org,
Robin Murphy <robin.murphy@arm.com>,
Will Deacon <will@kernel.org>,
Michael Shavit <mshavit@google.com>,
Nicolin Chen <nicolinc@nvidia.com>,
patches@lists.linux.dev, Ryan Roberts <ryan.roberts@arm.com>
Subject: Re: [PATCH 3/7] iommu/arm-smmu-v3: Shrink the strtab l1_desc array
Date: Tue, 4 Jun 2024 16:01:34 +0000
Message-ID: <Zl86XhrWGcBTLFAl@google.com>
In-Reply-To: <3-v1-1b720dce51d1+4f44-smmuv3_tidy_jgg@nvidia.com>
Hi Jason,
On Mon, Jun 03, 2024 at 07:31:29PM -0300, Jason Gunthorpe wrote:
> The top of the 2 level stream table is (at most) 128k entries big, and two
> high order allocations are required. One of __le64 which is programmed
> into the HW (1M), and one of struct arm_smmu_strtab_l1_desc which holds
> the CPU pointer (3M).
>
> There is no reason to store the l2ptr_dma as nothing reads it. devm stores
> a copy of it and the DMA memory will be freed via devm mechanisms. span is
> a constant of 8+1. Remove both.
>
This caught my eye before. I imagine (although I was not there) that there was
some thought of supporting a different span per SMMU, but that is not the case.
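To illustrate why a constant span is enough, here is a minimal userspace
sketch of how the L1 descriptor value is composed from the span and the L2
table address. The mask values (span in bits [4:0], L2 pointer in bits [51:6])
and STRTAB_SPLIT == 8 are written from memory of arm-smmu-v3.h and should be
treated as assumptions; encode_l1_desc() is a hypothetical name, and the
cpu_to_le64()/WRITE_ONCE() step of the real function is left out.

```c
#include <stdint.h>

/* Assumed values, mirroring the driver's definitions as recalled. */
#define STRTAB_SPLIT		8
#define L1_DESC_SPAN_MASK	0x1fULL	/* bits [4:0] */
#define L1_DESC_L2PTR_MASK \
	(((1ULL << 52) - 1) & ~((1ULL << 6) - 1))	/* bits [51:6] */

/*
 * Hypothetical stand-in for arm_smmu_write_strtab_l1_desc(): it only
 * builds the 64-bit value and does not store it anywhere.
 */
static uint64_t encode_l1_desc(uint64_t l2ptr_dma, uint8_t span)
{
	uint64_t val = 0;

	/* The span field starts at bit 0, so no shift is needed. */
	val |= (uint64_t)span & L1_DESC_SPAN_MASK;
	/* The L2 table address is kept in place, low bits masked off. */
	val |= l2ptr_dma & L1_DESC_L2PTR_MASK;

	return val;
}
```

With these assumptions, encode_l1_desc(0x80001000, STRTAB_SPLIT + 1) gives
0x80001009: the address is unchanged and the low bits carry the span of 9.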
> This removes 16 bytes from each struct arm_smmu_strtab_l1_desc and saves up
> to 2M of memory per iommu instance.
>
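The numbers in the commit message can be checked with a small userspace
sketch. The two structs below are hypothetical stand-ins for the layout before
and after the patch; uint64_t is used in place of both the CPU pointer and
dma_addr_t, which assumes a 64-bit build. The 128k entry count is the maximum
quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the old layout: u8 span, CPU pointer, dma_addr_t. */
struct l1_desc_before {
	uint8_t span;
	uint64_t l2ptr;		/* assumed 64-bit pointer */
	uint64_t l2ptr_dma;	/* assumed 64-bit dma_addr_t */
};

/* Stand-in for the new layout: only the CPU pointer remains. */
struct l1_desc_after {
	uint64_t l2ptr;
};

/* Maximum number of L1 entries, as stated in the commit message. */
#define L1_ENTRIES_MAX	(128u * 1024u)

/* Size in bytes of a maximum-sized array of entries of entry_size. */
static size_t l1_array_bytes(size_t entry_size)
{
	return (size_t)L1_ENTRIES_MAX * entry_size;
}

/* Bytes saved per SMMU instance by shrinking each entry. */
static size_t l1_array_saving(void)
{
	return l1_array_bytes(sizeof(struct l1_desc_before)) -
	       l1_array_bytes(sizeof(struct l1_desc_after));
}
```

On a typical 64-bit build the old entry is 24 bytes (3M for the array) and the
new one is 8 bytes (1M), so the saving is 16 bytes per entry and 2M in total.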
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 +++++++-------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 ---
> 2 files changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index d27dd0600bf1df..735dd9ff61890e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1448,12 +1448,12 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
>
>  /* Stream table manipulation functions */
>  static void
> -arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
> +arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma, u8 span)
>  {
>  	u64 val = 0;
>
> -	val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
> -	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
> +	val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, span);
> +	val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
>
>  	/* The HW has 64 bit atomicity with stores to the L2 STE table */
>  	WRITE_ONCE(*dst, cpu_to_le64(val));
> @@ -1655,6 +1655,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  {
>  	size_t size;
>  	void *strtab;
> +	dma_addr_t l2ptr_dma;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>  	struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
>
> @@ -1664,9 +1665,8 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  	size = (1 << STRTAB_SPLIT) * sizeof(struct arm_smmu_ste);
>  	strtab = &cfg->strtab.l1_desc[sid >> STRTAB_SPLIT];
>
> -	desc->span = STRTAB_SPLIT + 1;
> -	desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
> -					  GFP_KERNEL);
> +	desc->l2ptr =
> +		dmam_alloc_coherent(smmu->dev, size, &l2ptr_dma, GFP_KERNEL);
>  	if (!desc->l2ptr) {
>  		dev_err(smmu->dev,
>  			"failed to allocate l2 stream table for SID %u\n",
> @@ -1675,7 +1675,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>  	}
>
>  	arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
> -	arm_smmu_write_strtab_l1_desc(strtab, desc);
> +	arm_smmu_write_strtab_l1_desc(strtab, l2ptr_dma, STRTAB_SPLIT + 1);
>  	return 0;
>  }
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 4769780259affc..280a04bfb7230c 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -577,10 +577,7 @@ struct arm_smmu_priq {
>
>  /* High-level stream table and context descriptor structures */
>  struct arm_smmu_strtab_l1_desc {
> -	u8 span;
> -
>  	struct arm_smmu_ste *l2ptr;
> -	dma_addr_t l2ptr_dma;
>  };
>
>  struct arm_smmu_ctx_desc {
> --
> 2.45.2
>