* [PATCH v2 01/10] iommu/arm-smmu-v3: Do not zero the strtab twice
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 0:50 ` Nicolin Chen
2024-06-11 23:52 ` Daniel Mentz
2024-06-11 0:31 ` [PATCH v2 02/10] iommu/arm-smmu-v3: Shrink the strtab l1_desc array Jason Gunthorpe
` (10 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
dmam_alloc_coherent() already returns zero'd memory so cfg->strtab.l1_desc
(the list of DMA addresses for the L2 entries) is already zero'd.
arm_smmu_init_l1_strtab() goes through and calls
arm_smmu_write_strtab_l1_desc() on the newly allocated (and zero'd) struct
arm_smmu_strtab_l1_desc, which ends up computing 'val = 0' and zeroing it
again.
Remove arm_smmu_init_l1_strtab() and just call devm_kcalloc() from
arm_smmu_init_strtab_2lvl to allocate the companion struct.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 +++++++--------------
1 file changed, 9 insertions(+), 20 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ec8c7b7f4cb9c1..d56e19ca6ef6c5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3215,25 +3215,6 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
PRIQ_ENT_DWORDS, "priq");
}
-static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
-{
- unsigned int i;
- struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
- void *strtab = smmu->strtab_cfg.strtab;
-
- cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
- sizeof(*cfg->l1_desc), GFP_KERNEL);
- if (!cfg->l1_desc)
- return -ENOMEM;
-
- for (i = 0; i < cfg->num_l1_ents; ++i) {
- arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
- strtab += STRTAB_L1_DESC_DWORDS << 3;
- }
-
- return 0;
-}
-
static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
{
void *strtab;
@@ -3269,7 +3250,15 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
cfg->strtab_base_cfg = reg;
- return arm_smmu_init_l1_strtab(smmu);
+ cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
+ sizeof(*cfg->l1_desc), GFP_KERNEL);
+ if (!cfg->l1_desc) {
+ dev_err(smmu->dev,
+ "failed to allocate l1 stream table (%zu bytes)\n",
+ cfg->num_l1_ents * sizeof(*cfg->l1_desc));
+ return -ENOMEM;
+ }
+ return 0;
}
static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 01/10] iommu/arm-smmu-v3: Do not zero the strtab twice
2024-06-11 0:31 ` [PATCH v2 01/10] iommu/arm-smmu-v3: Do not zero the strtab twice Jason Gunthorpe
@ 2024-06-11 0:50 ` Nicolin Chen
2024-06-11 23:52 ` Daniel Mentz
1 sibling, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 0:50 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:10PM -0300, Jason Gunthorpe wrote:
> dmam_alloc_coherent() already returns zero'd memory so cfg->strtab.l1_desc
> (the list of DMA addresses for the L2 entries) is already zero'd.
>
> arm_smmu_init_l1_strtab() goes through and calls
> arm_smmu_write_strtab_l1_desc() on the newly allocated (and zero'd) struct
> arm_smmu_strtab_l1_desc, which ends up computing 'val = 0' and zeroing it
> again.
>
> Remove arm_smmu_init_l1_strtab() and just call devm_kcalloc() from
> arm_smmu_init_strtab_2lvl to allocate the companion struct.
>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 01/10] iommu/arm-smmu-v3: Do not zero the strtab twice
2024-06-11 0:31 ` [PATCH v2 01/10] iommu/arm-smmu-v3: Do not zero the strtab twice Jason Gunthorpe
2024-06-11 0:50 ` Nicolin Chen
@ 2024-06-11 23:52 ` Daniel Mentz
2024-06-12 11:46 ` Jason Gunthorpe
1 sibling, 1 reply; 34+ messages in thread
From: Daniel Mentz @ 2024-06-11 23:52 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Mon, Jun 10, 2024 at 5:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> + cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
> + sizeof(*cfg->l1_desc), GFP_KERNEL);
> + if (!cfg->l1_desc) {
> + dev_err(smmu->dev,
> + "failed to allocate l1 stream table (%zu bytes)\n",
> + cfg->num_l1_ents * sizeof(*cfg->l1_desc));
The error message "failed to allocate l1 stream table (%zu bytes)\n"
is identical to the one a few lines above that's printed if
dmam_alloc_coherent fails. This might make it difficult to determine
the origin of this message if someone sees it in a log file. Also,
it's not technically an SMMUv3 l1 stream table but rather some
internal data structure.
I thought there was a guideline around not printing error messages on
certain allocation failures because there'll be a generic OOM message
that'll get printed. Also, this driver never prints an error message
when devm_kcalloc or devm_kzalloc fails, only on dmam_alloc_coherent
failures.
Just nitpicking here. Looks good to me, otherwise.
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v2 01/10] iommu/arm-smmu-v3: Do not zero the strtab twice
2024-06-11 23:52 ` Daniel Mentz
@ 2024-06-12 11:46 ` Jason Gunthorpe
0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-12 11:46 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Tue, Jun 11, 2024 at 04:52:57PM -0700, Daniel Mentz wrote:
> I thought there was a guideline around not printing error messages on
> certain allocation failures because there'll be a generic OOM message
> that'll get printed.
Oops, right, C&P mistake. I removed it.
> Also, this driver never prints an error message
> when devm_kcalloc or devm_kzalloc fails, only on dmam_alloc_coherent
> failures.
dma_alloc_coherent generates the same dump as kzalloc on ENOMEM, I
guessed the intention was to add extra logging on the high order
allocations, but this allocation is larger than the dma one and
doesn't get a print so <shrug>
I was tempted to remove those prints too..
Thanks,
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 02/10] iommu/arm-smmu-v3: Shrink the strtab l1_desc array
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
2024-06-11 0:31 ` [PATCH v2 01/10] iommu/arm-smmu-v3: Do not zero the strtab twice Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 0:55 ` Nicolin Chen
2024-06-11 0:31 ` [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx() Jason Gunthorpe
` (9 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
The top of the 2 level stream table is (at most) 128k entries big, and two
high order allocations are required. One of __le64 which is programmed
into the HW (1M), and one of struct arm_smmu_strtab_l1_desc which holds
the CPU pointer (3M).
There is no reason to store the l2ptr_dma as nothing reads it. devm stores
a copy of it and the DMA memory will be freed via devm mechanisms. span is
a constant of 8+1. Remove both.
This removes 16 bytes from each arm_smmu_l1_ctx_desc and saves up to 2M of
memory per iommu instance.
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++-------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 ---
2 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d56e19ca6ef6c5..95351c134c7c45 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1447,13 +1447,12 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
}
/* Stream table manipulation functions */
-static void
-arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
+static void arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
{
u64 val = 0;
- val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
- val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
+ val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, STRTAB_SPLIT + 1);
+ val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
/* The HW has 64 bit atomicity with stores to the L2 STE table */
WRITE_ONCE(*dst, cpu_to_le64(val));
@@ -1655,6 +1654,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
{
size_t size;
void *strtab;
+ dma_addr_t l2ptr_dma;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
@@ -1664,8 +1664,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
- desc->span = STRTAB_SPLIT + 1;
- desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
+ desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &l2ptr_dma,
GFP_KERNEL);
if (!desc->l2ptr) {
dev_err(smmu->dev,
@@ -1675,7 +1674,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
}
arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
- arm_smmu_write_strtab_l1_desc(strtab, desc);
+ arm_smmu_write_strtab_l1_desc(strtab, l2ptr_dma);
return 0;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1242a086c9f948..087733797f9087 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -579,10 +579,7 @@ struct arm_smmu_priq {
/* High-level stream table and context descriptor structures */
struct arm_smmu_strtab_l1_desc {
- u8 span;
-
struct arm_smmu_ste *l2ptr;
- dma_addr_t l2ptr_dma;
};
struct arm_smmu_ctx_desc {
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 02/10] iommu/arm-smmu-v3: Shrink the strtab l1_desc array
2024-06-11 0:31 ` [PATCH v2 02/10] iommu/arm-smmu-v3: Shrink the strtab l1_desc array Jason Gunthorpe
@ 2024-06-11 0:55 ` Nicolin Chen
0 siblings, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 0:55 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:11PM -0300, Jason Gunthorpe wrote:
> The top of the 2 level stream table is (at most) 128k entries big, and two
> high order allocations are required. One of __le64 which is programmed
> into the HW (1M), and one of struct arm_smmu_strtab_l1_desc which holds
> the CPU pointer (3M).
>
> There is no reason to store the l2ptr_dma as nothing reads it. devm stores
> a copy of it and the DMA memory will be freed via devm mechanisms. span is
> a constant of 8+1. Remove both.
>
> This removes 16 bytes from each arm_smmu_l1_ctx_desc and saves up to 2M of
> memory per iommu instance.
>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
2024-06-11 0:31 ` [PATCH v2 01/10] iommu/arm-smmu-v3: Do not zero the strtab twice Jason Gunthorpe
2024-06-11 0:31 ` [PATCH v2 02/10] iommu/arm-smmu-v3: Shrink the strtab l1_desc array Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 2:00 ` Nicolin Chen
` (2 more replies)
2024-06-11 0:31 ` [PATCH v2 04/10] iommu/arm-smmu-v3: Add types for each level of the 2 level stream table Jason Gunthorpe
` (8 subsequent siblings)
11 siblings, 3 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
Don't open code the calculations of the indexes for each level, provide
two functions to do that math and call them in all the places. Update all
the places computing indexes.
Calculate the L1 table size directly based on the max required index from
the cap. Remove STRTAB_L1_SZ_SHIFT in favour of STRTAB_NUM_L2_STES.
Use STRTAB_NUM_L2_STES to replace remaining open coded 1 << STRTAB_SPLIT.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 53 +++++++++------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 14 +++++-
2 files changed, 36 insertions(+), 31 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 95351c134c7c45..07b797ad832801 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1652,20 +1652,17 @@ static void arm_smmu_init_initial_stes(struct arm_smmu_ste *strtab,
static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
{
- size_t size;
- void *strtab;
dma_addr_t l2ptr_dma;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
- struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
+ struct arm_smmu_strtab_l1_desc *desc =
+ &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)];
if (desc->l2ptr)
return 0;
- size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
- strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
-
- desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &l2ptr_dma,
- GFP_KERNEL);
+ desc->l2ptr = dmam_alloc_coherent(
+ smmu->dev, STRTAB_NUM_L2_STES * sizeof(struct arm_smmu_ste),
+ &l2ptr_dma, GFP_KERNEL);
if (!desc->l2ptr) {
dev_err(smmu->dev,
"failed to allocate l2 stream table for SID %u\n",
@@ -1673,8 +1670,9 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
return -ENOMEM;
}
- arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
- arm_smmu_write_strtab_l1_desc(strtab, l2ptr_dma);
+ arm_smmu_init_initial_stes(desc->l2ptr, STRTAB_NUM_L2_STES);
+ arm_smmu_write_strtab_l1_desc(&cfg->strtab[arm_smmu_strtab_l1_idx(sid)],
+ l2ptr_dma);
return 0;
}
@@ -2411,12 +2409,9 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
- unsigned int idx1, idx2;
-
/* Two-level walk */
- idx1 = (sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS;
- idx2 = sid & ((1 << STRTAB_SPLIT) - 1);
- return &cfg->l1_desc[idx1].l2ptr[idx2];
+ return &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)]
+ .l2ptr[arm_smmu_strtab_l2_idx(sid)];
} else {
/* Simple linear lookup */
return (struct arm_smmu_ste *)&cfg
@@ -2792,12 +2787,10 @@ struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode)
static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
{
- unsigned long limit = smmu->strtab_cfg.num_l1_ents;
-
if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
- limit *= 1UL << STRTAB_SPLIT;
-
- return sid < limit;
+ return arm_smmu_strtab_l1_idx(sid) <
+ smmu->strtab_cfg.num_l1_ents;
+ return sid < smmu->strtab_cfg.num_l1_ents;
}
static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid)
@@ -3218,19 +3211,18 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
{
void *strtab;
u64 reg;
- u32 size, l1size;
+ u32 l1size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+ unsigned int last_sid_idx =
+ arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
/* Calculate the L1 size, capped to the SIDSIZE. */
- size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
- size = min(size, smmu->sid_bits - STRTAB_SPLIT);
- cfg->num_l1_ents = 1 << size;
-
- size += STRTAB_SPLIT;
- if (size < smmu->sid_bits)
+ cfg->num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
+ if (cfg->num_l1_ents <= last_sid_idx)
dev_warn(smmu->dev,
- "2-level strtab only covers %u/%u bits of SID\n",
- size, smmu->sid_bits);
+ "2-level strtab only covers %u/%u of SIDs\n",
+ cfg->num_l1_ents * STRTAB_NUM_L2_STES,
+ 1 << smmu->sid_bits);
l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3);
strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
@@ -3245,7 +3237,8 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
/* Configure strtab_base_cfg for 2 levels */
reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
- reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size);
+ reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE,
+ ilog2(cfg->num_l1_ents) + STRTAB_SPLIT);
reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
cfg->strtab_base_cfg = reg;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 087733797f9087..95c3ac8613da79 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -199,7 +199,6 @@
* 2lvl: 128k L1 entries,
* 256 lazy entries per table (each table covers a PCI bus)
*/
-#define STRTAB_L1_SZ_SHIFT 20
#define STRTAB_SPLIT 8
#define STRTAB_L1_DESC_DWORDS 1
@@ -212,6 +211,19 @@ struct arm_smmu_ste {
__le64 data[STRTAB_STE_DWORDS];
};
+#define STRTAB_NUM_L2_STES (1 << STRTAB_SPLIT)
+#define STRTAB_MAX_L1_ENTRIES (1 << 17)
+
+static inline unsigned int arm_smmu_strtab_l1_idx(unsigned int sid)
+{
+ return sid / STRTAB_NUM_L2_STES;
+}
+
+static inline unsigned int arm_smmu_strtab_l2_idx(unsigned int sid)
+{
+ return sid % STRTAB_NUM_L2_STES;
+}
+
#define STRTAB_STE_0_V (1UL << 0)
#define STRTAB_STE_0_CFG GENMASK_ULL(3, 1)
#define STRTAB_STE_0_CFG_ABORT 0
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
2024-06-11 0:31 ` [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx() Jason Gunthorpe
@ 2024-06-11 2:00 ` Nicolin Chen
2024-06-12 0:30 ` Daniel Mentz
2024-06-12 0:37 ` Daniel Mentz
2 siblings, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 2:00 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:12PM -0300, Jason Gunthorpe wrote:
> Don't open code the calculations of the indexes for each level, provide
> two functions to do that math and call them in all the places. Update all
> the places computing indexes.
>
> Calculate the L1 table size directly based on the max required index from
> the cap. Remove STRTAB_L1_SZ_SHIFT in favour of STRTAB_NUM_L2_STES.
>
> Use STRTAB_NUM_L2_STES to replace remaining open coded 1 << STRTAB_SPLIT.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
2024-06-11 0:31 ` [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx() Jason Gunthorpe
2024-06-11 2:00 ` Nicolin Chen
@ 2024-06-12 0:30 ` Daniel Mentz
2024-06-12 12:09 ` Jason Gunthorpe
2024-06-12 0:37 ` Daniel Mentz
2 siblings, 1 reply; 34+ messages in thread
From: Daniel Mentz @ 2024-06-12 0:30 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Mon, Jun 10, 2024 at 5:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> dev_warn(smmu->dev,
> - "2-level strtab only covers %u/%u bits of SID\n",
> - size, smmu->sid_bits);
> + "2-level strtab only covers %u/%u of SIDs\n",
> + cfg->num_l1_ents * STRTAB_NUM_L2_STES,
> + 1 << smmu->sid_bits);
Does this mean it'll change from printing
"2-level strtab only covers 25/32 bits of SID"
to
"2-level strtab only covers 33554432/4294967296 of SIDs"?
I think that's less helpful. I would prefer printing the values in bits.
> +#define STRTAB_NUM_L2_STES (1 << STRTAB_SPLIT)
> +#define STRTAB_MAX_L1_ENTRIES (1 << 17)
Consider adding some extra white space to nicely align these #defines
with the rest of this .h file.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
2024-06-12 0:30 ` Daniel Mentz
@ 2024-06-12 12:09 ` Jason Gunthorpe
0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-12 12:09 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Tue, Jun 11, 2024 at 05:30:53PM -0700, Daniel Mentz wrote:
> On Mon, Jun 10, 2024 at 5:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > dev_warn(smmu->dev,
> > - "2-level strtab only covers %u/%u bits of SID\n",
> > - size, smmu->sid_bits);
> > + "2-level strtab only covers %u/%u of SIDs\n",
> > + cfg->num_l1_ents * STRTAB_NUM_L2_STES,
> > + 1 << smmu->sid_bits);
>
> Does this mean it'll change from printing
>
> "2-level strtab only covers 25/32 bits of SID"
>
> to
>
> "2-level strtab only covers 33554432/4294967296 of SIDs"?
>
> I think that's less helpful. I would prefer printing the values in
> bits.
Sure
> > +#define STRTAB_NUM_L2_STES (1 << STRTAB_SPLIT)
> > +#define STRTAB_MAX_L1_ENTRIES (1 << 17)
>
> Consider adding some extra white space to nicely align these #defines
> with the rest of this .h file.
I presonally dislike the random vertical alignments, but sure
Thanks,
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
2024-06-11 0:31 ` [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx() Jason Gunthorpe
2024-06-11 2:00 ` Nicolin Chen
2024-06-12 0:30 ` Daniel Mentz
@ 2024-06-12 0:37 ` Daniel Mentz
2024-06-12 11:49 ` Jason Gunthorpe
2 siblings, 1 reply; 34+ messages in thread
From: Daniel Mentz @ 2024-06-12 0:37 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Mon, Jun 10, 2024 at 5:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> +static inline unsigned int arm_smmu_strtab_l1_idx(unsigned int sid)
Existing code appears to be exclusively using the data type u32 to
store StreamIDs.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
2024-06-12 0:37 ` Daniel Mentz
@ 2024-06-12 11:49 ` Jason Gunthorpe
0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-12 11:49 UTC (permalink / raw)
To: Daniel Mentz
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Tue, Jun 11, 2024 at 05:37:26PM -0700, Daniel Mentz wrote:
> On Mon, Jun 10, 2024 at 5:31 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> > +static inline unsigned int arm_smmu_strtab_l1_idx(unsigned int sid)
>
> Existing code appears to be exclusively using the data type u32 to
> store StreamIDs.
Not entirely exclusively, but often enough, I fixed it.
Thanks,
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 04/10] iommu/arm-smmu-v3: Add types for each level of the 2 level stream table
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (2 preceding siblings ...)
2024-06-11 0:31 ` [PATCH v2 03/10] iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx() Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 2:13 ` Nicolin Chen
2024-06-11 0:31 ` [PATCH v2 05/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg Jason Gunthorpe
` (7 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
Add types struct arm_smmu_strtab_l1 and l2 to represent the HW layout of
the descriptors, and use them in most places, following patches will get
the remaing places. The size of the l1 and l2 HW allocations are
sizeof(struct arm_smmu_strtab_l1/2).
This provides some more clarity than having raw __le64 *'s and sizes
computed via macros.
Remove STRTAB_L1_DESC_DWORDS.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 +++++++++++----------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 10 ++++++++--
2 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 07b797ad832801..6643594121a2b2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1447,7 +1447,8 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
}
/* Stream table manipulation functions */
-static void arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
+static void arm_smmu_write_strtab_l1_desc(struct arm_smmu_strtab_l1 *dst,
+ dma_addr_t l2ptr_dma)
{
u64 val = 0;
@@ -1455,7 +1456,7 @@ static void arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
/* The HW has 64 bit atomicity with stores to the L2 STE table */
- WRITE_ONCE(*dst, cpu_to_le64(val));
+ WRITE_ONCE(dst->l2ptr, cpu_to_le64(val));
}
struct arm_smmu_ste_writer {
@@ -1660,9 +1661,8 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
if (desc->l2ptr)
return 0;
- desc->l2ptr = dmam_alloc_coherent(
- smmu->dev, STRTAB_NUM_L2_STES * sizeof(struct arm_smmu_ste),
- &l2ptr_dma, GFP_KERNEL);
+ desc->l2ptr = dmam_alloc_coherent(smmu->dev, sizeof(*desc->l2ptr),
+ &l2ptr_dma, GFP_KERNEL);
if (!desc->l2ptr) {
dev_err(smmu->dev,
"failed to allocate l2 stream table for SID %u\n",
@@ -1670,9 +1670,11 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
return -ENOMEM;
}
- arm_smmu_init_initial_stes(desc->l2ptr, STRTAB_NUM_L2_STES);
- arm_smmu_write_strtab_l1_desc(&cfg->strtab[arm_smmu_strtab_l1_idx(sid)],
- l2ptr_dma);
+ arm_smmu_init_initial_stes(desc->l2ptr->stes, STRTAB_NUM_L2_STES);
+ arm_smmu_write_strtab_l1_desc(
+ (struct arm_smmu_strtab_l1 *)&cfg
+ ->strtab[arm_smmu_strtab_l1_idx(sid)],
+ l2ptr_dma);
return 0;
}
@@ -2411,7 +2413,7 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
/* Two-level walk */
return &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)]
- .l2ptr[arm_smmu_strtab_l2_idx(sid)];
+ .l2ptr->stes[arm_smmu_strtab_l2_idx(sid)];
} else {
/* Simple linear lookup */
return (struct arm_smmu_ste *)&cfg
@@ -3224,7 +3226,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
cfg->num_l1_ents * STRTAB_NUM_L2_STES,
1 << smmu->sid_bits);
- l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3);
+ l1size = cfg->num_l1_ents * sizeof(struct arm_smmu_strtab_l1);
strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
GFP_KERNEL);
if (!strtab) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 95c3ac8613da79..1418f21f5db6a0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -201,7 +201,6 @@
*/
#define STRTAB_SPLIT 8
-#define STRTAB_L1_DESC_DWORDS 1
#define STRTAB_L1_DESC_SPAN GENMASK_ULL(4, 0)
#define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
@@ -212,6 +211,13 @@ struct arm_smmu_ste {
};
#define STRTAB_NUM_L2_STES (1 << STRTAB_SPLIT)
+struct arm_smmu_strtab_l2 {
+ struct arm_smmu_ste stes[STRTAB_NUM_L2_STES];
+};
+
+struct arm_smmu_strtab_l1 {
+ __le64 l2ptr;
+};
#define STRTAB_MAX_L1_ENTRIES (1 << 17)
static inline unsigned int arm_smmu_strtab_l1_idx(unsigned int sid)
@@ -591,7 +597,7 @@ struct arm_smmu_priq {
/* High-level stream table and context descriptor structures */
struct arm_smmu_strtab_l1_desc {
- struct arm_smmu_ste *l2ptr;
+ struct arm_smmu_strtab_l2 *l2ptr;
};
struct arm_smmu_ctx_desc {
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 04/10] iommu/arm-smmu-v3: Add types for each level of the 2 level stream table
2024-06-11 0:31 ` [PATCH v2 04/10] iommu/arm-smmu-v3: Add types for each level of the 2 level stream table Jason Gunthorpe
@ 2024-06-11 2:13 ` Nicolin Chen
0 siblings, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 2:13 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:13PM -0300, Jason Gunthorpe wrote:
> Add types struct arm_smmu_strtab_l1 and l2 to represent the HW layout of
> the descriptors, and use them in most places, following patches will get
> the remaing places. The size of the l1 and l2 HW allocations are
> sizeof(struct arm_smmu_strtab_l1/2).
>
> This provides some more clarity than having raw __le64 *'s and sizes
> computed via macros.
>
> Remove STRTAB_L1_DESC_DWORDS.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 05/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (3 preceding siblings ...)
2024-06-11 0:31 ` [PATCH v2 04/10] iommu/arm-smmu-v3: Add types for each level of the 2 level stream table Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 2:31 ` Nicolin Chen
2024-07-02 18:46 ` Will Deacon
2024-06-11 0:31 ` [PATCH v2 06/10] iommu/arm-smmu-v3: Remove strtab_base/cfg Jason Gunthorpe
` (6 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
The members here are being used for both the linear and the 2 level case,
with the meaning of each item slightly different in the two cases.
Split it into a clean union where both cases have their own struct with
their own logical names and correct types.
Adjust all the users to detect linear/2lvl and use the right sub structure
and types consistently.
Remove STRTAB_STE_DWORDS by changing the last places to use
sizeof(struct arm_smmu_ste).
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 79 ++++++++++-----------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++----
2 files changed, 51 insertions(+), 54 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6643594121a2b2..21e0438d09b26e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1655,26 +1655,25 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
{
dma_addr_t l2ptr_dma;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
- struct arm_smmu_strtab_l1_desc *desc =
- &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)];
+ struct arm_smmu_strtab_l2 **l2table =
+ &cfg->l2.l2ptrs[arm_smmu_strtab_l1_idx(sid)];
- if (desc->l2ptr)
+ if (*l2table)
return 0;
- desc->l2ptr = dmam_alloc_coherent(smmu->dev, sizeof(*desc->l2ptr),
- &l2ptr_dma, GFP_KERNEL);
- if (!desc->l2ptr) {
+ *l2table = dmam_alloc_coherent(smmu->dev, sizeof(**l2table),
+ &l2ptr_dma, GFP_KERNEL);
+ if (!*l2table) {
dev_err(smmu->dev,
"failed to allocate l2 stream table for SID %u\n",
sid);
return -ENOMEM;
}
- arm_smmu_init_initial_stes(desc->l2ptr->stes, STRTAB_NUM_L2_STES);
+ arm_smmu_init_initial_stes((*l2table)->stes,
+ ARRAY_SIZE((*l2table)->stes));
arm_smmu_write_strtab_l1_desc(
- (struct arm_smmu_strtab_l1 *)&cfg
- ->strtab[arm_smmu_strtab_l1_idx(sid)],
- l2ptr_dma);
+ &cfg->l2.l1tab[arm_smmu_strtab_l1_idx(sid)], l2ptr_dma);
return 0;
}
@@ -2412,12 +2411,11 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
/* Two-level walk */
- return &cfg->l1_desc[arm_smmu_strtab_l1_idx(sid)]
- .l2ptr->stes[arm_smmu_strtab_l2_idx(sid)];
+ return &cfg->l2.l2ptrs[arm_smmu_strtab_l1_idx(sid)]
+ ->stes[arm_smmu_strtab_l2_idx(sid)];
} else {
/* Simple linear lookup */
- return (struct arm_smmu_ste *)&cfg
- ->strtab[sid * STRTAB_STE_DWORDS];
+ return &cfg->linear.table[sid];
}
}
@@ -2791,8 +2789,8 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
{
if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
return arm_smmu_strtab_l1_idx(sid) <
- smmu->strtab_cfg.num_l1_ents;
- return sid < smmu->strtab_cfg.num_l1_ents;
+ smmu->strtab_cfg.l2.num_l1_ents;
+ return sid < smmu->strtab_cfg.linear.num_ents;
}
static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid)
@@ -3211,7 +3209,6 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
{
- void *strtab;
u64 reg;
u32 l1size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
@@ -3219,37 +3216,36 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
/* Calculate the L1 size, capped to the SIDSIZE. */
- cfg->num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
- if (cfg->num_l1_ents <= last_sid_idx)
+ cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
+ if (cfg->l2.num_l1_ents <= last_sid_idx)
dev_warn(smmu->dev,
"2-level strtab only covers %u/%u of SIDs\n",
- cfg->num_l1_ents * STRTAB_NUM_L2_STES,
+ cfg->l2.num_l1_ents * STRTAB_NUM_L2_STES,
1 << smmu->sid_bits);
- l1size = cfg->num_l1_ents * sizeof(struct arm_smmu_strtab_l1);
- strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
- GFP_KERNEL);
- if (!strtab) {
+ l1size = cfg->l2.num_l1_ents * sizeof(struct arm_smmu_strtab_l1);
+ cfg->l2.l1tab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->l2.l1_dma,
+ GFP_KERNEL);
+ if (!cfg->l2.l1tab) {
dev_err(smmu->dev,
"failed to allocate l1 stream table (%u bytes)\n",
l1size);
return -ENOMEM;
}
- cfg->strtab = strtab;
/* Configure strtab_base_cfg for 2 levels */
reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE,
- ilog2(cfg->num_l1_ents) + STRTAB_SPLIT);
+ ilog2(cfg->l2.num_l1_ents) + STRTAB_SPLIT);
reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
cfg->strtab_base_cfg = reg;
- cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
- sizeof(*cfg->l1_desc), GFP_KERNEL);
- if (!cfg->l1_desc) {
+ cfg->l2.l2ptrs = devm_kcalloc(smmu->dev, cfg->l2.num_l1_ents,
+ sizeof(*cfg->l2.l2ptrs), GFP_KERNEL);
+ if (!cfg->l2.l2ptrs) {
dev_err(smmu->dev,
"failed to allocate l1 stream table (%zu bytes)\n",
- cfg->num_l1_ents * sizeof(*cfg->l1_desc));
+ cfg->l2.num_l1_ents * sizeof(*cfg->l2.l2ptrs));
return -ENOMEM;
}
return 0;
@@ -3257,29 +3253,27 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
{
- void *strtab;
u64 reg;
u32 size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
- size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
- strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
- GFP_KERNEL);
- if (!strtab) {
+ size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
+ cfg->linear.table = dmam_alloc_coherent(
+ smmu->dev, size, &cfg->linear.ste_dma, GFP_KERNEL);
+ if (!cfg->linear.table) {
dev_err(smmu->dev,
"failed to allocate linear stream table (%u bytes)\n",
size);
return -ENOMEM;
}
- cfg->strtab = strtab;
- cfg->num_l1_ents = 1 << smmu->sid_bits;
+ cfg->linear.num_ents = 1 << smmu->sid_bits;
/* Configure strtab_base_cfg for a linear table covering all SIDs */
reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR);
reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
cfg->strtab_base_cfg = reg;
- arm_smmu_init_initial_stes(strtab, cfg->num_l1_ents);
+ arm_smmu_init_initial_stes(cfg->linear.table, cfg->linear.num_ents);
return 0;
}
@@ -3288,16 +3282,17 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
u64 reg;
int ret;
- if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
+ if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
ret = arm_smmu_init_strtab_2lvl(smmu);
- else
+ reg = smmu->strtab_cfg.l2.l1_dma & STRTAB_BASE_ADDR_MASK;
+ } else {
ret = arm_smmu_init_strtab_linear(smmu);
-
+ reg = smmu->strtab_cfg.linear.ste_dma & STRTAB_BASE_ADDR_MASK;
+ }
if (ret)
return ret;
/* Set the strtab base address */
- reg = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK;
reg |= STRTAB_BASE_RA;
smmu->strtab_cfg.strtab_base = reg;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1418f21f5db6a0..8b58a30ebeb06b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -204,10 +204,8 @@
#define STRTAB_L1_DESC_SPAN GENMASK_ULL(4, 0)
#define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
-#define STRTAB_STE_DWORDS 8
-
struct arm_smmu_ste {
- __le64 data[STRTAB_STE_DWORDS];
+ __le64 data[8];
};
#define STRTAB_NUM_L2_STES (1 << STRTAB_SPLIT)
@@ -596,10 +594,6 @@ struct arm_smmu_priq {
};
/* High-level stream table and context descriptor structures */
-struct arm_smmu_strtab_l1_desc {
- struct arm_smmu_strtab_l2 *l2ptr;
-};
-
struct arm_smmu_ctx_desc {
u16 asid;
@@ -627,11 +621,19 @@ struct arm_smmu_s2_cfg {
};
struct arm_smmu_strtab_cfg {
- __le64 *strtab;
- dma_addr_t strtab_dma;
- struct arm_smmu_strtab_l1_desc *l1_desc;
- unsigned int num_l1_ents;
-
+ union {
+ struct {
+ struct arm_smmu_ste *table;
+ dma_addr_t ste_dma;
+ unsigned int num_ents;
+ } linear;
+ struct {
+ struct arm_smmu_strtab_l1 *l1tab;
+ struct arm_smmu_strtab_l2 **l2ptrs;
+ dma_addr_t l1_dma;
+ unsigned int num_l1_ents;
+ } l2;
+ };
u64 strtab_base;
u32 strtab_base_cfg;
};
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 05/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg
2024-06-11 0:31 ` [PATCH v2 05/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg Jason Gunthorpe
@ 2024-06-11 2:31 ` Nicolin Chen
2024-07-02 18:46 ` Will Deacon
1 sibling, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 2:31 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:14PM -0300, Jason Gunthorpe wrote:
> The members here are being used for both the linear and the 2 level case,
> with the meaning of each item slightly different in the two cases.
>
> Split it into a clean union where both cases have their own struct with
> their own logical names and correct types.
>
> Adjust all the users to detect linear/2lvl and use the right sub structure
> and types consistently.
>
> Remove STRTAB_STE_DWORDS by changing the last places to use
> sizeof(struct arm_smmu_ste).
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 05/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg
2024-06-11 0:31 ` [PATCH v2 05/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg Jason Gunthorpe
2024-06-11 2:31 ` Nicolin Chen
@ 2024-07-02 18:46 ` Will Deacon
2024-07-09 19:42 ` Jason Gunthorpe
1 sibling, 1 reply; 34+ messages in thread
From: Will Deacon @ 2024-07-02 18:46 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:14PM -0300, Jason Gunthorpe wrote:
> The members here are being used for both the linear and the 2 level case,
> with the meaning of each item slightly different in the two cases.
>
> Split it into a clean union where both cases have their own struct with
> their own logical names and correct types.
>
> Adjust all the users to detect linear/2lvl and use the right sub structure
> and types consistently.
>
> Remove STRTAB_STE_DWORDS by changing the last places to use
> sizeof(struct arm_smmu_ste).
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 79 ++++++++++-----------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++----
> 2 files changed, 51 insertions(+), 54 deletions(-)
[...]
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1418f21f5db6a0..8b58a30ebeb06b 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -204,10 +204,8 @@
> #define STRTAB_L1_DESC_SPAN GENMASK_ULL(4, 0)
> #define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
>
> -#define STRTAB_STE_DWORDS 8
> -
> struct arm_smmu_ste {
> - __le64 data[STRTAB_STE_DWORDS];
> + __le64 data[8];
> };
I'm not seeing the improvement here.
Will
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v2 05/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg
2024-07-02 18:46 ` Will Deacon
@ 2024-07-09 19:42 ` Jason Gunthorpe
0 siblings, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-07-09 19:42 UTC (permalink / raw)
To: Will Deacon
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Tue, Jul 02, 2024 at 07:46:16PM +0100, Will Deacon wrote:
> On Mon, Jun 10, 2024 at 09:31:14PM -0300, Jason Gunthorpe wrote:
> > The members here are being used for both the linear and the 2 level case,
> > with the meaning of each item slightly different in the two cases.
> >
> > Split it into a clean union where both cases have their own struct with
> > their own logical names and correct types.
> >
> > Adjust all the users to detect linear/2lvl and use the right sub structure
> > and types consistently.
> >
> > Remove STRTAB_STE_DWORDS by changing the last places to use
> > sizeof(struct arm_smmu_ste).
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 79 ++++++++++-----------
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++----
> > 2 files changed, 51 insertions(+), 54 deletions(-)
>
> [...]
>
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > index 1418f21f5db6a0..8b58a30ebeb06b 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > @@ -204,10 +204,8 @@
> > #define STRTAB_L1_DESC_SPAN GENMASK_ULL(4, 0)
> > #define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
> >
> > -#define STRTAB_STE_DWORDS 8
> > -
> > struct arm_smmu_ste {
> > - __le64 data[STRTAB_STE_DWORDS];
> > + __le64 data[8];
> > };
>
> I'm not seeing the improvement here.
Are you just asking to keep STRTAB_STE_DWORDS to use only for data[]?
I seem to recall a reviewer asked for this - but it can stay with no
issue. Let me know.
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 06/10] iommu/arm-smmu-v3: Remove strtab_base/cfg
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (4 preceding siblings ...)
2024-06-11 0:31 ` [PATCH v2 05/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 2:36 ` Nicolin Chen
2024-06-11 0:31 ` [PATCH v2 07/10] iommu/arm-smmu-v3: Do not use devm for the cd table allocations Jason Gunthorpe
` (5 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
These values can be computed from the other values already stored in the
config. Move the calculation to arm_smmu_write_strtab() and do it directly
before writing the registers.
This moves all the logic to calculate the two registers into one function
from three and saves an unimportant 16 bytes from the arm_smmu_device.
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 55 ++++++++++-----------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 -
2 files changed, 27 insertions(+), 30 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 21e0438d09b26e..b36bf6bed67d5f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3209,7 +3209,6 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
{
- u64 reg;
u32 l1size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
unsigned int last_sid_idx =
@@ -3233,13 +3232,6 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
return -ENOMEM;
}
- /* Configure strtab_base_cfg for 2 levels */
- reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
- reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE,
- ilog2(cfg->l2.num_l1_ents) + STRTAB_SPLIT);
- reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
- cfg->strtab_base_cfg = reg;
-
cfg->l2.l2ptrs = devm_kcalloc(smmu->dev, cfg->l2.num_l1_ents,
sizeof(*cfg->l2.l2ptrs), GFP_KERNEL);
if (!cfg->l2.l2ptrs) {
@@ -3253,7 +3245,6 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
{
- u64 reg;
u32 size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
@@ -3268,34 +3259,21 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
}
cfg->linear.num_ents = 1 << smmu->sid_bits;
- /* Configure strtab_base_cfg for a linear table covering all SIDs */
- reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR);
- reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
- cfg->strtab_base_cfg = reg;
-
arm_smmu_init_initial_stes(cfg->linear.table, cfg->linear.num_ents);
return 0;
}
static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
{
- u64 reg;
int ret;
- if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+ if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
ret = arm_smmu_init_strtab_2lvl(smmu);
- reg = smmu->strtab_cfg.l2.l1_dma & STRTAB_BASE_ADDR_MASK;
- } else {
+ else
ret = arm_smmu_init_strtab_linear(smmu);
- reg = smmu->strtab_cfg.linear.ste_dma & STRTAB_BASE_ADDR_MASK;
- }
if (ret)
return ret;
- /* Set the strtab base address */
- reg |= STRTAB_BASE_RA;
- smmu->strtab_cfg.strtab_base = reg;
-
ida_init(&smmu->vmid_map);
return 0;
@@ -3504,6 +3482,30 @@ static int arm_smmu_device_disable(struct arm_smmu_device *smmu)
return ret;
}
+static void arm_smmu_write_strtab(struct arm_smmu_device *smmu)
+{
+ struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+ dma_addr_t dma;
+ u32 reg;
+
+ if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
+ reg = FIELD_PREP(STRTAB_BASE_CFG_FMT,
+ STRTAB_BASE_CFG_FMT_2LVL) |
+ FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE,
+ ilog2(cfg->l2.num_l1_ents) + STRTAB_SPLIT) |
+ FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
+ dma = cfg->l2.l1_dma;
+ } else {
+ reg = FIELD_PREP(STRTAB_BASE_CFG_FMT,
+ STRTAB_BASE_CFG_FMT_LINEAR) |
+ FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
+ dma = cfg->linear.ste_dma;
+ }
+ writeq_relaxed((dma & STRTAB_BASE_ADDR_MASK) | STRTAB_BASE_RA,
+ smmu->base + ARM_SMMU_STRTAB_BASE);
+ writel_relaxed(reg, smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
+}
+
static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
{
int ret;
@@ -3539,10 +3541,7 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
writel_relaxed(reg, smmu->base + ARM_SMMU_CR2);
/* Stream table */
- writeq_relaxed(smmu->strtab_cfg.strtab_base,
- smmu->base + ARM_SMMU_STRTAB_BASE);
- writel_relaxed(smmu->strtab_cfg.strtab_base_cfg,
- smmu->base + ARM_SMMU_STRTAB_BASE_CFG);
+ arm_smmu_write_strtab(smmu);
/* Command queue */
writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8b58a30ebeb06b..3862f6e65c770e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -634,8 +634,6 @@ struct arm_smmu_strtab_cfg {
unsigned int num_l1_ents;
} l2;
};
- u64 strtab_base;
- u32 strtab_base_cfg;
};
/* An SMMUv3 instance */
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 06/10] iommu/arm-smmu-v3: Remove strtab_base/cfg
2024-06-11 0:31 ` [PATCH v2 06/10] iommu/arm-smmu-v3: Remove strtab_base/cfg Jason Gunthorpe
@ 2024-06-11 2:36 ` Nicolin Chen
0 siblings, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 2:36 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:15PM -0300, Jason Gunthorpe wrote:
> These values can be computed from the other values already stored in the
> config. Move the calculation to arm_smmu_write_strtab() and do it directly
> before writing the registers.
>
> This moves all the logic to calculate the two registers into one function
> from three and saves an unimportant 16 bytes from the arm_smmu_device.
>
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 07/10] iommu/arm-smmu-v3: Do not use devm for the cd table allocations
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (5 preceding siblings ...)
2024-06-11 0:31 ` [PATCH v2 06/10] iommu/arm-smmu-v3: Remove strtab_base/cfg Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 2:39 ` Nicolin Chen
2024-06-11 0:31 ` [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array Jason Gunthorpe
` (4 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
The master->cd_table is entirely contained within the struct
arm_smmu_master which is guaranteed to be freed by the core code under
arm_smmu_release_device().
There is no reason to use devm here, arm_smmu_free_cd_tables() is reliably
called to free the CD related memory. Remove it and save some memory.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 30 ++++++++++-----------
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index b36bf6bed67d5f..6245e2558e6a6a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1172,8 +1172,8 @@ static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
{
size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
- l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size,
- &l1_desc->l2ptr_dma, GFP_KERNEL);
+ l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size,
+ &l1_desc->l2ptr_dma, GFP_KERNEL);
if (!l1_desc->l2ptr) {
dev_warn(smmu->dev,
"failed to allocate context descriptor table\n");
@@ -1372,17 +1372,17 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts,
CTXDESC_L2_ENTRIES);
- cd_table->l1_desc = devm_kcalloc(smmu->dev, cd_table->num_l1_ents,
- sizeof(*cd_table->l1_desc),
- GFP_KERNEL);
+ cd_table->l1_desc = kcalloc(cd_table->num_l1_ents,
+ sizeof(*cd_table->l1_desc),
+ GFP_KERNEL);
if (!cd_table->l1_desc)
return -ENOMEM;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
}
- cd_table->cdtab = dmam_alloc_coherent(smmu->dev, l1size, &cd_table->cdtab_dma,
- GFP_KERNEL);
+ cd_table->cdtab = dma_alloc_coherent(smmu->dev, l1size,
+ &cd_table->cdtab_dma, GFP_KERNEL);
if (!cd_table->cdtab) {
dev_warn(smmu->dev, "failed to allocate context descriptor\n");
ret = -ENOMEM;
@@ -1393,7 +1393,7 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
err_free_l1:
if (cd_table->l1_desc) {
- devm_kfree(smmu->dev, cd_table->l1_desc);
+ kfree(cd_table->l1_desc);
cd_table->l1_desc = NULL;
}
return ret;
@@ -1413,21 +1413,19 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
if (!cd_table->l1_desc[i].l2ptr)
continue;
- dmam_free_coherent(smmu->dev, size,
- cd_table->l1_desc[i].l2ptr,
- cd_table->l1_desc[i].l2ptr_dma);
+ dma_free_coherent(smmu->dev, size,
+ cd_table->l1_desc[i].l2ptr,
+ cd_table->l1_desc[i].l2ptr_dma);
}
- devm_kfree(smmu->dev, cd_table->l1_desc);
- cd_table->l1_desc = NULL;
+ kfree(cd_table->l1_desc);
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
} else {
l1size = cd_table->num_l1_ents * (CTXDESC_CD_DWORDS << 3);
}
- dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma);
- cd_table->cdtab_dma = 0;
- cd_table->cdtab = NULL;
+ dma_free_coherent(smmu->dev, l1size, cd_table->cdtab,
+ cd_table->cdtab_dma);
}
bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 07/10] iommu/arm-smmu-v3: Do not use devm for the cd table allocations
2024-06-11 0:31 ` [PATCH v2 07/10] iommu/arm-smmu-v3: Do not use devm for the cd table allocations Jason Gunthorpe
@ 2024-06-11 2:39 ` Nicolin Chen
0 siblings, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 2:39 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:16PM -0300, Jason Gunthorpe wrote:
> The master->cd_table is entirely contained within the struct
> arm_smmu_master which is guaranteed to be freed by the core code under
> arm_smmu_release_device().
>
> There is no reason to use devm here, arm_smmu_free_cd_tables() is reliably
> called to free the CD related memory. Remove it and save some memory.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (6 preceding siblings ...)
2024-06-11 0:31 ` [PATCH v2 07/10] iommu/arm-smmu-v3: Do not use devm for the cd table allocations Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 3:02 ` Nicolin Chen
2024-07-02 18:46 ` Will Deacon
2024-06-11 0:31 ` [PATCH v2 09/10] iommu/arm-smmu-v3: Add types for each level of the CD table Jason Gunthorpe
` (3 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
The top of the 2 level CD table is (at most) 1024 entries big, and two
high order allocations are required. One of __le64 which is programmed
into the HW (8k) and one of struct arm_smmu_l1_ctx_desc which holds the
CPU pointer (16k).
There are two copies of the l2ptr_dma, one is stored in the struct
arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to
use. Instead of storing two copies just decode the value from the __le64.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 -
2 files changed, 17 insertions(+), 25 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6245e2558e6a6a..dd65e27aebafd4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1167,31 +1167,19 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
arm_smmu_cmdq_batch_submit(smmu, &cmds);
}
-static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
- struct arm_smmu_l1_ctx_desc *l1_desc)
+static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
{
- size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
-
- l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size,
- &l1_desc->l2ptr_dma, GFP_KERNEL);
- if (!l1_desc->l2ptr) {
- dev_warn(smmu->dev,
- "failed to allocate context descriptor table\n");
- return -ENOMEM;
- }
- return 0;
-}
-
-static void arm_smmu_write_cd_l1_desc(__le64 *dst,
- struct arm_smmu_l1_ctx_desc *l1_desc)
-{
- u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
- CTXDESC_L1_DESC_V;
+ u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
/* The HW has 64 bit atomicity with stores to the L2 CD table */
WRITE_ONCE(*dst, cpu_to_le64(val));
}
+static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src)
+{
+ return le64_to_cpu(src) & CTXDESC_L1_DESC_L2PTR_MASK;
+}
+
struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
u32 ssid)
{
@@ -1231,13 +1219,17 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
l1_desc = &cd_table->l1_desc[idx];
if (!l1_desc->l2ptr) {
- __le64 *l1ptr;
+ dma_addr_t l2ptr_dma;
- if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
+ l1_desc->l2ptr = dma_alloc_coherent(
+ smmu->dev,
+ CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd),
+ &l2ptr_dma, GFP_KERNEL);
+ if (!l1_desc->l2ptr)
return NULL;
- l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
- arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+ arm_smmu_write_cd_l1_desc(&cd_table->cdtab[idx],
+ l2ptr_dma);
/* An invalid L1CD can be cached */
arm_smmu_sync_cd(master, ssid, false);
}
@@ -1415,7 +1407,8 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
dma_free_coherent(smmu->dev, size,
cd_table->l1_desc[i].l2ptr,
- cd_table->l1_desc[i].l2ptr_dma);
+ arm_smmu_cd_l1_get_desc(
+ &cd_table->cdtab[i]));
}
kfree(cd_table->l1_desc);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3862f6e65c770e..2f7d70d92b1f31 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -603,7 +603,6 @@ struct arm_smmu_ctx_desc {
struct arm_smmu_l1_ctx_desc {
struct arm_smmu_cd *l2ptr;
- dma_addr_t l2ptr_dma;
};
struct arm_smmu_ctx_desc_cfg {
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
2024-06-11 0:31 ` [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array Jason Gunthorpe
@ 2024-06-11 3:02 ` Nicolin Chen
2024-07-02 18:46 ` Will Deacon
1 sibling, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 3:02 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:17PM -0300, Jason Gunthorpe wrote:
> The top of the 2 level CD table is (at most) 1024 entries big, and two
> high order allocations are required. One of __le64 which is programmed
> into the HW (8k) and one of struct arm_smmu_l1_ctx_desc which holds the
> CPU pointer (16k).
>
> There are two copies of the l2ptr_dma, one is stored in the struct
> arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to
> use. Instead of storing two copies just decode the value from the __le64.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
2024-06-11 0:31 ` [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array Jason Gunthorpe
2024-06-11 3:02 ` Nicolin Chen
@ 2024-07-02 18:46 ` Will Deacon
2024-07-09 19:52 ` Jason Gunthorpe
2024-07-09 23:53 ` Jason Gunthorpe
1 sibling, 2 replies; 34+ messages in thread
From: Will Deacon @ 2024-07-02 18:46 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:17PM -0300, Jason Gunthorpe wrote:
> The top of the 2 level CD table is (at most) 1024 entries big, and two
> high order allocations are required. One of __le64 which is programmed
> into the HW (8k) and one of struct arm_smmu_l1_ctx_desc which holds the
> CPU pointer (16k).
>
> There are two copies of the l2ptr_dma, one is stored in the struct
> arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to
> use. Instead of storing two copies just decode the value from the __le64.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++------------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 -
> 2 files changed, 17 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 6245e2558e6a6a..dd65e27aebafd4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1167,31 +1167,19 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
> arm_smmu_cmdq_batch_submit(smmu, &cmds);
> }
>
> -static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
> - struct arm_smmu_l1_ctx_desc *l1_desc)
> +static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
> {
> - size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
> -
> - l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size,
> - &l1_desc->l2ptr_dma, GFP_KERNEL);
> - if (!l1_desc->l2ptr) {
> - dev_warn(smmu->dev,
> - "failed to allocate context descriptor table\n");
> - return -ENOMEM;
> - }
> - return 0;
> -}
> -
> -static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> - struct arm_smmu_l1_ctx_desc *l1_desc)
> -{
> - u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
> - CTXDESC_L1_DESC_V;
> + u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
>
> /* The HW has 64 bit atomicity with stores to the L2 CD table */
> WRITE_ONCE(*dst, cpu_to_le64(val));
> }
>
> +static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src)
> +{
> + return le64_to_cpu(src) & CTXDESC_L1_DESC_L2PTR_MASK;
> +}
I'm assuming this is supposed to be *src, in which case this could be
accessing non-cacheable memory if the SMMU isn't coherent. That's why we
shadow everything.
If the current code isn't causing you any problems, please can we leave
it as-is?
Will
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
2024-07-02 18:46 ` Will Deacon
@ 2024-07-09 19:52 ` Jason Gunthorpe
2024-07-09 23:53 ` Jason Gunthorpe
1 sibling, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-07-09 19:52 UTC (permalink / raw)
To: Will Deacon
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Tue, Jul 02, 2024 at 07:46:07PM +0100, Will Deacon wrote:
> On Mon, Jun 10, 2024 at 09:31:17PM -0300, Jason Gunthorpe wrote:
> > The top of the 2 level CD table is (at most) 1024 entries big, and two
> > high order allocations are required. One of __le64 which is programmed
> > into the HW (8k) and one of struct arm_smmu_l1_ctx_desc which holds the
> > CPU pointer (16k).
> >
> > There are two copies of the l2ptr_dma, one is stored in the struct
> > arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to
> > use. Instead of storing two copies just decode the value from the __le64.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 41 +++++++++------------
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 -
> > 2 files changed, 17 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 6245e2558e6a6a..dd65e27aebafd4 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -1167,31 +1167,19 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
> > arm_smmu_cmdq_batch_submit(smmu, &cmds);
> > }
> >
> > -static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
> > - struct arm_smmu_l1_ctx_desc *l1_desc)
> > +static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
> > {
> > - size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
> > -
> > - l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size,
> > - &l1_desc->l2ptr_dma, GFP_KERNEL);
> > - if (!l1_desc->l2ptr) {
> > - dev_warn(smmu->dev,
> > - "failed to allocate context descriptor table\n");
> > - return -ENOMEM;
> > - }
> > - return 0;
> > -}
> > -
> > -static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> > - struct arm_smmu_l1_ctx_desc *l1_desc)
> > -{
> > - u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
> > - CTXDESC_L1_DESC_V;
> > + u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
> >
> > /* The HW has 64 bit atomicity with stores to the L2 CD table */
> > WRITE_ONCE(*dst, cpu_to_le64(val));
> > }
> >
> > +static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src)
> > +{
> > + return le64_to_cpu(src) & CTXDESC_L1_DESC_L2PTR_MASK;
> > +}
>
> I'm assuming this is supposed to be *src,
Uh, wow, surprised that compiles, yes.
> in which case this could be
> accessing non-cacheable memory if the SMMU isn't coherent. That's why we
> shadow everything.
That is a confusing statement. I know it is a DMA coherent allocation,
but the driver does not avoid reading for DMA coherent memory in all
cases, and DMA coherent allocations are well defined to be readable
anyhow.
For instance the driver has always read back from coherent allocations
when working with the STE or CD:
strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
GFP_KERNEL);
cfg->strtab.linear = strtab;
static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
__le64 *dst)
{
u64 val = le64_to_cpu(dst[0]);
^^^^ dst points into strtab above
That's the same thing being done here, with no shadowing, no issue.
I know no reason why the first level cd table would be special that it
needs shadowing? It is just wasting memory. Let's fix it.
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
2024-07-02 18:46 ` Will Deacon
2024-07-09 19:52 ` Jason Gunthorpe
@ 2024-07-09 23:53 ` Jason Gunthorpe
1 sibling, 0 replies; 34+ messages in thread
From: Jason Gunthorpe @ 2024-07-09 23:53 UTC (permalink / raw)
To: Will Deacon
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
On Tue, Jul 02, 2024 at 07:46:07PM +0100, Will Deacon wrote:
> > +static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src)
> > +{
> > + return le64_to_cpu(src) & CTXDESC_L1_DESC_L2PTR_MASK;
> > +}
>
> I'm assuming this is supposed to be *src, in which case this could
> be
Oh I see now, a few patches later it becomes:
static dma_addr_t arm_smmu_cd_l1_get_desc(const struct arm_smmu_cdtab_l1 *src)
{
return le64_to_cpu(src->l2ptr) & CTXDESC_L1_DESC_L2PTR_MASK;
}
Which makes it fine, it is just a bisection thing.
Jason
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 09/10] iommu/arm-smmu-v3: Add types for each level of the CD table
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (7 preceding siblings ...)
2024-06-11 0:31 ` [PATCH v2 08/10] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 3:12 ` Nicolin Chen
2024-06-11 0:31 ` [PATCH v2 10/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg Jason Gunthorpe
` (2 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
As well as indexing helpers arm_smmu_cdtab_l1/2_idx().
Remove CTXDESC_L1_DESC_DWORDS and CTXDESC_CD_DWORDS replacing them all
with type specific calculations.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 46 +++++++++++----------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 27 +++++++++---
2 files changed, 46 insertions(+), 27 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index dd65e27aebafd4..49e019131ef3b5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1167,17 +1167,18 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
arm_smmu_cmdq_batch_submit(smmu, &cmds);
}
-static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
+static void arm_smmu_write_cd_l1_desc(struct arm_smmu_cdtab_l1 *dst,
+ dma_addr_t l2ptr_dma)
{
u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
/* The HW has 64 bit atomicity with stores to the L2 CD table */
- WRITE_ONCE(*dst, cpu_to_le64(val));
+ WRITE_ONCE(dst->l2ptr, cpu_to_le64(val));
}
-static dma_addr_t arm_smmu_cd_l1_get_desc(const __le64 *src)
+static dma_addr_t arm_smmu_cd_l1_get_desc(const struct arm_smmu_cdtab_l1 *src)
{
- return le64_to_cpu(src) & CTXDESC_L1_DESC_L2PTR_MASK;
+ return le64_to_cpu(src->l2ptr) & CTXDESC_L1_DESC_L2PTR_MASK;
}
struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
@@ -1190,13 +1191,12 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
return NULL;
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
- return (struct arm_smmu_cd *)(cd_table->cdtab +
- ssid * CTXDESC_CD_DWORDS);
+ return &((struct arm_smmu_cd *)cd_table->cdtab)[ssid];
- l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES];
+ l1_desc = &cd_table->l1_desc[arm_smmu_cdtab_l1_idx(ssid)];
if (!l1_desc->l2ptr)
return NULL;
- return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES];
+ return &l1_desc->l2ptr->cds[arm_smmu_cdtab_l2_idx(ssid)];
}
struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
@@ -1214,7 +1214,7 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
}
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) {
- unsigned int idx = ssid / CTXDESC_L2_ENTRIES;
+ unsigned int idx = arm_smmu_cdtab_l1_idx(ssid);
struct arm_smmu_l1_ctx_desc *l1_desc;
l1_desc = &cd_table->l1_desc[idx];
@@ -1228,8 +1228,10 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
if (!l1_desc->l2ptr)
return NULL;
- arm_smmu_write_cd_l1_desc(&cd_table->cdtab[idx],
- l2ptr_dma);
+ arm_smmu_write_cd_l1_desc(
+ &((struct arm_smmu_cdtab_l1 *)
+ cd_table->cdtab)[idx],
+ l2ptr_dma);
/* An invalid L1CD can be cached */
arm_smmu_sync_cd(master, ssid, false);
}
@@ -1358,7 +1360,7 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
cd_table->num_l1_ents = max_contexts;
- l1size = max_contexts * (CTXDESC_CD_DWORDS << 3);
+ l1size = max_contexts * sizeof(struct arm_smmu_cd);
} else {
cd_table->s1fmt = STRTAB_STE_0_S1FMT_64K_L2;
cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts,
@@ -1370,7 +1372,8 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
if (!cd_table->l1_desc)
return -ENOMEM;
- l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
+ l1size = cd_table->num_l1_ents *
+ sizeof(struct arm_smmu_cdtab_l1);
}
cd_table->cdtab = dma_alloc_coherent(smmu->dev, l1size,
@@ -1394,27 +1397,28 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
{
int i;
- size_t size, l1size;
+ size_t l1size;
struct arm_smmu_device *smmu = master->smmu;
struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
if (cd_table->l1_desc) {
- size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
-
for (i = 0; i < cd_table->num_l1_ents; i++) {
if (!cd_table->l1_desc[i].l2ptr)
continue;
- dma_free_coherent(smmu->dev, size,
+ dma_free_coherent(smmu->dev,
+ sizeof(*cd_table->l1_desc[i].l2ptr),
cd_table->l1_desc[i].l2ptr,
- arm_smmu_cd_l1_get_desc(
- &cd_table->cdtab[i]));
+ arm_smmu_cd_l1_get_desc(&(
+ (struct arm_smmu_cdtab_l1 *)
+ cd_table->cdtab)[i]));
}
kfree(cd_table->l1_desc);
- l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
+ l1size = cd_table->num_l1_ents *
+ sizeof(struct arm_smmu_cdtab_l1);
} else {
- l1size = cd_table->num_l1_ents * (CTXDESC_CD_DWORDS << 3);
+ l1size = cd_table->num_l1_ents * sizeof(struct arm_smmu_cd);
}
dma_free_coherent(smmu->dev, l1size, cd_table->cdtab,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 2f7d70d92b1f31..7ed8d53423cf56 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -293,16 +293,31 @@ static inline unsigned int arm_smmu_strtab_l2_idx(unsigned int sid)
*/
#define CTXDESC_L2_ENTRIES 1024
-#define CTXDESC_L1_DESC_DWORDS 1
#define CTXDESC_L1_DESC_V (1UL << 0)
#define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12)
-#define CTXDESC_CD_DWORDS 8
-
struct arm_smmu_cd {
- __le64 data[CTXDESC_CD_DWORDS];
+ __le64 data[8];
};
+struct arm_smmu_cdtab_l2 {
+ struct arm_smmu_cd cds[CTXDESC_L2_ENTRIES];
+};
+
+struct arm_smmu_cdtab_l1 {
+ __le64 l2ptr;
+};
+
+static inline unsigned int arm_smmu_cdtab_l1_idx(unsigned int ssid)
+{
+ return ssid / CTXDESC_L2_ENTRIES;
+}
+
+static inline unsigned int arm_smmu_cdtab_l2_idx(unsigned int ssid)
+{
+ return ssid % CTXDESC_L2_ENTRIES;
+}
+
#define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0)
#define CTXDESC_CD_0_TCR_TG0 GENMASK_ULL(7, 6)
#define CTXDESC_CD_0_TCR_IRGN0 GENMASK_ULL(9, 8)
@@ -330,7 +345,7 @@ struct arm_smmu_cd {
* When the SMMU only supports linear context descriptor tables, pick a
* reasonable size limit (64kB).
*/
-#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))
+#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / sizeof(struct arm_smmu_cd))
/* Command queue */
#define CMDQ_ENT_SZ_SHIFT 4
@@ -602,7 +617,7 @@ struct arm_smmu_ctx_desc {
};
struct arm_smmu_l1_ctx_desc {
- struct arm_smmu_cd *l2ptr;
+ struct arm_smmu_cdtab_l2 *l2ptr;
};
struct arm_smmu_ctx_desc_cfg {
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 09/10] iommu/arm-smmu-v3: Add types for each level of the CD table
2024-06-11 0:31 ` [PATCH v2 09/10] iommu/arm-smmu-v3: Add types for each level of the CD table Jason Gunthorpe
@ 2024-06-11 3:12 ` Nicolin Chen
0 siblings, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 3:12 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:18PM -0300, Jason Gunthorpe wrote:
> As well as indexing helpers arm_smmu_cdtab_l1/2_idx().
>
> Remove CTXDESC_L1_DESC_DWORDS and CTXDESC_CD_DWORDS replacing them all
> with type specific calculations.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 10/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (8 preceding siblings ...)
2024-06-11 0:31 ` [PATCH v2 09/10] iommu/arm-smmu-v3: Add types for each level of the CD table Jason Gunthorpe
@ 2024-06-11 0:31 ` Jason Gunthorpe
2024-06-11 3:25 ` Nicolin Chen
2024-06-11 3:27 ` [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Nicolin Chen
2024-07-02 18:43 ` Will Deacon
11 siblings, 1 reply; 34+ messages in thread
From: Jason Gunthorpe @ 2024-06-11 0:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
The members here are being used for both the linear and the 2 level case,
with the meaning of each item slightly different in the two cases.
Split it into a clean union where both cases have their own struct with
their own logical names and correct types.
Adjust all the users to detect linear/2lvl and use the right sub structure
and types consistently.
Remove CTXDESC_CD_DWORDS by changing the last places to use
sizeof(struct arm_smmu_cd).
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 119 +++++++++-----------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 24 ++--
2 files changed, 71 insertions(+), 72 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 49e019131ef3b5..b9994c46c52f25 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1184,19 +1184,19 @@ static dma_addr_t arm_smmu_cd_l1_get_desc(const struct arm_smmu_cdtab_l1 *src)
struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
u32 ssid)
{
- struct arm_smmu_l1_ctx_desc *l1_desc;
+ struct arm_smmu_cdtab_l2 *l2;
struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
- if (!cd_table->cdtab)
+ if (!arm_smmu_cdtab_allocated(cd_table))
return NULL;
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
- return &((struct arm_smmu_cd *)cd_table->cdtab)[ssid];
+ return &cd_table->linear.table[ssid];
- l1_desc = &cd_table->l1_desc[arm_smmu_cdtab_l1_idx(ssid)];
- if (!l1_desc->l2ptr)
+ l2 = cd_table->l2.l2ptrs[arm_smmu_cdtab_l1_idx(ssid)];
+ if (!l2)
return NULL;
- return &l1_desc->l2ptr->cds[arm_smmu_cdtab_l2_idx(ssid)];
+ return &l2->cds[arm_smmu_cdtab_l2_idx(ssid)];
}
struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
@@ -1208,30 +1208,25 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
might_sleep();
iommu_group_mutex_assert(master->dev);
- if (!cd_table->cdtab) {
+ if (!arm_smmu_cdtab_allocated(cd_table)) {
if (arm_smmu_alloc_cd_tables(master))
return NULL;
}
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) {
unsigned int idx = arm_smmu_cdtab_l1_idx(ssid);
- struct arm_smmu_l1_ctx_desc *l1_desc;
+ struct arm_smmu_cdtab_l2 **l2ptr = &cd_table->l2.l2ptrs[idx];
- l1_desc = &cd_table->l1_desc[idx];
- if (!l1_desc->l2ptr) {
+ if (!*l2ptr) {
dma_addr_t l2ptr_dma;
- l1_desc->l2ptr = dma_alloc_coherent(
- smmu->dev,
- CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd),
- &l2ptr_dma, GFP_KERNEL);
- if (!l1_desc->l2ptr)
+ *l2ptr = dma_alloc_coherent(smmu->dev, sizeof(**l2ptr),
+ &l2ptr_dma, GFP_KERNEL);
+ if (!*l2ptr)
return NULL;
- arm_smmu_write_cd_l1_desc(
- &((struct arm_smmu_cdtab_l1 *)
- cd_table->cdtab)[idx],
- l2ptr_dma);
+ arm_smmu_write_cd_l1_desc(&cd_table->l2.l1tab[idx],
+ l2ptr_dma);
/* An invalid L1CD can be cached */
arm_smmu_sync_cd(master, ssid, false);
}
@@ -1336,7 +1331,7 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
struct arm_smmu_cd target = {};
struct arm_smmu_cd *cdptr;
- if (!master->cd_table.cdtab)
+ if (!arm_smmu_cdtab_allocated(&master->cd_table))
return;
cdptr = arm_smmu_get_cd_ptr(master, ssid);
if (WARN_ON(!cdptr))
@@ -1346,8 +1341,6 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
{
- int ret;
- size_t l1size;
size_t max_contexts;
struct arm_smmu_device *smmu = master->smmu;
struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
@@ -1358,71 +1351,67 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB) ||
max_contexts <= CTXDESC_L2_ENTRIES) {
cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
- cd_table->num_l1_ents = max_contexts;
+ cd_table->linear.num_ents = max_contexts;
- l1size = max_contexts * sizeof(struct arm_smmu_cd);
+ cd_table->linear.table = dma_alloc_coherent(
+ smmu->dev, max_contexts * sizeof(struct arm_smmu_cd),
+ &cd_table->cdtab_dma, GFP_KERNEL);
+ if (!cd_table->linear.table)
+ return -ENOMEM;
} else {
cd_table->s1fmt = STRTAB_STE_0_S1FMT_64K_L2;
- cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts,
- CTXDESC_L2_ENTRIES);
+ cd_table->l2.num_l1_ents =
+ DIV_ROUND_UP(max_contexts, CTXDESC_L2_ENTRIES);
- cd_table->l1_desc = kcalloc(cd_table->num_l1_ents,
- sizeof(*cd_table->l1_desc),
- GFP_KERNEL);
- if (!cd_table->l1_desc)
+ cd_table->l2.l2ptrs = kcalloc(cd_table->l2.num_l1_ents,
+ sizeof(*cd_table->l2.l2ptrs),
+ GFP_KERNEL);
+ if (!cd_table->l2.l2ptrs)
return -ENOMEM;
- l1size = cd_table->num_l1_ents *
- sizeof(struct arm_smmu_cdtab_l1);
+ cd_table->l2.l1tab = dma_alloc_coherent(
+ smmu->dev,
+ cd_table->l2.num_l1_ents *
+ sizeof(struct arm_smmu_cdtab_l1),
+ &cd_table->cdtab_dma, GFP_KERNEL);
+ if (!cd_table->l2.l1tab) {
+ kfree(cd_table->l2.l2ptrs);
+ cd_table->l2.l2ptrs = NULL;
+ return -ENOMEM;
+ }
}
-
- cd_table->cdtab = dma_alloc_coherent(smmu->dev, l1size,
- &cd_table->cdtab_dma, GFP_KERNEL);
- if (!cd_table->cdtab) {
- dev_warn(smmu->dev, "failed to allocate context descriptor\n");
- ret = -ENOMEM;
- goto err_free_l1;
- }
-
return 0;
-
-err_free_l1:
- if (cd_table->l1_desc) {
- kfree(cd_table->l1_desc);
- cd_table->l1_desc = NULL;
- }
- return ret;
}
static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
{
int i;
- size_t l1size;
struct arm_smmu_device *smmu = master->smmu;
struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
- if (cd_table->l1_desc) {
- for (i = 0; i < cd_table->num_l1_ents; i++) {
- if (!cd_table->l1_desc[i].l2ptr)
+ if (cd_table->s1fmt != STRTAB_STE_0_S1FMT_LINEAR) {
+ for (i = 0; i < cd_table->l2.num_l1_ents; i++) {
+ if (!cd_table->l2.l2ptrs[i])
continue;
dma_free_coherent(smmu->dev,
- sizeof(*cd_table->l1_desc[i].l2ptr),
- cd_table->l1_desc[i].l2ptr,
- arm_smmu_cd_l1_get_desc(&(
- (struct arm_smmu_cdtab_l1 *)
- cd_table->cdtab)[i]));
+ sizeof(*cd_table->l2.l2ptrs[i]),
+ cd_table->l2.l2ptrs[i],
+ arm_smmu_cd_l1_get_desc(
+ &cd_table->l2.l1tab[i]));
}
- kfree(cd_table->l1_desc);
+ kfree(cd_table->l2.l2ptrs);
- l1size = cd_table->num_l1_ents *
- sizeof(struct arm_smmu_cdtab_l1);
+ dma_free_coherent(smmu->dev,
+ cd_table->l2.num_l1_ents *
+ sizeof(struct arm_smmu_cdtab_l1),
+ cd_table->l2.l1tab, cd_table->cdtab_dma);
} else {
- l1size = cd_table->num_l1_ents * sizeof(struct arm_smmu_cd);
+ dma_free_coherent(smmu->dev,
+ cd_table->linear.num_ents *
+ sizeof(struct arm_smmu_cd),
+ cd_table->linear.table, cd_table->cdtab_dma);
}
-
- dma_free_coherent(smmu->dev, l1size, cd_table->cdtab,
- cd_table->cdtab_dma);
}
bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
@@ -2937,7 +2926,7 @@ static void arm_smmu_release_device(struct device *dev)
arm_smmu_disable_pasid(master);
arm_smmu_remove_master(master);
- if (master->cd_table.cdtab)
+ if (arm_smmu_cdtab_allocated(&master->cd_table))
arm_smmu_free_cd_tables(master);
kfree(master);
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 7ed8d53423cf56..35879064c0f725 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -616,20 +616,30 @@ struct arm_smmu_ctx_desc {
struct mm_struct *mm;
};
-struct arm_smmu_l1_ctx_desc {
- struct arm_smmu_cdtab_l2 *l2ptr;
-};
-
struct arm_smmu_ctx_desc_cfg {
- __le64 *cdtab;
+ union {
+ struct {
+ struct arm_smmu_cd *table;
+ unsigned int num_ents;
+ } linear;
+ struct {
+ struct arm_smmu_cdtab_l1 *l1tab;
+ struct arm_smmu_cdtab_l2 **l2ptrs;
+ unsigned int num_l1_ents;
+ } l2;
+ };
dma_addr_t cdtab_dma;
- struct arm_smmu_l1_ctx_desc *l1_desc;
- unsigned int num_l1_ents;
u8 s1fmt;
/* log2 of the maximum number of CDs supported by this table */
u8 s1cdmax;
};
+static inline bool
+arm_smmu_cdtab_allocated(struct arm_smmu_ctx_desc_cfg *cfg)
+{
+ return cfg->linear.table || cfg->l2.l1tab;
+}
+
struct arm_smmu_s2_cfg {
u16 vmid;
};
--
2.45.2
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v2 10/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg
2024-06-11 0:31 ` [PATCH v2 10/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg Jason Gunthorpe
@ 2024-06-11 3:25 ` Nicolin Chen
0 siblings, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 3:25 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:19PM -0300, Jason Gunthorpe wrote:
> The members here are being used for both the linear and the 2 level case,
> with the meaning of each item slightly different in the two cases.
>
> Split it into a clean union where both cases have their own struct with
> their own logical names and correct types.
>
> Adjust all the users to detect linear/2lvl and use the right sub structure
> and types consistently.
>
> Remove CTXDESC_CD_DWORDS by changing the last places to use
> sizeof(struct arm_smmu_cd).
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (9 preceding siblings ...)
2024-06-11 0:31 ` [PATCH v2 10/10] iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg Jason Gunthorpe
@ 2024-06-11 3:27 ` Nicolin Chen
2024-07-02 18:43 ` Will Deacon
11 siblings, 0 replies; 34+ messages in thread
From: Nicolin Chen @ 2024-06-11 3:27 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 10, 2024 at 09:31:09PM -0300, Jason Gunthorpe wrote:
> Will pointed out that two places referring to the CD/STE struct did not
> get the new types. While auditing this code a few more oddities were
> noticed. Based on a feedback from Mostafa and Nicolin a few more things
> were fixed up too
>
> - Use types for all the HW structures everywhere even for the L1
> descriptors that are just a single 8 bytes. This helps with clarity of
> what everthing is pointing at
> - Use indexing helpers for the STE/CD two level calculations
> - Use sizeof(struct X) instead of open coded math on constants. The sizeof
> naturally follows the type of the related variable in almost all cases
> - Remove redundant dma_addr_t's and save some memory
> - Remove redundant devm usage
> - Use the modern rbtree API
>
> Parts of this have been sitting in my tree for a while now, it grew a bit
> since v1, but nothing is particularly profound here. Enough is merged now
> that they can be cleanly based and are seperate from my other series.
>
> v2:
> - Add a patch to add structs for the L1/L2 HW layouts and use their
> sizeof and type instead of constants and generic __le64 *.
> - Add a patch for L1/L2 indexing helpers for clarity
> - Reorder patches
> - Redo the union layout in the cfg for both cases
> - Fully remove some more defines
For the entire series,
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area
2024-06-11 0:31 [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (10 preceding siblings ...)
2024-06-11 3:27 ` [PATCH v2 00/10] Tidy some minor things in the stream table/cd table area Nicolin Chen
@ 2024-07-02 18:43 ` Will Deacon
11 siblings, 0 replies; 34+ messages in thread
From: Will Deacon @ 2024-07-02 18:43 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
Jason Gunthorpe
Cc: catalin.marinas, kernel-team, Will Deacon, Michael Shavit,
Nicolin Chen, patches, Ryan Roberts, Mostafa Saleh
On Mon, 10 Jun 2024 21:31:09 -0300, Jason Gunthorpe wrote:
> Will pointed out that two places referring to the CD/STE struct did not
> get the new types. While auditing this code a few more oddities were
> noticed. Based on a feedback from Mostafa and Nicolin a few more things
> were fixed up too
>
> - Use types for all the HW structures everywhere even for the L1
> descriptors that are just a single 8 bytes. This helps with clarity of
> what everthing is pointing at
> - Use indexing helpers for the STE/CD two level calculations
> - Use sizeof(struct X) instead of open coded math on constants. The sizeof
> naturally follows the type of the related variable in almost all cases
> - Remove redundant dma_addr_t's and save some memory
> - Remove redundant devm usage
> - Use the modern rbtree API
>
> [...]
Applied first two to will (for-joerg/arm-smmu/updates), thanks!
[01/10] iommu/arm-smmu-v3: Do not zero the strtab twice
https://git.kernel.org/will/c/c84c5ab76c9c
[02/10] iommu/arm-smmu-v3: Shrink the strtab l1_desc array
https://git.kernel.org/will/c/a4d75360f7a6
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
^ permalink raw reply [flat|nested] 34+ messages in thread