* [PATCH 0/7] Tidy some minor things in the stream table/cd table area
@ 2024-06-03 22:31 Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab Jason Gunthorpe
` (7 more replies)
0 siblings, 8 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 22:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
Will pointed out that two places referring to the CD/STE struct did not
get the new types. While auditing this code a few more oddities were
noticed.
- Correct types for the linear stream/cd table pointers struct
- Remove redundant dma_addr_t's and save some memory
- Remove redundant devm usage
- Use the modern rbtree API
Nothing is particularly profound here, I've been sitting on these for
awhile, enough is merged now that they can be cleanly based and are
seperate from my other series.
Jason Gunthorpe (7):
iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
iommu/arm-smmu-v3: Do not zero the strtab twice
iommu/arm-smmu-v3: Shrink the strtab l1_desc array
iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab
iommu/arm-smmu-v3: Do not use devm for the cd table allocations
iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
iommu/arm-smmu-v3: Use the new rb tree helpers
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 234 +++++++++-----------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 30 +--
2 files changed, 119 insertions(+), 145 deletions(-)
base-commit: c3f38fa61af77b49866b006939479069cd451173
--
2.45.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
@ 2024-06-03 22:31 ` Jason Gunthorpe
2024-06-04 8:32 ` Nicolin Chen
2024-06-04 15:52 ` Mostafa Saleh
2024-06-03 22:31 ` [PATCH 2/7] iommu/arm-smmu-v3: Do not zero the strtab twice Jason Gunthorpe
` (6 subsequent siblings)
7 siblings, 2 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 22:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
This is being used as both an array of STEs and an array of L1
descriptors.
Give the two usages different names and correct types.
Remove STRTAB_STE_DWORDS as most usages were indexing or sizing an array
of struct arm_smmu_ste.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 21 +++++++++------------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +++++----
2 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index ab415e107054c1..6b4f1a664288db 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1661,8 +1661,8 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
if (desc->l2ptr)
return 0;
- size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
- strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
+ size = (1 << STRTAB_SPLIT) * sizeof(struct arm_smmu_ste);
+ strtab = &cfg->strtab.l1_desc[sid >> STRTAB_SPLIT];
desc->span = STRTAB_SPLIT + 1;
desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
@@ -2409,8 +2409,7 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
return &cfg->l1_desc[idx1].l2ptr[idx2];
} else {
/* Simple linear lookup */
- return (struct arm_smmu_ste *)&cfg
- ->strtab[sid * STRTAB_STE_DWORDS];
+ return &cfg->strtab.linear[sid];
}
}
@@ -3225,17 +3224,15 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
{
unsigned int i;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
- void *strtab = smmu->strtab_cfg.strtab;
cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
sizeof(*cfg->l1_desc), GFP_KERNEL);
if (!cfg->l1_desc)
return -ENOMEM;
- for (i = 0; i < cfg->num_l1_ents; ++i) {
- arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
- strtab += STRTAB_L1_DESC_DWORDS << 3;
- }
+ for (i = 0; i < cfg->num_l1_ents; ++i)
+ arm_smmu_write_strtab_l1_desc(
+ &smmu->strtab_cfg.strtab.l1_desc[i], &cfg->l1_desc[i]);
return 0;
}
@@ -3267,7 +3264,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
l1size);
return -ENOMEM;
}
- cfg->strtab = strtab;
+ cfg->strtab.l1_desc = strtab;
/* Configure strtab_base_cfg for 2 levels */
reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
@@ -3285,7 +3282,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
u32 size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
- size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
+ size = (1 << smmu->sid_bits) * sizeof(cfg->strtab.linear[0]);
strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
GFP_KERNEL);
if (!strtab) {
@@ -3294,7 +3291,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
size);
return -ENOMEM;
}
- cfg->strtab = strtab;
+ cfg->strtab.linear = strtab;
cfg->num_l1_ents = 1 << smmu->sid_bits;
/* Configure strtab_base_cfg for a linear table covering all SIDs */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1242a086c9f948..4769780259affc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -206,10 +206,8 @@
#define STRTAB_L1_DESC_SPAN GENMASK_ULL(4, 0)
#define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
-#define STRTAB_STE_DWORDS 8
-
struct arm_smmu_ste {
- __le64 data[STRTAB_STE_DWORDS];
+ __le64 data[8];
};
#define STRTAB_STE_0_V (1UL << 0)
@@ -612,7 +610,10 @@ struct arm_smmu_s2_cfg {
};
struct arm_smmu_strtab_cfg {
- __le64 *strtab;
+ union {
+ struct arm_smmu_ste *linear;
+ __le64 *l1_desc;
+ } strtab;
dma_addr_t strtab_dma;
struct arm_smmu_strtab_l1_desc *l1_desc;
unsigned int num_l1_ents;
--
2.45.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/7] iommu/arm-smmu-v3: Do not zero the strtab twice
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab Jason Gunthorpe
@ 2024-06-03 22:31 ` Jason Gunthorpe
2024-06-04 15:56 ` Mostafa Saleh
2024-06-03 22:31 ` [PATCH 3/7] iommu/arm-smmu-v3: Shrink the strtab l1_desc array Jason Gunthorpe
` (5 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 22:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
dmam_alloc_coherent() already returns zero'd memory so cfg->strtab.l1_desc
(the list of DMA addresses for the L2 entries) is already zero'd.
arm_smmu_init_l1_strtab() goes through and calls
arm_smmu_write_strtab_l1_desc() on the newly allocated (and zero'd) struct
arm_smmu_strtab_l1_desc, which ends up computing 'val = 0' and zeroing it
again.
Remove arm_smmu_init_l1_strtab() and just call devm_kcalloc() from
arm_smmu_init_strtab_2lvl to allocate the companion struct.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++++++--------------
1 file changed, 9 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6b4f1a664288db..d27dd0600bf1df 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3220,23 +3220,6 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
PRIQ_ENT_DWORDS, "priq");
}
-static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
-{
- unsigned int i;
- struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
-
- cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
- sizeof(*cfg->l1_desc), GFP_KERNEL);
- if (!cfg->l1_desc)
- return -ENOMEM;
-
- for (i = 0; i < cfg->num_l1_ents; ++i)
- arm_smmu_write_strtab_l1_desc(
- &smmu->strtab_cfg.strtab.l1_desc[i], &cfg->l1_desc[i]);
-
- return 0;
-}
-
static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
{
void *strtab;
@@ -3272,7 +3255,15 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
cfg->strtab_base_cfg = reg;
- return arm_smmu_init_l1_strtab(smmu);
+ cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
+ sizeof(*cfg->l1_desc), GFP_KERNEL);
+ if (!cfg->l1_desc) {
+ dev_err(smmu->dev,
+ "failed to allocate l1 stream table (%zu bytes)\n",
+ cfg->num_l1_ents * sizeof(*cfg->l1_desc));
+ return -ENOMEM;
+ }
+ return 0;
}
static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
--
2.45.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 3/7] iommu/arm-smmu-v3: Shrink the strtab l1_desc array
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 2/7] iommu/arm-smmu-v3: Do not zero the strtab twice Jason Gunthorpe
@ 2024-06-03 22:31 ` Jason Gunthorpe
2024-06-04 16:01 ` Mostafa Saleh
2024-06-03 22:31 ` [PATCH 4/7] iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab Jason Gunthorpe
` (4 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 22:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
The top of the 2 level stream table is (at most) 128k entries big, and two
high order allocations are required. One of __le64 which is programmed
into the HW (1M), and one of struct arm_smmu_strtab_l1_desc which holds
the CPU pointer (3M).
There is no reason to store the l2ptr_dma as nothing reads it. devm stores
a copy of it and the DMA memory will be freed via devm mechanisms. span is
a constant of 8+1. Remove both.
This removes 16 bytes from each arm_smmu_l1_ctx_desc and saves up to 2M of
memory per iommu instance.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 +++++++-------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 ---
2 files changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d27dd0600bf1df..735dd9ff61890e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1448,12 +1448,12 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
/* Stream table manipulation functions */
static void
-arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
+arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma, u8 span)
{
u64 val = 0;
- val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
- val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
+ val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, span);
+ val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
/* The HW has 64 bit atomicity with stores to the L2 STE table */
WRITE_ONCE(*dst, cpu_to_le64(val));
@@ -1655,6 +1655,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
{
size_t size;
void *strtab;
+ dma_addr_t l2ptr_dma;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
@@ -1664,9 +1665,8 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
size = (1 << STRTAB_SPLIT) * sizeof(struct arm_smmu_ste);
strtab = &cfg->strtab.l1_desc[sid >> STRTAB_SPLIT];
- desc->span = STRTAB_SPLIT + 1;
- desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
- GFP_KERNEL);
+ desc->l2ptr =
+ dmam_alloc_coherent(smmu->dev, size, &l2ptr_dma, GFP_KERNEL);
if (!desc->l2ptr) {
dev_err(smmu->dev,
"failed to allocate l2 stream table for SID %u\n",
@@ -1675,7 +1675,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
}
arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
- arm_smmu_write_strtab_l1_desc(strtab, desc);
+ arm_smmu_write_strtab_l1_desc(strtab, l2ptr_dma, STRTAB_SPLIT + 1);
return 0;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 4769780259affc..280a04bfb7230c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -577,10 +577,7 @@ struct arm_smmu_priq {
/* High-level stream table and context descriptor structures */
struct arm_smmu_strtab_l1_desc {
- u8 span;
-
struct arm_smmu_ste *l2ptr;
- dma_addr_t l2ptr_dma;
};
struct arm_smmu_ctx_desc {
--
2.45.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 4/7] iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (2 preceding siblings ...)
2024-06-03 22:31 ` [PATCH 3/7] iommu/arm-smmu-v3: Shrink the strtab l1_desc array Jason Gunthorpe
@ 2024-06-03 22:31 ` Jason Gunthorpe
2024-06-04 16:07 ` Mostafa Saleh
2024-06-03 22:31 ` [PATCH 5/7] iommu/arm-smmu-v3: Do not use devm for the cd table allocations Jason Gunthorpe
` (3 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 22:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
This is being used as both an array of CDs and an array of L1 descriptors.
Give the two usages different names and correct types.
Remove CTXDESC_CD_DWORDS as most usages were indexing or sizing an array
of struct arm_smmu_cd.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 60 ++++++++++-----------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 17 ++++--
2 files changed, 41 insertions(+), 36 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 735dd9ff61890e..24c286a0834445 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1170,7 +1170,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
struct arm_smmu_l1_ctx_desc *l1_desc)
{
- size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
+ size_t size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size,
&l1_desc->l2ptr_dma, GFP_KERNEL);
@@ -1198,12 +1198,11 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
struct arm_smmu_l1_ctx_desc *l1_desc;
struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
- if (!cd_table->cdtab)
+ if (!arm_smmu_cdtab_allocated(cd_table))
return NULL;
if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
- return (struct arm_smmu_cd *)(cd_table->cdtab +
- ssid * CTXDESC_CD_DWORDS);
+ return &cd_table->cdtab.linear[ssid];
l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES];
if (!l1_desc->l2ptr)
@@ -1220,7 +1219,7 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
might_sleep();
iommu_group_mutex_assert(master->dev);
- if (!cd_table->cdtab) {
+ if (!arm_smmu_cdtab_allocated(cd_table)) {
if (arm_smmu_alloc_cd_tables(master))
return NULL;
}
@@ -1236,7 +1235,7 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
return NULL;
- l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
+ l1ptr = &cd_table->cdtab.l1_desc[idx];
arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
/* An invalid L1CD can be cached */
arm_smmu_sync_cd(master, ssid, false);
@@ -1342,7 +1341,7 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
struct arm_smmu_cd target = {};
struct arm_smmu_cd *cdptr;
- if (!master->cd_table.cdtab)
+ if (!arm_smmu_cdtab_allocated(&master->cd_table))
return;
cdptr = arm_smmu_get_cd_ptr(master, ssid);
if (WARN_ON(!cdptr))
@@ -1352,8 +1351,6 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
{
- int ret;
- size_t l1size;
size_t max_contexts;
struct arm_smmu_device *smmu = master->smmu;
struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
@@ -1366,8 +1363,14 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
cd_table->num_l1_ents = max_contexts;
- l1size = max_contexts * (CTXDESC_CD_DWORDS << 3);
+ cd_table->cdtab.linear = dmam_alloc_coherent(
+ smmu->dev, max_contexts * sizeof(struct arm_smmu_cd),
+ &cd_table->cdtab_dma, GFP_KERNEL);
+ if (!cd_table->cdtab.linear)
+ return -ENOMEM;
} else {
+ size_t l1size;
+
cd_table->s1fmt = STRTAB_STE_0_S1FMT_64K_L2;
cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts,
CTXDESC_L2_ENTRIES);
@@ -1379,24 +1382,15 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
return -ENOMEM;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
+ cd_table->cdtab.l1_desc = dmam_alloc_coherent(
+ smmu->dev, l1size, &cd_table->cdtab_dma, GFP_KERNEL);
+ if (!cd_table->cdtab.l1_desc) {
+ devm_kfree(smmu->dev, cd_table->l1_desc);
+ cd_table->l1_desc = NULL;
+ return -ENOMEM;
+ }
}
-
- cd_table->cdtab = dmam_alloc_coherent(smmu->dev, l1size, &cd_table->cdtab_dma,
- GFP_KERNEL);
- if (!cd_table->cdtab) {
- dev_warn(smmu->dev, "failed to allocate context descriptor\n");
- ret = -ENOMEM;
- goto err_free_l1;
- }
-
return 0;
-
-err_free_l1:
- if (cd_table->l1_desc) {
- devm_kfree(smmu->dev, cd_table->l1_desc);
- cd_table->l1_desc = NULL;
- }
- return ret;
}
static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
@@ -1407,7 +1401,7 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
if (cd_table->l1_desc) {
- size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
+ size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
for (i = 0; i < cd_table->num_l1_ents; i++) {
if (!cd_table->l1_desc[i].l2ptr)
@@ -1421,13 +1415,17 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
cd_table->l1_desc = NULL;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
+ dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab.l1_desc,
+ cd_table->cdtab_dma);
} else {
- l1size = cd_table->num_l1_ents * (CTXDESC_CD_DWORDS << 3);
+ dmam_free_coherent(smmu->dev,
+ cd_table->num_l1_ents *
+ sizeof(struct arm_smmu_cd),
+ cd_table->cdtab.linear, cd_table->cdtab_dma);
}
- dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma);
cd_table->cdtab_dma = 0;
- cd_table->cdtab = NULL;
+ cd_table->cdtab.linear = NULL;
}
bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
@@ -2955,7 +2953,7 @@ static void arm_smmu_release_device(struct device *dev)
arm_smmu_disable_pasid(master);
arm_smmu_remove_master(master);
- if (master->cd_table.cdtab)
+ if (arm_smmu_cdtab_allocated(&master->cd_table))
arm_smmu_free_cd_tables(master);
kfree(master);
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 280a04bfb7230c..21c1acf34dd29c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -279,10 +279,8 @@ struct arm_smmu_ste {
#define CTXDESC_L1_DESC_V (1UL << 0)
#define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12)
-#define CTXDESC_CD_DWORDS 8
-
struct arm_smmu_cd {
- __le64 data[CTXDESC_CD_DWORDS];
+ __le64 data[8];
};
#define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0)
@@ -312,7 +310,7 @@ struct arm_smmu_cd {
* When the SMMU only supports linear context descriptor tables, pick a
* reasonable size limit (64kB).
*/
-#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))
+#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / sizeof(struct arm_smmu_cd))
/* Command queue */
#define CMDQ_ENT_SZ_SHIFT 4
@@ -593,7 +591,10 @@ struct arm_smmu_l1_ctx_desc {
};
struct arm_smmu_ctx_desc_cfg {
- __le64 *cdtab;
+ union {
+ struct arm_smmu_cd *linear;
+ __le64 *l1_desc;
+ } cdtab;
dma_addr_t cdtab_dma;
struct arm_smmu_l1_ctx_desc *l1_desc;
unsigned int num_l1_ents;
@@ -602,6 +603,12 @@ struct arm_smmu_ctx_desc_cfg {
u8 s1cdmax;
};
+static inline bool
+arm_smmu_cdtab_allocated(struct arm_smmu_ctx_desc_cfg *cd_table)
+{
+ return cd_table->cdtab.linear;
+}
+
struct arm_smmu_s2_cfg {
u16 vmid;
};
--
2.45.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 5/7] iommu/arm-smmu-v3: Do not use devm for the cd table allocations
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (3 preceding siblings ...)
2024-06-03 22:31 ` [PATCH 4/7] iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab Jason Gunthorpe
@ 2024-06-03 22:31 ` Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 6/7] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array Jason Gunthorpe
` (2 subsequent siblings)
7 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 22:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
The master->cd_table is entirely contained within the struct
arm_smmu_master which is guaranteed to be freed by the core code under
arm_smmu_release_device().
There is no reason to use devm here, arm_smmu_free_cd_tables() is reliably
called to free the CD related memory. Remove it and save some memory.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 45 +++++++++------------
1 file changed, 19 insertions(+), 26 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 24c286a0834445..3f2e0462433d2d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1172,13 +1172,10 @@ static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
{
size_t size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
- l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size,
- &l1_desc->l2ptr_dma, GFP_KERNEL);
- if (!l1_desc->l2ptr) {
- dev_warn(smmu->dev,
- "failed to allocate context descriptor table\n");
+ l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size,
+ &l1_desc->l2ptr_dma, GFP_KERNEL);
+ if (!l1_desc->l2ptr)
return -ENOMEM;
- }
return 0;
}
@@ -1363,7 +1360,7 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
cd_table->num_l1_ents = max_contexts;
- cd_table->cdtab.linear = dmam_alloc_coherent(
+ cd_table->cdtab.linear = dma_alloc_coherent(
smmu->dev, max_contexts * sizeof(struct arm_smmu_cd),
&cd_table->cdtab_dma, GFP_KERNEL);
if (!cd_table->cdtab.linear)
@@ -1375,17 +1372,17 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts,
CTXDESC_L2_ENTRIES);
- cd_table->l1_desc = devm_kcalloc(smmu->dev, cd_table->num_l1_ents,
- sizeof(*cd_table->l1_desc),
- GFP_KERNEL);
+ cd_table->l1_desc = kcalloc(cd_table->num_l1_ents,
+ sizeof(*cd_table->l1_desc),
+ GFP_KERNEL);
if (!cd_table->l1_desc)
return -ENOMEM;
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
- cd_table->cdtab.l1_desc = dmam_alloc_coherent(
+ cd_table->cdtab.l1_desc = dma_alloc_coherent(
smmu->dev, l1size, &cd_table->cdtab_dma, GFP_KERNEL);
if (!cd_table->cdtab.l1_desc) {
- devm_kfree(smmu->dev, cd_table->l1_desc);
+ kfree(cd_table->l1_desc);
cd_table->l1_desc = NULL;
return -ENOMEM;
}
@@ -1407,25 +1404,21 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
if (!cd_table->l1_desc[i].l2ptr)
continue;
- dmam_free_coherent(smmu->dev, size,
- cd_table->l1_desc[i].l2ptr,
- cd_table->l1_desc[i].l2ptr_dma);
+ dma_free_coherent(smmu->dev, size,
+ cd_table->l1_desc[i].l2ptr,
+ cd_table->l1_desc[i].l2ptr_dma);
}
- devm_kfree(smmu->dev, cd_table->l1_desc);
- cd_table->l1_desc = NULL;
+ kfree(cd_table->l1_desc);
l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
- dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab.l1_desc,
- cd_table->cdtab_dma);
+ dma_free_coherent(smmu->dev, l1size, cd_table->cdtab.l1_desc,
+ cd_table->cdtab_dma);
} else {
- dmam_free_coherent(smmu->dev,
- cd_table->num_l1_ents *
- sizeof(struct arm_smmu_cd),
- cd_table->cdtab.linear, cd_table->cdtab_dma);
+ dma_free_coherent(smmu->dev,
+ cd_table->num_l1_ents *
+ sizeof(struct arm_smmu_cd),
+ cd_table->cdtab.linear, cd_table->cdtab_dma);
}
-
- cd_table->cdtab_dma = 0;
- cd_table->cdtab.linear = NULL;
}
bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
--
2.45.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 6/7] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (4 preceding siblings ...)
2024-06-03 22:31 ` [PATCH 5/7] iommu/arm-smmu-v3: Do not use devm for the cd table allocations Jason Gunthorpe
@ 2024-06-03 22:31 ` Jason Gunthorpe
2024-06-04 16:14 ` Mostafa Saleh
2024-06-03 22:31 ` [PATCH 7/7] iommu/arm-smmu-v3: Use the new rb tree helpers Jason Gunthorpe
2024-06-03 22:41 ` [PATCH 0/7] Tidy some minor things in the stream table/cd table area Nicolin Chen
7 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 22:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
The top of the 2 level CD table is (at most) 1024 entries big, and two
high order allocations are required. One of __le64 which is programmed
into the HW (8k) and one of struct arm_smmu_l1_ctx_desc which holds the
CPU pointer (16k).
There are two copies of the l2ptr_dma, one is stored in the struct
arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to
use. Instead of storing two copies just decode the value from the __le64.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 35 ++++++++++-----------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 -
2 files changed, 16 insertions(+), 20 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3f2e0462433d2d..7a6c9aac4cd450 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1167,28 +1167,19 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
arm_smmu_cmdq_batch_submit(smmu, &cmds);
}
-static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
- struct arm_smmu_l1_ctx_desc *l1_desc)
+static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
{
- size_t size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
-
- l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size,
- &l1_desc->l2ptr_dma, GFP_KERNEL);
- if (!l1_desc->l2ptr)
- return -ENOMEM;
- return 0;
-}
-
-static void arm_smmu_write_cd_l1_desc(__le64 *dst,
- struct arm_smmu_l1_ctx_desc *l1_desc)
-{
- u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
- CTXDESC_L1_DESC_V;
+ u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
/* The HW has 64 bit atomicity with stores to the L2 CD table */
WRITE_ONCE(*dst, cpu_to_le64(val));
}
+static dma_addr_t arm_smmu_cd_l1_get_desc(__le64 *src)
+{
+ return le64_to_cpu(*src) & CTXDESC_L1_DESC_L2PTR_MASK;
+}
+
struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
u32 ssid)
{
@@ -1227,13 +1218,18 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
l1_desc = &cd_table->l1_desc[idx];
if (!l1_desc->l2ptr) {
+ dma_addr_t l2ptr_dma;
__le64 *l1ptr;
- if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
+ l1_desc->l2ptr = dma_alloc_coherent(
+ smmu->dev,
+ CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd),
+ &l2ptr_dma, GFP_KERNEL);
+ if (!l1_desc->l2ptr)
return NULL;
l1ptr = &cd_table->cdtab.l1_desc[idx];
- arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+ arm_smmu_write_cd_l1_desc(l1ptr, l2ptr_dma);
/* An invalid L1CD can be cached */
arm_smmu_sync_cd(master, ssid, false);
}
@@ -1406,7 +1402,8 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
dma_free_coherent(smmu->dev, size,
cd_table->l1_desc[i].l2ptr,
- cd_table->l1_desc[i].l2ptr_dma);
+ arm_smmu_cd_l1_get_desc(
+ &cd_table->cdtab.l1_desc[i]));
}
kfree(cd_table->l1_desc);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 21c1acf34dd29c..1ffe2fdfd3755f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -587,7 +587,6 @@ struct arm_smmu_ctx_desc {
struct arm_smmu_l1_ctx_desc {
struct arm_smmu_cd *l2ptr;
- dma_addr_t l2ptr_dma;
};
struct arm_smmu_ctx_desc_cfg {
--
2.45.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 7/7] iommu/arm-smmu-v3: Use the new rb tree helpers
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (5 preceding siblings ...)
2024-06-03 22:31 ` [PATCH 6/7] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array Jason Gunthorpe
@ 2024-06-03 22:31 ` Jason Gunthorpe
2024-06-04 16:22 ` Mostafa Saleh
2024-06-03 22:41 ` [PATCH 0/7] Tidy some minor things in the stream table/cd table area Nicolin Chen
7 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-03 22:31 UTC (permalink / raw)
To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
Cc: Michael Shavit, Nicolin Chen, patches, Ryan Roberts,
Mostafa Saleh
Since v5.12 the rbtree has gained some simplifying helpers aimed at making
rb tree users write less convoluted boiler plate code. Instead the caller
provides a single comparison function and the helpers generate the prior
open-coded stuff.
Update smmu->streams to use rb_find_add() and rb_find().
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 68 ++++++++++-----------
1 file changed, 31 insertions(+), 37 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7a6c9aac4cd450..25bae0b05a488c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1667,26 +1667,37 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
return 0;
}
+static int arm_smmu_streams_cmp_key(const void *lhs, const struct rb_node *rhs)
+{
+ struct arm_smmu_stream *stream_rhs =
+ rb_entry(rhs, struct arm_smmu_stream, node);
+ const u32 *sid_lhs = lhs;
+
+ if (*sid_lhs < stream_rhs->id)
+ return -1;
+ if (*sid_lhs > stream_rhs->id)
+ return 1;
+ return 0;
+}
+
+static int arm_smmu_streams_cmp_node(struct rb_node *lhs,
+ const struct rb_node *rhs)
+{
+ return arm_smmu_streams_cmp_key(
+ &rb_entry(lhs, struct arm_smmu_stream, node)->id, rhs);
+}
+
static struct arm_smmu_master *
arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
{
struct rb_node *node;
- struct arm_smmu_stream *stream;
lockdep_assert_held(&smmu->streams_mutex);
- node = smmu->streams.rb_node;
- while (node) {
- stream = rb_entry(node, struct arm_smmu_stream, node);
- if (stream->id < sid)
- node = node->rb_right;
- else if (stream->id > sid)
- node = node->rb_left;
- else
- return stream->master;
- }
-
- return NULL;
+ node = rb_find(&sid, &smmu->streams, arm_smmu_streams_cmp_key);
+ if (!node)
+ return NULL;
+ return rb_entry(node, struct arm_smmu_stream, node)->master;
}
/* IRQ and event handlers */
@@ -2795,8 +2806,6 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
{
int i;
int ret = 0;
- struct arm_smmu_stream *new_stream, *cur_stream;
- struct rb_node **new_node, *parent_node = NULL;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
master->streams = kcalloc(fwspec->num_ids, sizeof(*master->streams),
@@ -2807,9 +2816,9 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
mutex_lock(&smmu->streams_mutex);
for (i = 0; i < fwspec->num_ids; i++) {
+ struct arm_smmu_stream *new_stream = &master->streams[i];
u32 sid = fwspec->ids[i];
- new_stream = &master->streams[i];
new_stream->id = sid;
new_stream->master = master;
@@ -2818,28 +2827,13 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
break;
/* Insert into SID tree */
- new_node = &(smmu->streams.rb_node);
- while (*new_node) {
- cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
- node);
- parent_node = *new_node;
- if (cur_stream->id > new_stream->id) {
- new_node = &((*new_node)->rb_left);
- } else if (cur_stream->id < new_stream->id) {
- new_node = &((*new_node)->rb_right);
- } else {
- dev_warn(master->dev,
- "stream %u already in tree\n",
- cur_stream->id);
- ret = -EINVAL;
- break;
- }
- }
- if (ret)
+ if (rb_find_add(&new_stream->node, &smmu->streams,
+ arm_smmu_streams_cmp_node)) {
+ dev_warn(master->dev, "stream %u already in tree\n",
+ sid);
+ ret = -EINVAL;
break;
-
- rb_link_node(&new_stream->node, parent_node, new_node);
- rb_insert_color(&new_stream->node, &smmu->streams);
+ }
}
if (ret) {
--
2.45.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 0/7] Tidy some minor things in the stream table/cd table area
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
` (6 preceding siblings ...)
2024-06-03 22:31 ` [PATCH 7/7] iommu/arm-smmu-v3: Use the new rb tree helpers Jason Gunthorpe
@ 2024-06-03 22:41 ` Nicolin Chen
7 siblings, 0 replies; 23+ messages in thread
From: Nicolin Chen @ 2024-06-03 22:41 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 03, 2024 at 07:31:26PM -0300, Jason Gunthorpe wrote:
> Will pointed out that two places referring to the CD/STE struct did not
> get the new types. While auditing this code a few more oddities were
> noticed.
>
> - Correct types for the linear stream/cd table pointers struct
> - Remove redundant dma_addr_t's and save some memory
> - Remove redundant devm usage
> - Use the modern rbtree API
>
> Nothing is particularly profound here, I've been sitting on these for
> awhile, enough is merged now that they can be cleanly based and are
> seperate from my other series.
>
> Jason Gunthorpe (7):
> iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
> iommu/arm-smmu-v3: Do not zero the strtab twice
> iommu/arm-smmu-v3: Shrink the strtab l1_desc array
> iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab
> iommu/arm-smmu-v3: Do not use devm for the cd table allocations
> iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
> iommu/arm-smmu-v3: Use the new rb tree helpers
>
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 234 +++++++++-----------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 30 +--
> 2 files changed, 119 insertions(+), 145 deletions(-)
Didn't try something too particular for the changes, yet boot
and sanity tests work fine with this series:
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
2024-06-03 22:31 ` [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab Jason Gunthorpe
@ 2024-06-04 8:32 ` Nicolin Chen
2024-06-04 12:59 ` Jason Gunthorpe
2024-06-04 15:52 ` Mostafa Saleh
1 sibling, 1 reply; 23+ messages in thread
From: Nicolin Chen @ 2024-06-04 8:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Mon, Jun 03, 2024 at 07:31:27PM -0300, Jason Gunthorpe wrote:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1242a086c9f948..4769780259affc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -612,7 +610,10 @@ struct arm_smmu_s2_cfg {
> };
>
> struct arm_smmu_strtab_cfg {
> - __le64 *strtab;
> + union {
> + struct arm_smmu_ste *linear;
> + __le64 *l1_desc;
> + } strtab;
> dma_addr_t strtab_dma;
> struct arm_smmu_strtab_l1_desc *l1_desc;
> unsigned int num_l1_ents;
It looks like we have two "l1_desc" ptrs now in the same struct:
strtab.l1_desc // raw level-1 descriptor memory
l1_desc // SW array to store level-2 descriptor memory
And it gets a bit more confusing that they even use the same error
prints in arm_smmu_init_strtab_2lvl()...
The "struct arm_smmu_strtab_l1_desc" seems to be only used at one
place in arm_smmu_init_l2_strtab(). So, how about:
struct arm_smmu_strtab_cfg {
union {
struct arm_smmu_ste *linear;
__le64 *l1_desc;
} strtab;
dma_addr_t strtab_dma;
struct arm_smmu_ste **l2ptrs;
Thanks
Nicolin
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
2024-06-04 8:32 ` Nicolin Chen
@ 2024-06-04 12:59 ` Jason Gunthorpe
2024-06-04 18:28 ` Nicolin Chen
0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-04 12:59 UTC (permalink / raw)
To: Nicolin Chen
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Tue, Jun 04, 2024 at 01:32:20AM -0700, Nicolin Chen wrote:
> On Mon, Jun 03, 2024 at 07:31:27PM -0300, Jason Gunthorpe wrote:
>
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > index 1242a086c9f948..4769780259affc 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > @@ -612,7 +610,10 @@ struct arm_smmu_s2_cfg {
> > };
> >
> > struct arm_smmu_strtab_cfg {
> > - __le64 *strtab;
> > + union {
> > + struct arm_smmu_ste *linear;
> > + __le64 *l1_desc;
> > + } strtab;
> > dma_addr_t strtab_dma;
> > struct arm_smmu_strtab_l1_desc *l1_desc;
> > unsigned int num_l1_ents;
>
> It looks like we have two "l1_desc" ptrs now in the same struct:
> strtab.l1_desc // raw level-1 descriptor memory
> l1_desc // SW array to store level-2 descriptor memory
>
> And it gets a bit more confusing that they even use the same error
> prints in arm_smmu_init_strtab_2lvl()...
Yeah, I noticed that too, but failed to come with better names.. The
CD has the same issue
strtab.l1_desc is a pointer to the data structure that the HW fetches
that is the first level of a 2 level strtab, it stores an encoded
dma_addr_t.
cfg.l1_desc is an array of CPU information for each HW L1 entry,
eventually just being the CPU pointer to the L2 STE table.
So they are both the l1 array, just one is a CPU pointer and one is a
HW/DMA pointer.
Let's call strtab.l1_desc --> strtab.l1_table ?
> The "struct arm_smmu_strtab_l1_desc" seems to be only used at one
> place in arm_smmu_init_l2_strtab(). So, how about:
I didn't do it but, it would make some of the maths more obvious
if we encoded the table structure in the types:
struct arm_smmu_strtab_l2_stes {
struct arm_smmu_ste l2[256];
};
Jason
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
2024-06-03 22:31 ` [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab Jason Gunthorpe
2024-06-04 8:32 ` Nicolin Chen
@ 2024-06-04 15:52 ` Mostafa Saleh
2024-06-05 23:51 ` Jason Gunthorpe
1 sibling, 1 reply; 23+ messages in thread
From: Mostafa Saleh @ 2024-06-04 15:52 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
Hi Jason,
On Mon, Jun 03, 2024 at 07:31:27PM -0300, Jason Gunthorpe wrote:
> This is being used as both an array of STEs and an array of L1
> descriptors.
>
> Give the two usages different names and correct types.
>
> Remove STRTAB_STE_DWORDS as most usages were indexing or sizing an array
> of struct arm_smmu_ste.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 21 +++++++++------------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +++++----
> 2 files changed, 14 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index ab415e107054c1..6b4f1a664288db 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1661,8 +1661,8 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
> if (desc->l2ptr)
> return 0;
>
> - size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
> - strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
I believe also STRTAB_L1_DESC_DWORDS isn’t needed as l1_desc has the same size.
Especially I already see sizeof(*cfg->l1_desc) used in some places instead of the macro.
So we can remove it also as STRTAB_STE_DWORDS.
> + size = (1 << STRTAB_SPLIT) * sizeof(struct arm_smmu_ste);
> + strtab = &cfg->strtab.l1_desc[sid >> STRTAB_SPLIT];
>
> desc->span = STRTAB_SPLIT + 1;
> desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
> @@ -2409,8 +2409,7 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
> return &cfg->l1_desc[idx1].l2ptr[idx2];
> } else {
> /* Simple linear lookup */
> - return (struct arm_smmu_ste *)&cfg
> - ->strtab[sid * STRTAB_STE_DWORDS];
> + return &cfg->strtab.linear[sid];
> }
> }
>
> @@ -3225,17 +3224,15 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
> {
> unsigned int i;
> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> - void *strtab = smmu->strtab_cfg.strtab;
>
> cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
> sizeof(*cfg->l1_desc), GFP_KERNEL);
> if (!cfg->l1_desc)
> return -ENOMEM;
>
> - for (i = 0; i < cfg->num_l1_ents; ++i) {
> - arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
> - strtab += STRTAB_L1_DESC_DWORDS << 3;
> - }
> + for (i = 0; i < cfg->num_l1_ents; ++i)
> + arm_smmu_write_strtab_l1_desc(
> + &smmu->strtab_cfg.strtab.l1_desc[i], &cfg->l1_desc[i]);
>
> return 0;
> }
> @@ -3267,7 +3264,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
> l1size);
> return -ENOMEM;
> }
> - cfg->strtab = strtab;
> + cfg->strtab.l1_desc = strtab;
>
> /* Configure strtab_base_cfg for 2 levels */
> reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL);
> @@ -3285,7 +3282,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> u32 size;
> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>
> - size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
> + size = (1 << smmu->sid_bits) * sizeof(cfg->strtab.linear[0]);
nit: maybe be consistent with "sizeof(struct arm_smmu_ste)" which was used earlier
this patch and "sizeof(cfg->strtab.linear[0])"
> strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma,
> GFP_KERNEL);
> if (!strtab) {
> @@ -3294,7 +3291,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> size);
> return -ENOMEM;
> }
> - cfg->strtab = strtab;
> + cfg->strtab.linear = strtab;
> cfg->num_l1_ents = 1 << smmu->sid_bits;
>
> /* Configure strtab_base_cfg for a linear table covering all SIDs */
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1242a086c9f948..4769780259affc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -206,10 +206,8 @@
> #define STRTAB_L1_DESC_SPAN GENMASK_ULL(4, 0)
> #define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
>
> -#define STRTAB_STE_DWORDS 8
> -
> struct arm_smmu_ste {
> - __le64 data[STRTAB_STE_DWORDS];
> + __le64 data[8];
> };
>
> #define STRTAB_STE_0_V (1UL << 0)
> @@ -612,7 +610,10 @@ struct arm_smmu_s2_cfg {
> };
>
> struct arm_smmu_strtab_cfg {
> - __le64 *strtab;
> + union {
> + struct arm_smmu_ste *linear;
> + __le64 *l1_desc;
I agree with Nicolin, that it is confusing to have both l1_desc,
I guess a rename is sufficient.
> + } strtab;
> dma_addr_t strtab_dma;
> struct arm_smmu_strtab_l1_desc *l1_desc;
> unsigned int num_l1_ents;
> --
> 2.45.2
>
Thanks,
Mostafa
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/7] iommu/arm-smmu-v3: Do not zero the strtab twice
2024-06-03 22:31 ` [PATCH 2/7] iommu/arm-smmu-v3: Do not zero the strtab twice Jason Gunthorpe
@ 2024-06-04 15:56 ` Mostafa Saleh
2024-06-05 21:22 ` Jason Gunthorpe
0 siblings, 1 reply; 23+ messages in thread
From: Mostafa Saleh @ 2024-06-04 15:56 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
Hi Jason,
On Mon, Jun 03, 2024 at 07:31:28PM -0300, Jason Gunthorpe wrote:
> dmam_alloc_coherent() already returns zero'd memory so cfg->strtab.l1_desc
> (the list of DMA addresses for the L2 entries) is already zero'd.
>
> arm_smmu_init_l1_strtab() goes through and calls
> arm_smmu_write_strtab_l1_desc() on the newly allocated (and zero'd) struct
> arm_smmu_strtab_l1_desc, which ends up computing 'val = 0' and zeroing it
> again.
>
> Remove arm_smmu_init_l1_strtab() and just call devm_kcalloc() from
> arm_smmu_init_strtab_2lvl to allocate the companion struct.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Looking at the code for dmam_alloc_coherent(basically dma_alloc_coherent)
I see that the memory is zeroed for both DMA direct and IOMMU, however
I don’t see that documented (in DMA-API.txt).
Assuming that’s guaranteed to be zeroed (maybe we should update the docs
if I am not missing something)
Reviewed-by: Mostafa Saleh <smostafa@google.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++++++--------------
> 1 file changed, 9 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 6b4f1a664288db..d27dd0600bf1df 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3220,23 +3220,6 @@ static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
> PRIQ_ENT_DWORDS, "priq");
> }
>
> -static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
> -{
> - unsigned int i;
> - struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> -
> - cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
> - sizeof(*cfg->l1_desc), GFP_KERNEL);
> - if (!cfg->l1_desc)
> - return -ENOMEM;
> -
> - for (i = 0; i < cfg->num_l1_ents; ++i)
> - arm_smmu_write_strtab_l1_desc(
> - &smmu->strtab_cfg.strtab.l1_desc[i], &cfg->l1_desc[i]);
> -
> - return 0;
> -}
> -
> static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
> {
> void *strtab;
> @@ -3272,7 +3255,15 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
> reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT);
> cfg->strtab_base_cfg = reg;
>
> - return arm_smmu_init_l1_strtab(smmu);
> + cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents,
> + sizeof(*cfg->l1_desc), GFP_KERNEL);
> + if (!cfg->l1_desc) {
> + dev_err(smmu->dev,
> + "failed to allocate l1 stream table (%zu bytes)\n",
> + cfg->num_l1_ents * sizeof(*cfg->l1_desc));
> + return -ENOMEM;
> + }
> + return 0;
> }
>
> static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/7] iommu/arm-smmu-v3: Shrink the strtab l1_desc array
2024-06-03 22:31 ` [PATCH 3/7] iommu/arm-smmu-v3: Shrink the strtab l1_desc array Jason Gunthorpe
@ 2024-06-04 16:01 ` Mostafa Saleh
0 siblings, 0 replies; 23+ messages in thread
From: Mostafa Saleh @ 2024-06-04 16:01 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
Hi Jason,
On Mon, Jun 03, 2024 at 07:31:29PM -0300, Jason Gunthorpe wrote:
> The top of the 2 level stream table is (at most) 128k entries big, and two
> high order allocations are required. One of __le64 which is programmed
> into the HW (1M), and one of struct arm_smmu_strtab_l1_desc which holds
> the CPU pointer (3M).
>
> There is no reason to store the l2ptr_dma as nothing reads it. devm stores
> a copy of it and the DMA memory will be freed via devm mechanisms. span is
> a constant of 8+1. Remove both.
>
This caught my eye before, I imagine(although I was not there) there was some
thought about having different spans per SMMUs maybe, but that’s not the case.
> This removes 16 bytes from each arm_smmu_l1_ctx_desc and saves up to 2M of
> memory per iommu instance.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 +++++++-------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 ---
> 2 files changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index d27dd0600bf1df..735dd9ff61890e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1448,12 +1448,12 @@ bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
>
> /* Stream table manipulation functions */
> static void
> -arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
> +arm_smmu_write_strtab_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma, u8 span)
> {
> u64 val = 0;
>
> - val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
> - val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
> + val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, span);
> + val |= l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
>
> /* The HW has 64 bit atomicity with stores to the L2 STE table */
> WRITE_ONCE(*dst, cpu_to_le64(val));
> @@ -1655,6 +1655,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
> {
> size_t size;
> void *strtab;
> + dma_addr_t l2ptr_dma;
> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
>
> @@ -1664,9 +1665,8 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
> size = (1 << STRTAB_SPLIT) * sizeof(struct arm_smmu_ste);
> strtab = &cfg->strtab.l1_desc[sid >> STRTAB_SPLIT];
>
> - desc->span = STRTAB_SPLIT + 1;
> - desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
> - GFP_KERNEL);
> + desc->l2ptr =
> + dmam_alloc_coherent(smmu->dev, size, &l2ptr_dma, GFP_KERNEL);
> if (!desc->l2ptr) {
> dev_err(smmu->dev,
> "failed to allocate l2 stream table for SID %u\n",
> @@ -1675,7 +1675,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
> }
>
> arm_smmu_init_initial_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
> - arm_smmu_write_strtab_l1_desc(strtab, desc);
> + arm_smmu_write_strtab_l1_desc(strtab, l2ptr_dma, STRTAB_SPLIT + 1);
> return 0;
> }
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 4769780259affc..280a04bfb7230c 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -577,10 +577,7 @@ struct arm_smmu_priq {
>
> /* High-level stream table and context descriptor structures */
> struct arm_smmu_strtab_l1_desc {
> - u8 span;
> -
> struct arm_smmu_ste *l2ptr;
> - dma_addr_t l2ptr_dma;
> };
>
> struct arm_smmu_ctx_desc {
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/7] iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab
2024-06-03 22:31 ` [PATCH 4/7] iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab Jason Gunthorpe
@ 2024-06-04 16:07 ` Mostafa Saleh
2024-06-06 23:59 ` Jason Gunthorpe
0 siblings, 1 reply; 23+ messages in thread
From: Mostafa Saleh @ 2024-06-04 16:07 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
Hi Jason,
On Mon, Jun 03, 2024 at 07:31:30PM -0300, Jason Gunthorpe wrote:
> This is being used as both an array of CDs and an array of L1 descriptors.
>
> Give the two usages different names and correct types.
>
> Remove CTXDESC_CD_DWORDS as most usages were indexing or sizing an array
> of struct arm_smmu_cd.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 60 ++++++++++-----------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 17 ++++--
> 2 files changed, 41 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 735dd9ff61890e..24c286a0834445 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1170,7 +1170,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
> static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
> struct arm_smmu_l1_ctx_desc *l1_desc)
> {
> - size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
> + size_t size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
>
> l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size,
> &l1_desc->l2ptr_dma, GFP_KERNEL);
> @@ -1198,12 +1198,11 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
> struct arm_smmu_l1_ctx_desc *l1_desc;
> struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
>
> - if (!cd_table->cdtab)
> + if (!arm_smmu_cdtab_allocated(cd_table))
> return NULL;
>
> if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
> - return (struct arm_smmu_cd *)(cd_table->cdtab +
> - ssid * CTXDESC_CD_DWORDS);
> + return &cd_table->cdtab.linear[ssid];
>
> l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES];
> if (!l1_desc->l2ptr)
> @@ -1220,7 +1219,7 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> might_sleep();
> iommu_group_mutex_assert(master->dev);
>
> - if (!cd_table->cdtab) {
> + if (!arm_smmu_cdtab_allocated(cd_table)) {
> if (arm_smmu_alloc_cd_tables(master))
> return NULL;
> }
> @@ -1236,7 +1235,7 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
> return NULL;
>
> - l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
Similarly to patch 1, I believe we can get rid of CTXDESC_L1_DESC_DWORDS entirely.
> + l1ptr = &cd_table->cdtab.l1_desc[idx];
> arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
> /* An invalid L1CD can be cached */
> arm_smmu_sync_cd(master, ssid, false);
> @@ -1342,7 +1341,7 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
> struct arm_smmu_cd target = {};
> struct arm_smmu_cd *cdptr;
>
> - if (!master->cd_table.cdtab)
> + if (!arm_smmu_cdtab_allocated(&master->cd_table))
> return;
> cdptr = arm_smmu_get_cd_ptr(master, ssid);
> if (WARN_ON(!cdptr))
> @@ -1352,8 +1351,6 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
>
> static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
> {
> - int ret;
> - size_t l1size;
> size_t max_contexts;
> struct arm_smmu_device *smmu = master->smmu;
> struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
> @@ -1366,8 +1363,14 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
> cd_table->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
> cd_table->num_l1_ents = max_contexts;
>
> - l1size = max_contexts * (CTXDESC_CD_DWORDS << 3);
> + cd_table->cdtab.linear = dmam_alloc_coherent(
> + smmu->dev, max_contexts * sizeof(struct arm_smmu_cd),
> + &cd_table->cdtab_dma, GFP_KERNEL);
> + if (!cd_table->cdtab.linear)
> + return -ENOMEM;
> } else {
> + size_t l1size;
> +
> cd_table->s1fmt = STRTAB_STE_0_S1FMT_64K_L2;
> cd_table->num_l1_ents = DIV_ROUND_UP(max_contexts,
> CTXDESC_L2_ENTRIES);
> @@ -1379,24 +1382,15 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
> return -ENOMEM;
>
> l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
> + cd_table->cdtab.l1_desc = dmam_alloc_coherent(
> + smmu->dev, l1size, &cd_table->cdtab_dma, GFP_KERNEL);
> + if (!cd_table->cdtab.l1_desc) {
> + devm_kfree(smmu->dev, cd_table->l1_desc);
> + cd_table->l1_desc = NULL;
> + return -ENOMEM;
> + }
> }
> -
> - cd_table->cdtab = dmam_alloc_coherent(smmu->dev, l1size, &cd_table->cdtab_dma,
> - GFP_KERNEL);
> - if (!cd_table->cdtab) {
> - dev_warn(smmu->dev, "failed to allocate context descriptor\n");
> - ret = -ENOMEM;
> - goto err_free_l1;
> - }
> -
> return 0;
> -
> -err_free_l1:
> - if (cd_table->l1_desc) {
> - devm_kfree(smmu->dev, cd_table->l1_desc);
> - cd_table->l1_desc = NULL;
> - }
> - return ret;
> }
>
> static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
> @@ -1407,7 +1401,7 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
> struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
>
> if (cd_table->l1_desc) {
> - size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3);
> + size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
>
> for (i = 0; i < cd_table->num_l1_ents; i++) {
> if (!cd_table->l1_desc[i].l2ptr)
> @@ -1421,13 +1415,17 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
> cd_table->l1_desc = NULL;
>
> l1size = cd_table->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3);
> + dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab.l1_desc,
> + cd_table->cdtab_dma);
> } else {
> - l1size = cd_table->num_l1_ents * (CTXDESC_CD_DWORDS << 3);
> + dmam_free_coherent(smmu->dev,
> + cd_table->num_l1_ents *
> + sizeof(struct arm_smmu_cd),
> + cd_table->cdtab.linear, cd_table->cdtab_dma);
> }
>
> - dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma);
> cd_table->cdtab_dma = 0;
> - cd_table->cdtab = NULL;
> + cd_table->cdtab.linear = NULL;
nit: AFAIU, arm_smmu_cdtab_allocated() was added to check the cd table existence
regardless the config.
So, I think we should be consistent, as here it is set as linear, while it can be 2lvl.
> }
>
> bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd)
> @@ -2955,7 +2953,7 @@ static void arm_smmu_release_device(struct device *dev)
>
> arm_smmu_disable_pasid(master);
> arm_smmu_remove_master(master);
> - if (master->cd_table.cdtab)
> + if (arm_smmu_cdtab_allocated(&master->cd_table))
> arm_smmu_free_cd_tables(master);
> kfree(master);
> }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 280a04bfb7230c..21c1acf34dd29c 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -279,10 +279,8 @@ struct arm_smmu_ste {
> #define CTXDESC_L1_DESC_V (1UL << 0)
> #define CTXDESC_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 12)
>
> -#define CTXDESC_CD_DWORDS 8
> -
> struct arm_smmu_cd {
> - __le64 data[CTXDESC_CD_DWORDS];
> + __le64 data[8];
> };
>
> #define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0)
> @@ -312,7 +310,7 @@ struct arm_smmu_cd {
> * When the SMMU only supports linear context descriptor tables, pick a
> * reasonable size limit (64kB).
> */
> -#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / (CTXDESC_CD_DWORDS << 3))
> +#define CTXDESC_LINEAR_CDMAX ilog2(SZ_64K / sizeof(struct arm_smmu_cd))
>
> /* Command queue */
> #define CMDQ_ENT_SZ_SHIFT 4
> @@ -593,7 +591,10 @@ struct arm_smmu_l1_ctx_desc {
> };
>
> struct arm_smmu_ctx_desc_cfg {
> - __le64 *cdtab;
> + union {
> + struct arm_smmu_cd *linear;
> + __le64 *l1_desc;
As patch-1 also, I think naming is confusing.
> + } cdtab;
> dma_addr_t cdtab_dma;
> struct arm_smmu_l1_ctx_desc *l1_desc;
> unsigned int num_l1_ents;
> @@ -602,6 +603,12 @@ struct arm_smmu_ctx_desc_cfg {
> u8 s1cdmax;
> };
>
> +static inline bool
> +arm_smmu_cdtab_allocated(struct arm_smmu_ctx_desc_cfg *cd_table)
> +{
> + return cd_table->cdtab.linear;
> +}
> +
> struct arm_smmu_s2_cfg {
> u16 vmid;
> };
> --
> 2.45.2
>
Thanks,
Mostafa
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 6/7] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
2024-06-03 22:31 ` [PATCH 6/7] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array Jason Gunthorpe
@ 2024-06-04 16:14 ` Mostafa Saleh
0 siblings, 0 replies; 23+ messages in thread
From: Mostafa Saleh @ 2024-06-04 16:14 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
Hi Jason,
On Mon, Jun 03, 2024 at 07:31:32PM -0300, Jason Gunthorpe wrote:
> The top of the 2 level CD table is (at most) 1024 entries big, and two
> high order allocations are required. One of __le64 which is programmed
> into the HW (8k) and one of struct arm_smmu_l1_ctx_desc which holds the
> CPU pointer (16k).
>
> There are two copies of the l2ptr_dma, one is stored in the struct
> arm_smmu_l1_ctx_desc, and another is encoded in the __le64 for the HW to
> use. Instead of storing two copies just decode the value from the __le64.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 35 ++++++++++-----------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 -
> 2 files changed, 16 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 3f2e0462433d2d..7a6c9aac4cd450 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1167,28 +1167,19 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
> arm_smmu_cmdq_batch_submit(smmu, &cmds);
> }
>
> -static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu,
> - struct arm_smmu_l1_ctx_desc *l1_desc)
> +static void arm_smmu_write_cd_l1_desc(__le64 *dst, dma_addr_t l2ptr_dma)
> {
> - size_t size = CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd);
> -
> - l1_desc->l2ptr = dma_alloc_coherent(smmu->dev, size,
> - &l1_desc->l2ptr_dma, GFP_KERNEL);
> - if (!l1_desc->l2ptr)
> - return -ENOMEM;
> - return 0;
> -}
> -
> -static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> - struct arm_smmu_l1_ctx_desc *l1_desc)
> -{
> - u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
> - CTXDESC_L1_DESC_V;
> + u64 val = (l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | CTXDESC_L1_DESC_V;
>
> /* The HW has 64 bit atomicity with stores to the L2 CD table */
> WRITE_ONCE(*dst, cpu_to_le64(val));
> }
>
> +static dma_addr_t arm_smmu_cd_l1_get_desc(__le64 *src)
> +{
> + return le64_to_cpu(*src) & CTXDESC_L1_DESC_L2PTR_MASK;
> +}
> +
> struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
> u32 ssid)
> {
> @@ -1227,13 +1218,18 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
>
> l1_desc = &cd_table->l1_desc[idx];
> if (!l1_desc->l2ptr) {
> + dma_addr_t l2ptr_dma;
> __le64 *l1ptr;
>
> - if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
> + l1_desc->l2ptr = dma_alloc_coherent(
> + smmu->dev,
> + CTXDESC_L2_ENTRIES * sizeof(struct arm_smmu_cd),
> + &l2ptr_dma, GFP_KERNEL);
> + if (!l1_desc->l2ptr)
> return NULL;
>
> l1ptr = &cd_table->cdtab.l1_desc[idx];
> - arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
> + arm_smmu_write_cd_l1_desc(l1ptr, l2ptr_dma);
> /* An invalid L1CD can be cached */
> arm_smmu_sync_cd(master, ssid, false);
> }
> @@ -1406,7 +1402,8 @@ static void arm_smmu_free_cd_tables(struct arm_smmu_master *master)
>
> dma_free_coherent(smmu->dev, size,
> cd_table->l1_desc[i].l2ptr,
> - cd_table->l1_desc[i].l2ptr_dma);
> + arm_smmu_cd_l1_get_desc(
> + &cd_table->cdtab.l1_desc[i]));
> }
> kfree(cd_table->l1_desc);
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 21c1acf34dd29c..1ffe2fdfd3755f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -587,7 +587,6 @@ struct arm_smmu_ctx_desc {
>
> struct arm_smmu_l1_ctx_desc {
> struct arm_smmu_cd *l2ptr;
> - dma_addr_t l2ptr_dma;
Maybe now we can also get rid of arm_smmu_l1_ctx_desc, and embed l2ptr directly
in arm_smmu_ctx_desc_cfg?
> };
>
> struct arm_smmu_ctx_desc_cfg {
> --
> 2.45.2
>
Thanks,
Mostafa
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 7/7] iommu/arm-smmu-v3: Use the new rb tree helpers
2024-06-03 22:31 ` [PATCH 7/7] iommu/arm-smmu-v3: Use the new rb tree helpers Jason Gunthorpe
@ 2024-06-04 16:22 ` Mostafa Saleh
0 siblings, 0 replies; 23+ messages in thread
From: Mostafa Saleh @ 2024-06-04 16:22 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
Hi Jason,
On Mon, Jun 03, 2024 at 07:31:33PM -0300, Jason Gunthorpe wrote:
> Since v5.12 the rbtree has gained some simplifying helpers aimed at making
> rb tree users write less convoluted boiler plate code. Instead the caller
> provides a single comparison function and the helpers generate the prior
> open-coded stuff.
>
> Update smmu->streams to use rb_find_add() and rb_find().
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 68 ++++++++++-----------
> 1 file changed, 31 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 7a6c9aac4cd450..25bae0b05a488c 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1667,26 +1667,37 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
> return 0;
> }
>
> +static int arm_smmu_streams_cmp_key(const void *lhs, const struct rb_node *rhs)
> +{
> + struct arm_smmu_stream *stream_rhs =
> + rb_entry(rhs, struct arm_smmu_stream, node);
> + const u32 *sid_lhs = lhs;
> +
> + if (*sid_lhs < stream_rhs->id)
> + return -1;
> + if (*sid_lhs > stream_rhs->id)
> + return 1;
> + return 0;
> +}
> +
> +static int arm_smmu_streams_cmp_node(struct rb_node *lhs,
> + const struct rb_node *rhs)
> +{
> + return arm_smmu_streams_cmp_key(
> + &rb_entry(lhs, struct arm_smmu_stream, node)->id, rhs);
> +}
> +
> static struct arm_smmu_master *
> arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
> {
> struct rb_node *node;
> - struct arm_smmu_stream *stream;
>
> lockdep_assert_held(&smmu->streams_mutex);
>
> - node = smmu->streams.rb_node;
> - while (node) {
> - stream = rb_entry(node, struct arm_smmu_stream, node);
> - if (stream->id < sid)
> - node = node->rb_right;
> - else if (stream->id > sid)
> - node = node->rb_left;
> - else
> - return stream->master;
> - }
> -
> - return NULL;
> + node = rb_find(&sid, &smmu->streams, arm_smmu_streams_cmp_key);
> + if (!node)
> + return NULL;
> + return rb_entry(node, struct arm_smmu_stream, node)->master;
> }
>
> /* IRQ and event handlers */
> @@ -2795,8 +2806,6 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> {
> int i;
> int ret = 0;
> - struct arm_smmu_stream *new_stream, *cur_stream;
> - struct rb_node **new_node, *parent_node = NULL;
> struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
>
> master->streams = kcalloc(fwspec->num_ids, sizeof(*master->streams),
> @@ -2807,9 +2816,9 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>
> mutex_lock(&smmu->streams_mutex);
> for (i = 0; i < fwspec->num_ids; i++) {
> + struct arm_smmu_stream *new_stream = &master->streams[i];
> u32 sid = fwspec->ids[i];
>
> - new_stream = &master->streams[i];
> new_stream->id = sid;
> new_stream->master = master;
>
> @@ -2818,28 +2827,13 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> break;
>
> /* Insert into SID tree */
> - new_node = &(smmu->streams.rb_node);
> - while (*new_node) {
> - cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
> - node);
> - parent_node = *new_node;
> - if (cur_stream->id > new_stream->id) {
> - new_node = &((*new_node)->rb_left);
> - } else if (cur_stream->id < new_stream->id) {
> - new_node = &((*new_node)->rb_right);
> - } else {
> - dev_warn(master->dev,
> - "stream %u already in tree\n",
> - cur_stream->id);
> - ret = -EINVAL;
> - break;
> - }
> - }
> - if (ret)
> + if (rb_find_add(&new_stream->node, &smmu->streams,
> + arm_smmu_streams_cmp_node)) {
> + dev_warn(master->dev, "stream %u already in tree\n",
> + sid);
> + ret = -EINVAL;
> break;
> -
> - rb_link_node(&new_stream->node, parent_node, new_node);
> - rb_insert_color(&new_stream->node, &smmu->streams);
> + }
> }
>
> if (ret) {
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
2024-06-04 12:59 ` Jason Gunthorpe
@ 2024-06-04 18:28 ` Nicolin Chen
2024-06-04 19:02 ` Jason Gunthorpe
0 siblings, 1 reply; 23+ messages in thread
From: Nicolin Chen @ 2024-06-04 18:28 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Tue, Jun 04, 2024 at 09:59:55AM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 04, 2024 at 01:32:20AM -0700, Nicolin Chen wrote:
> > On Mon, Jun 03, 2024 at 07:31:27PM -0300, Jason Gunthorpe wrote:
> >
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > > index 1242a086c9f948..4769780259affc 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > > @@ -612,7 +610,10 @@ struct arm_smmu_s2_cfg {
> > > };
> > >
> > > struct arm_smmu_strtab_cfg {
> > > - __le64 *strtab;
> > > + union {
> > > + struct arm_smmu_ste *linear;
> > > + __le64 *l1_desc;
> > > + } strtab;
> > > dma_addr_t strtab_dma;
> > > struct arm_smmu_strtab_l1_desc *l1_desc;
> > > unsigned int num_l1_ents;
> >
> > It looks like we have two "l1_desc" ptrs now in the same struct:
> > strtab.l1_desc // raw level-1 descriptor memory
> > l1_desc // SW array to store level-2 descriptor memory
> >
> > And it gets a bit more confusing that they even use the same error
> > prints in arm_smmu_init_strtab_2lvl()...
>
> Yeah, I noticed that too, but failed to come with better names.. The
> CD has the same issue
>
> strtab.l1_desc is a pointer to the data structure that the HW fetches
> that is the first level of a 2 level strtab, it stores an encoded
> dma_addr_t.
>
> cfg.l1_desc is an array of CPU information for each HW L1 entry,
> eventually just being the CPU pointer to the L2 STE table.
>
> So they are both the l1 array, just one is a CPU pointer and one is a
> HW/DMA pointer.
>
> Let's call strtab.l1_desc --> strtab.l1_table ?
Yea. This seems to be good.
> > The "struct arm_smmu_strtab_l1_desc" seems to be only used at one
> > place in arm_smmu_init_l2_strtab(). So, how about:
>
> I didn't do it but, it would make some of the maths more obvious
> if we encoded the table structure in the types:
>
> struct arm_smmu_strtab_l2_stes {
> struct arm_smmu_ste l2[256];
> };
I personally prefer this one, though why 256?
I was also thinking of an alternative by separating linear/2lvl:
struct arm_smmu_ste {
__le64 data[8];
};
struct arm_smmu_strtab_linear {
struct arm_smmu_ste *ste;
dma_addr_t ste_dma;
};
struct arm_smmu_strtab_l1_desc { // so as to drop TRTAB_L1_DESC_DWORDS
__le64 data;
};
struct arm_smmu_strtab_l2_stes {
struct arm_smmu_ste *ste;
};
struct arm_smmu_strtab_l1 {
struct arm_smmu_strtab_l1_desc *l1;
dma_addr_t l1_dma;
struct arm_smmu_strtab_l2_stes *l2;
};
struct arm_smmu_device {
...
union {
struct arm_smmu_strtab_linear linear;
struct arm_smmu_strtab_l1 l1;
} strtab;
...
};
Only arm_smmu_device_reset() really needs strtab_base/_cfg values
that we could compute them over there, given that there are quite
amount of smmu->features checking already?
Thanks
Nicolin
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
2024-06-04 18:28 ` Nicolin Chen
@ 2024-06-04 19:02 ` Jason Gunthorpe
2024-06-04 19:28 ` Nicolin Chen
0 siblings, 1 reply; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-04 19:02 UTC (permalink / raw)
To: Nicolin Chen
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Tue, Jun 04, 2024 at 11:28:05AM -0700, Nicolin Chen wrote:
> On Tue, Jun 04, 2024 at 09:59:55AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 04, 2024 at 01:32:20AM -0700, Nicolin Chen wrote:
> > > On Mon, Jun 03, 2024 at 07:31:27PM -0300, Jason Gunthorpe wrote:
> > >
> > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > > > index 1242a086c9f948..4769780259affc 100644
> > > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> > > > @@ -612,7 +610,10 @@ struct arm_smmu_s2_cfg {
> > > > };
> > > >
> > > > struct arm_smmu_strtab_cfg {
> > > > - __le64 *strtab;
> > > > + union {
> > > > + struct arm_smmu_ste *linear;
> > > > + __le64 *l1_desc;
> > > > + } strtab;
> > > > dma_addr_t strtab_dma;
> > > > struct arm_smmu_strtab_l1_desc *l1_desc;
> > > > unsigned int num_l1_ents;
> > >
> > > It looks like we have two "l1_desc" ptrs now in the same struct:
> > > strtab.l1_desc // raw level-1 descriptor memory
> > > l1_desc // SW array to store level-2 descriptor memory
> > >
> > > And it gets a bit more confusing that they even use the same error
> > > prints in arm_smmu_init_strtab_2lvl()...
> >
> > Yeah, I noticed that too, but failed to come with better names.. The
> > CD has the same issue
> >
> > strtab.l1_desc is a pointer to the data structure that the HW fetches
> > that is the first level of a 2 level strtab, it stores an encoded
> > dma_addr_t.
> >
> > cfg.l1_desc is an array of CPU information for each HW L1 entry,
> > eventually just being the CPU pointer to the L2 STE table.
> >
> > So they are both the l1 array, just one is a CPU pointer and one is a
> > HW/DMA pointer.
> >
> > Let's call strtab.l1_desc --> strtab.l1_table ?
>
> Yea. This seems to be good.
>
> > > The "struct arm_smmu_strtab_l1_desc" seems to be only used at one
> > > place in arm_smmu_init_l2_strtab(). So, how about:
> >
> > I didn't do it but, it would make some of the maths more obvious
> > if we encoded the table structure in the types:
> >
> > struct arm_smmu_strtab_l2_stes {
> > struct arm_smmu_ste l2[256];
> > };
>
> I personally prefer this one, though why 256?
#define STRTAB_SPLIT 8
> I was also thinking of an alternative by separating linear/2lvl:
>
> struct arm_smmu_ste {
> __le64 data[8];
> };
>
> struct arm_smmu_strtab_linear {
> struct arm_smmu_ste *ste;
> dma_addr_t ste_dma;
> };
>
> struct arm_smmu_strtab_l1_desc { // so as to drop TRTAB_L1_DESC_DWORDS
> __le64 data;
> };
>
> struct arm_smmu_strtab_l2_stes {
> struct arm_smmu_ste *ste;
> };
>
> struct arm_smmu_strtab_l1 {
> struct arm_smmu_strtab_l1_desc *l1;
num_l1_ents too
> dma_addr_t l1_dma;
> struct arm_smmu_strtab_l2_stes *l2;
> };
>
> struct arm_smmu_device {
> ...
> union {
> struct arm_smmu_strtab_linear linear;
> struct arm_smmu_strtab_l1 l1;
> } strtab;
> ...
> };
Yes! That is quite readable and understandable! I was relucant to do
much more than just the small change Will asked about, and even that
expanded.. Let me see if I can reasonably squeeze that into a small
number of patches.
> Only arm_smmu_device_reset() really needs strtab_base/_cfg values
> that we could compute them over there, given that there are quite
> amount of smmu->features checking already?
Certainly could do, but that seems to have less advantage..
Jason
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
2024-06-04 19:02 ` Jason Gunthorpe
@ 2024-06-04 19:28 ` Nicolin Chen
0 siblings, 0 replies; 23+ messages in thread
From: Nicolin Chen @ 2024-06-04 19:28 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, patches, Ryan Roberts, Mostafa Saleh
On Tue, Jun 04, 2024 at 04:02:47PM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 04, 2024 at 11:28:05AM -0700, Nicolin Chen wrote:
> > On Tue, Jun 04, 2024 at 09:59:55AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Jun 04, 2024 at 01:32:20AM -0700, Nicolin Chen wrote:
> > > > On Mon, Jun 03, 2024 at 07:31:27PM -0300, Jason Gunthorpe wrote:
> > > > The "struct arm_smmu_strtab_l1_desc" seems to be only used at one
> > > > place in arm_smmu_init_l2_strtab(). So, how about:
> > >
> > > I didn't do it but, it would make some of the maths more obvious
> > > if we encoded the table structure in the types:
> > >
> > > struct arm_smmu_strtab_l2_stes {
> > > struct arm_smmu_ste l2[256];
> > > };
> >
> > I personally prefer this one, though why 256?
>
> #define STRTAB_SPLIT 8
Oh right! And similarly, "struct arm_smmu_cd l2[1024]".
> > struct arm_smmu_strtab_linear {
> > struct arm_smmu_ste *ste;
> > dma_addr_t ste_dma;
> > };
> > struct arm_smmu_strtab_l1 {
> > struct arm_smmu_strtab_l1_desc *l1;
>
> num_l1_ents too
Yea, missed that.
> > struct arm_smmu_device {
> > ...
> > union {
> > struct arm_smmu_strtab_linear linear;
> > struct arm_smmu_strtab_l1 l1;
> > } strtab;
> > ...
> > };
>
> Yes! That is quite readable and understandable! I was relucant to do
> much more than just the small change Will asked about, and even that
> expanded.. Let me see if I can reasonably squeeze that into a small
> number of patches.
Yea, I hesitated too at the beginning, until I saw Mostafa suggest
something further.
> > Only arm_smmu_device_reset() really needs strtab_base/_cfg values
> > that we could compute them over there, given that there are quite
> > amount of smmu->features checking already?
>
> Certainly could do, but that seems to have less advantage..
Well, either way sounds good to me. I was just considering that we
already did something similar with s1/s2/cd cfg values. Yet, not a
problem to continue storing these two.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/7] iommu/arm-smmu-v3: Do not zero the strtab twice
2024-06-04 15:56 ` Mostafa Saleh
@ 2024-06-05 21:22 ` Jason Gunthorpe
0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-05 21:22 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
On Tue, Jun 04, 2024 at 03:56:04PM +0000, Mostafa Saleh wrote:
> > arm_smmu_init_l1_strtab() goes through and calls
> > arm_smmu_write_strtab_l1_desc() on the newly allocated (and zero'd) struct
> > arm_smmu_strtab_l1_desc, which ends up computing 'val = 0' and zeroing it
> > again.
> >
> > Remove arm_smmu_init_l1_strtab() and just call devm_kcalloc() from
> > arm_smmu_init_strtab_2lvl to allocate the companion struct.
> >
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>
> Looking at the code for dmam_alloc_coherent(basically dma_alloc_coherent)
> I see that the memory is zeroed for both DMA direct and IOMMU, however
> I don’t see that documented (in DMA-API.txt).
> Assuming that’s guaranteed to be zeroed (maybe we should update the docs
> if I am not missing something)
Yeah, people have been sending clean ups removing zeroing and GFP_ZERO's
in this space for a bit now. There are many places that rely on it.
Thanks,
Jason
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab
2024-06-04 15:52 ` Mostafa Saleh
@ 2024-06-05 23:51 ` Jason Gunthorpe
0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-05 23:51 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
On Tue, Jun 04, 2024 at 03:52:07PM +0000, Mostafa Saleh wrote:
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index ab415e107054c1..6b4f1a664288db 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -1661,8 +1661,8 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
> > if (desc->l2ptr)
> > return 0;
> >
> > - size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
> > - strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
>
> I believe also STRTAB_L1_DESC_DWORDS isn’t needed as l1_desc has the same size.
> Especially I already see sizeof(*cfg->l1_desc) used in some places instead of the macro.
> So we can remove it also as STRTAB_STE_DWORDS.
I didn't want to do this without another type, but sure, it is not
hard. I ended up with:
struct arm_smmu_ste {
__le64 data[STRTAB_STE_DWORDS];
};
struct arm_smmu_strtab_l2 {
struct arm_smmu_ste stes[1 << STRTAB_SPLIT];
};
struct arm_smmu_strtab_l1 {
__le64 l2ptr;
};
#define STRTAB_MAX_L1_ENTRIES (1 << 17)
struct arm_smmu_strtab_cfg {
union {
struct {
struct arm_smmu_ste *table;
dma_addr_t ste_dma;
unsigned int num_ents;
} linear;
struct {
struct arm_smmu_strtab_l1 *l1tab;
struct arm_smmu_strtab_l2 **l2ptrs;
dma_addr_t l1_dma;
unsigned int num_l1_ents;
} l2;
};
u64 strtab_base;
u32 strtab_base_cfg;
};
And we drop STRTAB_STE_DWORDS and STRTAB_L1_SZ_SHIFT.
> > @@ -3285,7 +3282,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> > u32 size;
> > struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> >
> > - size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3);
> > + size = (1 << smmu->sid_bits) * sizeof(cfg->strtab.linear[0]);
>
> nit: maybe be consistent with "sizeof(struct arm_smmu_ste)" which was used earlier
> this patch and "sizeof(cfg->strtab.linear[0])"
I went with the struct version in all cases
Thanks,
Jason
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/7] iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab
2024-06-04 16:07 ` Mostafa Saleh
@ 2024-06-06 23:59 ` Jason Gunthorpe
0 siblings, 0 replies; 23+ messages in thread
From: Jason Gunthorpe @ 2024-06-06 23:59 UTC (permalink / raw)
To: Mostafa Saleh
Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
Michael Shavit, Nicolin Chen, patches, Ryan Roberts
On Tue, Jun 04, 2024 at 04:07:21PM +0000, Mostafa Saleh wrote:
> > @@ -1236,7 +1235,7 @@ struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> > if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
> > return NULL;
> >
> > - l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
>
> Similarly to patch 1, I believe we can get rid of CTXDESC_L1_DESC_DWORDS entirely.
I gave CD the same treatment, it ends up like:
struct arm_smmu_cd {
__le64 data[8];
};
struct arm_smmu_cdtab_l2 {
struct arm_smmu_cd cds[CTXDESC_L2_ENTRIES];
};
struct arm_smmu_cdtab_l1 {
__le64 l2ptr;
};
struct arm_smmu_ctx_desc_cfg {
union {
struct {
struct arm_smmu_cd *table;
unsigned int num_ents;
} linear;
struct {
struct arm_smmu_cdtab_l1 *l1tab;
struct arm_smmu_cdtab_l2 **l2ptrs;
unsigned int num_l1_ents;
} l2;
};
dma_addr_t cdtab_dma;
u8 s1fmt;
/* log2 of the maximum number of CDs supported by this table */
u8 s1cdmax;
};
And we can remove a bunch of the #defines in favour of sizeof()
> > - dmam_free_coherent(smmu->dev, l1size, cd_table->cdtab, cd_table->cdtab_dma);
> > cd_table->cdtab_dma = 0;
> > - cd_table->cdtab = NULL;
> > + cd_table->cdtab.linear = NULL;
>
> nit: AFAIU, arm_smmu_cdtab_allocated() was added to check the cd table existence
> regardless the config.
> So, I think we should be consistent, as here it is set as linear, while it can be 2lvl.
I deleted this code, the tables are allocated once and stay allocated
forever until release when the entire cd_table is deleted. No reason
to 0 memory we are about to kfree.
Thanks,
Jason
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2024-06-06 23:59 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-03 22:31 [PATCH 0/7] Tidy some minor things in the stream table/cd table area Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 1/7] iommu/arm-smmu-v3: Split struct arm_smmu_strtab_cfg.strtab Jason Gunthorpe
2024-06-04 8:32 ` Nicolin Chen
2024-06-04 12:59 ` Jason Gunthorpe
2024-06-04 18:28 ` Nicolin Chen
2024-06-04 19:02 ` Jason Gunthorpe
2024-06-04 19:28 ` Nicolin Chen
2024-06-04 15:52 ` Mostafa Saleh
2024-06-05 23:51 ` Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 2/7] iommu/arm-smmu-v3: Do not zero the strtab twice Jason Gunthorpe
2024-06-04 15:56 ` Mostafa Saleh
2024-06-05 21:22 ` Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 3/7] iommu/arm-smmu-v3: Shrink the strtab l1_desc array Jason Gunthorpe
2024-06-04 16:01 ` Mostafa Saleh
2024-06-03 22:31 ` [PATCH 4/7] iommu/arm-smmu-v3: Split struct arm_smmu_ctx_desc_cfg.cdtab Jason Gunthorpe
2024-06-04 16:07 ` Mostafa Saleh
2024-06-06 23:59 ` Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 5/7] iommu/arm-smmu-v3: Do not use devm for the cd table allocations Jason Gunthorpe
2024-06-03 22:31 ` [PATCH 6/7] iommu/arm-smmu-v3: Shrink the cdtab l1_desc array Jason Gunthorpe
2024-06-04 16:14 ` Mostafa Saleh
2024-06-03 22:31 ` [PATCH 7/7] iommu/arm-smmu-v3: Use the new rb tree helpers Jason Gunthorpe
2024-06-04 16:22 ` Mostafa Saleh
2024-06-03 22:41 ` [PATCH 0/7] Tidy some minor things in the stream table/cd table area Nicolin Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).