* [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
@ 2024-10-01 18:03 Yang Shi
2024-10-01 18:27 ` Nicolin Chen
2024-10-02 16:58 ` James Morse
0 siblings, 2 replies; 17+ messages in thread
From: Yang Shi @ 2024-10-01 18:03 UTC (permalink / raw)
To: jgg, nicolinc, will, robin.murphy; +Cc: yang, linux-arm-kernel, linux-kernel
The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
is 32 bit value.
However AmpereOne has 32-bit stream id size. This resulted in
ouf-of-bound shift. The disassembly of shift is:
ldr w2, [x19, 828] //, smmu_7(D)->sid_bits
mov w20, 1
lsl w20, w20, w2
According to ARM spec, if the registers are 32 bit, the instruction actually
does:
dest = src << (shift % 32)
So it actually shifted by zero bit.
This caused v6.12-rc1 failed to boot on AmpereOne and UBSAN also
reported:
UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
shift exponent 32 is too large for 32-bit type 'int'
CPU: 70 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc1 #4
Hardware name: ZOLLNER SUNMOONLAKE/SunMoon Lake, BIOS 00.00. 2024-08-28 18:42:45 08/28/2024
Call trace:
dump_backtrace+0xdc/0x140
show_stack+0x20/0x40
dump_stack_lvl+0x60/0x80
dump_stack+0x18/0x28
ubsan_epilogue+0x10/0x48
__ubsan_handle_shift_out_of_bounds+0xd8/0x1a0
arm_smmu_init_structures+0x374/0x3c8
arm_smmu_device_probe+0x208/0x600
platform_probe+0x70/0xe8
really_probe+0xc8/0x3a0
__driver_probe_device+0x84/0x160
driver_probe_device+0x44/0x130
__driver_attach+0xcc/0x208
bus_for_each_dev+0x84/0x100
driver_attach+0x2c/0x40
bus_add_driver+0x158/0x290
driver_register+0x70/0x138
__platform_driver_register+0x2c/0x40
arm_smmu_driver_init+0x28/0x40
do_one_initcall+0x60/0x318
do_initcalls+0x198/0x1e0
kernel_init_freeable+0x18c/0x1e8
kernel_init+0x28/0x160
ret_from_fork+0x10/0x20
Using 64 bit immediate when doing shift can solve the problem. The
disssembly after the fix looks like:
ldr w20, [x19, 828] //, smmu_7(D)->sid_bits
mov x0, 1
lsl x0, x0, x20
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 737c5b882355..01a2faee04bc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
u32 l1size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
unsigned int last_sid_idx =
- arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
+ arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
/* Calculate the L1 size, capped to the SIDSIZE. */
cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
--
2.41.0
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 18:03 [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne Yang Shi
@ 2024-10-01 18:27 ` Nicolin Chen
2024-10-01 19:09 ` Yang Shi
2024-10-02 16:58 ` James Morse
1 sibling, 1 reply; 17+ messages in thread
From: Nicolin Chen @ 2024-10-01 18:27 UTC (permalink / raw)
To: Yang Shi; +Cc: jgg, will, robin.murphy, linux-arm-kernel, linux-kernel
On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
> Using 64 bit immediate when doing shift can solve the problem. The
> disssembly after the fix looks like:
[...]
> unsigned int last_sid_idx =
> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
Could a 32-bit build be a corner case where UL is no longer a
"64 bit" stated in the commit message?
Also, there are other places doing "1 << smmu->sid_bits", e.g.
arm_smmu_init_strtab_linear().
Then, can ssid_bits/s1cdmax be a concern similarly?
Thanks
Nicolin
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 18:27 ` Nicolin Chen
@ 2024-10-01 19:09 ` Yang Shi
2024-10-01 19:18 ` Jason Gunthorpe
2024-10-01 19:29 ` Nicolin Chen
0 siblings, 2 replies; 17+ messages in thread
From: Yang Shi @ 2024-10-01 19:09 UTC (permalink / raw)
To: Nicolin Chen; +Cc: jgg, will, robin.murphy, linux-arm-kernel, linux-kernel
On 10/1/24 11:27 AM, Nicolin Chen wrote:
> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>> Using 64 bit immediate when doing shift can solve the problem. The
>> disssembly after the fix looks like:
> [...]
>
>> unsigned int last_sid_idx =
>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> Could a 32-bit build be a corner case where UL is no longer a
> "64 bit" stated in the commit message?
It shouldn't. Because smmu v3 depends on ARM64.
config ARM_SMMU_V3
tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
depends on ARM64
>
> Also, there are other places doing "1 << smmu->sid_bits", e.g.
> arm_smmu_init_strtab_linear().
The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl.
1UL should be used if we want to take extra caution and don't prefer
rely on compiler.
>
> Then, can ssid_bits/s1cdmax be a concern similarly?
IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
>
> Thanks
> Nicolin
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 19:09 ` Yang Shi
@ 2024-10-01 19:18 ` Jason Gunthorpe
2024-10-01 19:38 ` Yang Shi
2024-10-01 19:29 ` Nicolin Chen
1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 19:18 UTC (permalink / raw)
To: Yang Shi; +Cc: Nicolin Chen, will, robin.murphy, linux-arm-kernel, linux-kernel
On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> > Also, there are other places doing "1 << smmu->sid_bits", e.g.
> > arm_smmu_init_strtab_linear().
>
> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
> should be used if we want to take extra caution and don't prefer rely on
> compiler.
Still, we should be fixing them all if sid_bits == 32, all those
shifts should be throwing a UBSAN error. It would be crazy to have a
32 bit linear table but I guess it is allowed?
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 19:18 ` Jason Gunthorpe
@ 2024-10-01 19:38 ` Yang Shi
2024-10-01 19:46 ` Jason Gunthorpe
0 siblings, 1 reply; 17+ messages in thread
From: Yang Shi @ 2024-10-01 19:38 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Nicolin Chen, will, robin.murphy, linux-arm-kernel, linux-kernel
On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>> arm_smmu_init_strtab_linear().
>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>> should be used if we want to take extra caution and don't prefer rely on
>> compiler.
> Still, we should be fixing them all if sid_bits == 32, all those
> shifts should be throwing a UBSAN error. It would be crazy to have a
OK, will cover this is v2.
> 32 bit linear table but I guess it is allowed?
I'm not smmu expert, but if sid_bits is 32, it looks like the linear
table will consume 256GB IIUC? That is crazy.
>
> Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 19:38 ` Yang Shi
@ 2024-10-01 19:46 ` Jason Gunthorpe
2024-10-01 20:31 ` Yang Shi
2024-10-01 20:47 ` Yang Shi
0 siblings, 2 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 19:46 UTC (permalink / raw)
To: Yang Shi; +Cc: Nicolin Chen, will, robin.murphy, linux-arm-kernel, linux-kernel
On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>
>
> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
> > On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> > > > Also, there are other places doing "1 << smmu->sid_bits", e.g.
> > > > arm_smmu_init_strtab_linear().
> > > The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
> > > should be used if we want to take extra caution and don't prefer rely on
> > > compiler.
> > Still, we should be fixing them all if sid_bits == 32, all those
> > shifts should be throwing a UBSAN error. It would be crazy to have a
>
> OK, will cover this is v2.
Maybe just make a little inline function to do this math and remove
the repated open coding? Then the types can be right, etc.
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 19:46 ` Jason Gunthorpe
@ 2024-10-01 20:31 ` Yang Shi
2024-10-01 20:47 ` Yang Shi
1 sibling, 0 replies; 17+ messages in thread
From: Yang Shi @ 2024-10-01 20:31 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Nicolin Chen, will, robin.murphy, linux-arm-kernel, linux-kernel
On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>
>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>> arm_smmu_init_strtab_linear().
>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>> should be used if we want to take extra caution and don't prefer rely on
>>>> compiler.
>>> Still, we should be fixing them all if sid_bits == 32, all those
>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>> OK, will cover this is v2.
> Maybe just make a little inline function to do this math and remove
> the repated open coding? Then the types can be right, etc.
Fine to me.
>
> Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 19:46 ` Jason Gunthorpe
2024-10-01 20:31 ` Yang Shi
@ 2024-10-01 20:47 ` Yang Shi
2024-10-01 21:02 ` Jason Gunthorpe
2024-10-02 16:58 ` James Morse
1 sibling, 2 replies; 17+ messages in thread
From: Yang Shi @ 2024-10-01 20:47 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Nicolin Chen, will, robin.murphy, linux-arm-kernel, linux-kernel
On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>
>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>> arm_smmu_init_strtab_linear().
>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>> should be used if we want to take extra caution and don't prefer rely on
>>>> compiler.
>>> Still, we should be fixing them all if sid_bits == 32, all those
>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>> OK, will cover this is v2.
> Maybe just make a little inline function to do this math and remove
> the repated open coding? Then the types can be right, etc.
Something like this?
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 01a2faee04bc..0f3aa7962117 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct
arm_smmu_device *smmu)
{
u32 l1size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+ unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
unsigned int last_sid_idx =
- arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
+ arm_smmu_strtab_l1_idx(max_sid - 1);
/* Calculate the L1 size, capped to the SIDSIZE. */
cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
@@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct
arm_smmu_device *smmu)
{
u32 size;
struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+ unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
- size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
+ size = max_sid * sizeof(struct arm_smmu_ste);
cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
&cfg->linear.ste_dma,
GFP_KERNEL);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1e9952ca989f..de9f14293485 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
return sid % STRTAB_NUM_L2_STES;
}
+static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
+{
+ return (1UL << sid_bits);
+}
+
>
> Jason
^ permalink raw reply related [flat|nested] 17+ messages in thread* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 20:47 ` Yang Shi
@ 2024-10-01 21:02 ` Jason Gunthorpe
2024-10-01 21:32 ` Yang Shi
2024-10-02 16:58 ` James Morse
1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2024-10-01 21:02 UTC (permalink / raw)
To: Yang Shi; +Cc: Nicolin Chen, will, robin.murphy, linux-arm-kernel, linux-kernel
On Tue, Oct 01, 2024 at 01:47:24PM -0700, Yang Shi wrote:
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1e9952ca989f..de9f14293485 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
> return sid % STRTAB_NUM_L2_STES;
> }
>
> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
> +{
Can we take the smmu struct here instead of the int?
But yes, this looks OK
Jason
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 21:02 ` Jason Gunthorpe
@ 2024-10-01 21:32 ` Yang Shi
0 siblings, 0 replies; 17+ messages in thread
From: Yang Shi @ 2024-10-01 21:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Nicolin Chen, will, robin.murphy, linux-arm-kernel, linux-kernel
On 10/1/24 2:02 PM, Jason Gunthorpe wrote:
> On Tue, Oct 01, 2024 at 01:47:24PM -0700, Yang Shi wrote:
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 1e9952ca989f..de9f14293485 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
>> return sid % STRTAB_NUM_L2_STES;
>> }
>>
>> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
>> +{
> Can we take the smmu struct here instead of the int?
No problem.
>
> But yes, this looks OK
Thank you.
>
> Jason
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 20:47 ` Yang Shi
2024-10-01 21:02 ` Jason Gunthorpe
@ 2024-10-02 16:58 ` James Morse
1 sibling, 0 replies; 17+ messages in thread
From: James Morse @ 2024-10-02 16:58 UTC (permalink / raw)
To: Yang Shi, Jason Gunthorpe
Cc: Nicolin Chen, will, robin.murphy, linux-arm-kernel, linux-kernel
Hello,
On 01/10/2024 21:47, Yang Shi wrote:
> On 10/1/24 12:46 PM, Jason Gunthorpe wrote:
>> On Tue, Oct 01, 2024 at 12:38:56PM -0700, Yang Shi wrote:
>>>
>>> On 10/1/24 12:18 PM, Jason Gunthorpe wrote:
>>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>>>> Also, there are other places doing "1 << smmu->sid_bits", e.g.
>>>>>> arm_smmu_init_strtab_linear().
>>>>> The disassembly shows it uses "sbfiz x21, x20, 6, 32" instead of lsl. 1UL
>>>>> should be used if we want to take extra caution and don't prefer rely on
>>>>> compiler.
>>>> Still, we should be fixing them all if sid_bits == 32, all those
>>>> shifts should be throwing a UBSAN error. It would be crazy to have a
>>> OK, will cover this is v2.
>> Maybe just make a little inline function to do this math and remove
>> the repated open coding? Then the types can be right, etc.
>
> Something like this?
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 01a2faee04bc..0f3aa7962117 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
> {
> u32 l1size;
> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> + unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
> unsigned int last_sid_idx =
> - arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> + arm_smmu_strtab_l1_idx(max_sid - 1);
>
> /* Calculate the L1 size, capped to the SIDSIZE. */
> cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
> @@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> {
> u32 size;
> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> + unsigned int max_sid = arm_smmu_strtab_max_sid(smmu->sid_bits);
>
> - size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
> + size = max_sid * sizeof(struct arm_smmu_ste);
> cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
> &cfg->linear.ste_dma,
> GFP_KERNEL);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1e9952ca989f..de9f14293485 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -235,6 +235,11 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
> return sid % STRTAB_NUM_L2_STES;
> }
>
> +static inline unsigned int arm_smmu_strtab_max_sid(unsigned int sid_bits)
> +{
> + return (1UL << sid_bits);
> +}
> +
On the same platform - this also fixes the issue:
Tested-by: James Morse <james.morse@arm.com>
Thanks,
James
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 19:09 ` Yang Shi
2024-10-01 19:18 ` Jason Gunthorpe
@ 2024-10-01 19:29 ` Nicolin Chen
2024-10-01 19:48 ` Yang Shi
1 sibling, 1 reply; 17+ messages in thread
From: Nicolin Chen @ 2024-10-01 19:29 UTC (permalink / raw)
To: Yang Shi; +Cc: jgg, will, robin.murphy, linux-arm-kernel, linux-kernel
On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
> On 10/1/24 11:27 AM, Nicolin Chen wrote:
> > On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
> > > Using 64 bit immediate when doing shift can solve the problem. The
> > > disssembly after the fix looks like:
> > [...]
> >
> > > unsigned int last_sid_idx =
> > > - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> > > + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
> > Could a 32-bit build be a corner case where UL is no longer a
> > "64 bit" stated in the commit message?
>
> It shouldn't. Because smmu v3 depends on ARM64.
>
> config ARM_SMMU_V3
> tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
> depends on ARM64
ARM64 can have aarch32 support. I am not sure if ARM64 running a
32-bit OS can be a case though, (and not confined to AmpereOne).
> > Then, can ssid_bits/s1cdmax be a concern similarly?
>
> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
this moment, so we should be safe.
Thanks
Nicolin
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 19:29 ` Nicolin Chen
@ 2024-10-01 19:48 ` Yang Shi
2024-10-02 9:59 ` Robin Murphy
0 siblings, 1 reply; 17+ messages in thread
From: Yang Shi @ 2024-10-01 19:48 UTC (permalink / raw)
To: Nicolin Chen; +Cc: jgg, will, robin.murphy, linux-arm-kernel, linux-kernel
On 10/1/24 12:29 PM, Nicolin Chen wrote:
> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>> Using 64 bit immediate when doing shift can solve the problem. The
>>>> disssembly after the fix looks like:
>>> [...]
>>>
>>>> unsigned int last_sid_idx =
>>>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>> Could a 32-bit build be a corner case where UL is no longer a
>>> "64 bit" stated in the commit message?
>> It shouldn't. Because smmu v3 depends on ARM64.
>>
>> config ARM_SMMU_V3
>> tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>> depends on ARM64
> ARM64 can have aarch32 support. I am not sure if ARM64 running a
> 32-bit OS can be a case though, (and not confined to AmpereOne).
I don't think ARM64 runs 32-bit kernel, at least for newer kernel.
>
>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
> this moment, so we should be safe.
>
> Thanks
> Nicolin
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 19:48 ` Yang Shi
@ 2024-10-02 9:59 ` Robin Murphy
2024-10-02 15:38 ` Yang Shi
0 siblings, 1 reply; 17+ messages in thread
From: Robin Murphy @ 2024-10-02 9:59 UTC (permalink / raw)
To: Yang Shi, Nicolin Chen; +Cc: jgg, will, linux-arm-kernel, linux-kernel
On 2024-10-01 8:48 pm, Yang Shi wrote:
>
>
> On 10/1/24 12:29 PM, Nicolin Chen wrote:
>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>>> Using 64 bit immediate when doing shift can solve the problem. The
>>>>> disssembly after the fix looks like:
>>>> [...]
>>>>
>>>>> unsigned int last_sid_idx =
>>>>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>>> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>>> Could a 32-bit build be a corner case where UL is no longer a
>>>> "64 bit" stated in the commit message?
>>> It shouldn't. Because smmu v3 depends on ARM64.
>>>
>>> config ARM_SMMU_V3
>>> tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>> depends on ARM64
>> ARM64 can have aarch32 support. I am not sure if ARM64 running a
>> 32-bit OS can be a case though, (and not confined to AmpereOne).
>
> I don't think ARM64 runs 32-bit kernel, at least for newer kernel.
Just use ULL - if the point is that it must be a 64-bit shift for
correctness, then being clear about that intent is far more valuable
than saving one character of source code.
Thanks,
Robin.
>
>>
>>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10, 6). So
>>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
>> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
>> this moment, so we should be safe.
>>
>> Thanks
>> Nicolin
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-02 9:59 ` Robin Murphy
@ 2024-10-02 15:38 ` Yang Shi
0 siblings, 0 replies; 17+ messages in thread
From: Yang Shi @ 2024-10-02 15:38 UTC (permalink / raw)
To: Robin Murphy, Nicolin Chen; +Cc: jgg, will, linux-arm-kernel, linux-kernel
On 10/2/24 2:59 AM, Robin Murphy wrote:
> On 2024-10-01 8:48 pm, Yang Shi wrote:
>>
>>
>> On 10/1/24 12:29 PM, Nicolin Chen wrote:
>>> On Tue, Oct 01, 2024 at 12:09:03PM -0700, Yang Shi wrote:
>>>> On 10/1/24 11:27 AM, Nicolin Chen wrote:
>>>>> On Tue, Oct 01, 2024 at 11:03:46AM -0700, Yang Shi wrote:
>>>>>> Using 64 bit immediate when doing shift can solve the problem. The
>>>>>> disssembly after the fix looks like:
>>>>> [...]
>>>>>
>>>>>> unsigned int last_sid_idx =
>>>>>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>>>>>> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>>>> Could a 32-bit build be a corner case where UL is no longer a
>>>>> "64 bit" stated in the commit message?
>>>> It shouldn't. Because smmu v3 depends on ARM64.
>>>>
>>>> config ARM_SMMU_V3
>>>> tristate "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
>>>> depends on ARM64
>>> ARM64 can have aarch32 support. I am not sure if ARM64 running a
>>> 32-bit OS can be a case though, (and not confined to AmpereOne).
>>
>> I don't think ARM64 runs 32-bit kernel, at least for newer kernel.
>
> Just use ULL - if the point is that it must be a 64-bit shift for
> correctness, then being clear about that intent is far more valuable
> than saving one character of source code.
Yeah, it must be 64 bit. Will fix in v2.
>
> Thanks,
> Robin.
>
>>
>>>
>>>>> Then, can ssid_bits/s1cdmax be a concern similarly?
>>>> IIUC, ssid_bits is determined by IDR1_SSIDSIZE. It is GENMASK(10,
>>>> 6). So
>>>> it shouldn't be 32. IDR1_SIDSIZE is GENMASK(5, 0).
>>> Rechecked the RM. Yea, max sid can be 32 but max ssid is 20 at
>>> this moment, so we should be safe.
>>>
>>> Thanks
>>> Nicolin
>>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-01 18:03 [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne Yang Shi
2024-10-01 18:27 ` Nicolin Chen
@ 2024-10-02 16:58 ` James Morse
2024-10-02 17:39 ` Yang Shi
1 sibling, 1 reply; 17+ messages in thread
From: James Morse @ 2024-10-02 16:58 UTC (permalink / raw)
To: Yang Shi, jgg, nicolinc, will, robin.murphy
Cc: linux-arm-kernel, linux-kernel
Hello!
On 01/10/2024 19:03, Yang Shi wrote:
> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
> is 32 bit value.
> However AmpereOne has 32-bit stream id size. This resulted in
> ouf-of-bound shift. The disassembly of shift is:
>
> ldr w2, [x19, 828] //, smmu_7(D)->sid_bits
> mov w20, 1
> lsl w20, w20, w2
>
> According to ARM spec, if the registers are 32 bit, the instruction actually
> does:
> dest = src << (shift % 32)
>
> So it actually shifted by zero bit.
>
> This caused v6.12-rc1 failed to boot on AmpereOne
Same here - one of arm's reference designs prints 1 giga-tonne of:
| arm-smmu-v3 arm-smmu-v3.5.auto: event 0x02 received:
| arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000c90000000002
| arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000
| arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000
| arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000
during boot - then fails to find the nvme.
Bisect points to ce410410f1a7, and the below diff fixes it.
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 737c5b882355..01a2faee04bc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
> u32 l1size;
> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> unsigned int last_sid_idx =
> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>
> /* Calculate the L1 size, capped to the SIDSIZE. */
> cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
Tested-by: James Morse <james.morse@arm.com>
Thanks,
James
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne
2024-10-02 16:58 ` James Morse
@ 2024-10-02 17:39 ` Yang Shi
0 siblings, 0 replies; 17+ messages in thread
From: Yang Shi @ 2024-10-02 17:39 UTC (permalink / raw)
To: James Morse, jgg, nicolinc, will, robin.murphy
Cc: linux-arm-kernel, linux-kernel
On 10/2/24 9:58 AM, James Morse wrote:
> Hello!
>
> On 01/10/2024 19:03, Yang Shi wrote:
>> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
>> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
>> is 32 bit value.
>> However AmpereOne has 32-bit stream id size. This resulted in
>> ouf-of-bound shift. The disassembly of shift is:
>>
>> ldr w2, [x19, 828] //, smmu_7(D)->sid_bits
>> mov w20, 1
>> lsl w20, w20, w2
>>
>> According to ARM spec, if the registers are 32 bit, the instruction actually
>> does:
>> dest = src << (shift % 32)
>>
>> So it actually shifted by zero bit.
>>
>> This caused v6.12-rc1 failed to boot on AmpereOne
> Same here - one of arm's reference designs prints 1 giga-tonne of:
> | arm-smmu-v3 arm-smmu-v3.5.auto: event 0x02 received:
> | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000c90000000002
> | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000
> | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000
> | arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000
>
> during boot - then fails to find the nvme.
> Bisect points to ce410410f1a7, and the below diff fixes it.
>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 737c5b882355..01a2faee04bc 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3625,7 +3625,7 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>> u32 l1size;
>> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>> unsigned int last_sid_idx =
>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>> + arm_smmu_strtab_l1_idx((1UL << smmu->sid_bits) - 1);
>>
>> /* Calculate the L1 size, capped to the SIDSIZE. */
>> cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
>
> Tested-by: James Morse <james.morse@arm.com>
Thank you for testing. I will change the patch subject a little bit to
make it more general.
>
>
> Thanks,
>
> James
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-10-02 17:41 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-01 18:03 [PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for AmpereOne Yang Shi
2024-10-01 18:27 ` Nicolin Chen
2024-10-01 19:09 ` Yang Shi
2024-10-01 19:18 ` Jason Gunthorpe
2024-10-01 19:38 ` Yang Shi
2024-10-01 19:46 ` Jason Gunthorpe
2024-10-01 20:31 ` Yang Shi
2024-10-01 20:47 ` Yang Shi
2024-10-01 21:02 ` Jason Gunthorpe
2024-10-01 21:32 ` Yang Shi
2024-10-02 16:58 ` James Morse
2024-10-01 19:29 ` Nicolin Chen
2024-10-01 19:48 ` Yang Shi
2024-10-02 9:59 ` Robin Murphy
2024-10-02 15:38 ` Yang Shi
2024-10-02 16:58 ` James Morse
2024-10-02 17:39 ` Yang Shi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).