* [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
@ 2023-03-30 16:51 ` James Morse
2023-03-30 20:50 ` Rob Herring
` (2 more replies)
2023-03-30 16:51 ` [PATCH 2/6] firmware: smccc: Add support for erratum discovery API James Morse
` (7 subsequent siblings)
8 siblings, 3 replies; 27+ messages in thread
From: James Morse @ 2023-03-30 16:51 UTC (permalink / raw)
To: linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, James Morse,
Rob Herring, Krzysztof Kozlowski, Andre Przywara
The Errata Management SMCCC interface allows firmware to advertise whether
the OS is affected by an erratum, or if a higher exception level has
mitigated the issue. This allows properties of the device that are not
discoverable by the OS to be described. e.g. some errata depend on the
behaviour of the interconnect, which is not visible to the OS.
Deployed devices may find it significantly harder to update EL3
firmware than the device tree. Erratum workarounds typically have to
fail safe, and assume the platform is affected putting correctness
above performance.
Instead of adding a device-tree entry for any CPU errata that is
relevant (or not) to the platform, allow the device-tree to describe
firmware's responses for the SMCCC interface. This could be used as
the data source for the firmware interface, or be parsed by the OS if
the firmware interface is missing.
Most errata can be detected from CPU id registers. These mechanisms
are only needed for the rare cases that external knowledge is needed.
Suggested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
.../devicetree/bindings/arm/cpus.yaml | 5 ++
.../firmware/arm,errata-management.yaml | 77 +++++++++++++++++++
2 files changed, 82 insertions(+)
create mode 100644 Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
diff --git a/Documentation/devicetree/bindings/arm/cpus.yaml b/Documentation/devicetree/bindings/arm/cpus.yaml
index c145f6a035ee..47b12761f305 100644
--- a/Documentation/devicetree/bindings/arm/cpus.yaml
+++ b/Documentation/devicetree/bindings/arm/cpus.yaml
@@ -257,6 +257,11 @@ properties:
List of phandles to idle state nodes supported
by this cpu (see ./idle-states.yaml).
+ arm,erratum-list:
+ $ref: '/schemas/types.yaml#/definitions/phandle'
+ description:
+ Specifies the firmware cpu-erratum-list node associated with this CPU.
+
capacity-dmips-mhz:
description:
u32 value representing CPU capacity (see ../cpu/cpu-capacity.txt) in
diff --git a/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
new file mode 100644
index 000000000000..9baeb3d35213
--- /dev/null
+++ b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
@@ -0,0 +1,77 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/firmware/arm,errata-management.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Errata Management Firmware Interface
+
+maintainers:
+ - James Morse <james.morse@arm.com>
+
+description: |+
+ The SMC-CC has an erratum discovery interface that allows the OS to discover
+ whether a particular CPU is affected by a specific erratum when the
+ configurations affected is only known by firmware. See the specification of
+ the same title on developer.arm.com, document DEN0100.
+ Provide the values that should be used by the interface, either to supplement
+ firmware, or override the values firmware provides.
+ Most errata can be detected from CPU id registers. These mechanisms are only
+ needed for the rare cases that external knowledge is needed.
+ The CPU node should hold a phandle that points to the cpu-erratum-list node.
+
+properties:
+ compatible:
+ items:
+ - const: arm,cpu-erratum-list
+
+ arm,erratum-affected:
+ description: Erratum numbers that this CPU is affected by.
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ minItems: 1
+
+ arm,erratum-not-affected:
+ description: Erratum numbers that this CPU is not affected by.
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ minItems: 1
+
+ arm,erratum-higher-el-mitigation:
+ description: Erratum numbers that have been mitigated by a higher level
+ of firmware
+ $ref: /schemas/types.yaml#/definitions/uint32-array
+ minItems: 1
+
+required:
+ - compatible
+anyOf:
+ - required:
+ - 'arm,erratum-affected'
+ - required:
+ - 'arm,erratum-not-affected'
+ - required:
+ - 'arm,erratum-higher-el-mitigation'
+
+additionalProperties: false
+
+examples:
+ - |
+ firmware {
+ CL1_ERRATA: cluster1-errata {
+ compatible = "arm,cpu-erratum-list";
+ arm,erratum-not-affected = <2701952>;
+ };
+ };
+
+ cpus {
+ #size-cells = <0>;
+ #address-cells = <1>;
+
+ cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,cortex-x2";
+ reg = <0x0>;
+ arm,erratum-list = <&CL1_ERRATA>;
+ };
+ };
+
+...
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-03-30 16:51 ` [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management James Morse
@ 2023-03-30 20:50 ` Rob Herring
2023-03-31 8:29 ` Krzysztof Kozlowski
2023-03-31 13:46 ` Rob Herring
2 siblings, 0 replies; 27+ messages in thread
From: Rob Herring @ 2023-03-30 20:50 UTC (permalink / raw)
To: James Morse
Cc: devicetree, Will Deacon, Sudeep Holla, Oliver Upton,
Andre Przywara, Krzysztof Kozlowski, Mark Rutland, Rob Herring,
linux-arm-kernel, Catalin Marinas, Marc Zyngier,
Lorenzo Pieralisi
On Thu, 30 Mar 2023 17:51:23 +0100, James Morse wrote:
> The Errata Management SMCCC interface allows firmware to advertise whether
> the OS is affected by an erratum, or if a higher exception level has
> mitigated the issue. This allows properties of the device that are not
> discoverable by the OS to be described. e.g. some errata depend on the
> behaviour of the interconnect, which is not visible to the OS.
>
> Deployed devices may find it significantly harder to update EL3
> firmware than the device tree. Erratum workarounds typically have to
> fail safe, and assume the platform is affected putting correctness
> above performance.
>
> Instead of adding a device-tree entry for any CPU errata that is
> relevant (or not) to the platform, allow the device-tree to describe
> firmware's responses for the SMCCC interface. This could be used as
> the data source for the firmware interface, or be parsed by the OS if
> the firmware interface is missing.
>
> Most errata can be detected from CPU id registers. These mechanisms
> are only needed for the rare cases that external knowledge is needed.
>
> Suggested-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> .../devicetree/bindings/arm/cpus.yaml | 5 ++
> .../firmware/arm,errata-management.yaml | 77 +++++++++++++++++++
> 2 files changed, 82 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
>
My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):
yamllint warnings/errors:
./Documentation/devicetree/bindings/firmware/arm,errata-management.yaml:48:5: [warning] wrong indentation: expected 6 but found 4 (indentation)
./Documentation/devicetree/bindings/firmware/arm,errata-management.yaml:50:5: [warning] wrong indentation: expected 6 but found 4 (indentation)
./Documentation/devicetree/bindings/firmware/arm,errata-management.yaml:52:5: [warning] wrong indentation: expected 6 but found 4 (indentation)
dtschema/dtc warnings/errors:
doc reference errors (make refcheckdocs):
See https://patchwork.ozlabs.org/project/devicetree-bindings/patch/20230330165128.3237939-2-james.morse@arm.com
The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-03-30 16:51 ` [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management James Morse
2023-03-30 20:50 ` Rob Herring
@ 2023-03-31 8:29 ` Krzysztof Kozlowski
2023-03-31 16:58 ` James Morse
2023-03-31 13:46 ` Rob Herring
2 siblings, 1 reply; 27+ messages in thread
From: Krzysztof Kozlowski @ 2023-03-31 8:29 UTC (permalink / raw)
To: James Morse, linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, Rob Herring,
Krzysztof Kozlowski, Andre Przywara
On 30/03/2023 18:51, James Morse wrote:
> The Errata Management SMCCC interface allows firmware to advertise whether
> the OS is affected by an erratum, or if a higher exception level has
> mitigated the issue. This allows properties of the device that are not
> discoverable by the OS to be described. e.g. some errata depend on the
> behaviour of the interconnect, which is not visible to the OS.
>
> Deployed devices may find it significantly harder to update EL3
> firmware than the device tree. Erratum workarounds typically have to
> fail safe, and assume the platform is affected putting correctness
> above performance.
>
> Instead of adding a device-tree entry for any CPU errata that is
> relevant (or not) to the platform, allow the device-tree to describe
> firmware's responses for the SMCCC interface. This could be used as
> the data source for the firmware interface, or be parsed by the OS if
> the firmware interface is missing.
>
> Most errata can be detected from CPU id registers. These mechanisms
> are only needed for the rare cases that external knowledge is needed.
>
> Suggested-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> .../devicetree/bindings/arm/cpus.yaml | 5 ++
> .../firmware/arm,errata-management.yaml | 77 +++++++++++++++++++
> 2 files changed, 82 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
>
> diff --git a/Documentation/devicetree/bindings/arm/cpus.yaml b/Documentation/devicetree/bindings/arm/cpus.yaml
> index c145f6a035ee..47b12761f305 100644
> --- a/Documentation/devicetree/bindings/arm/cpus.yaml
> +++ b/Documentation/devicetree/bindings/arm/cpus.yaml
> @@ -257,6 +257,11 @@ properties:
> List of phandles to idle state nodes supported
> by this cpu (see ./idle-states.yaml).
>
> + arm,erratum-list:
> + $ref: '/schemas/types.yaml#/definitions/phandle'
> + description:
> + Specifies the firmware cpu-erratum-list node associated with this CPU.
> +
> capacity-dmips-mhz:
> description:
> u32 value representing CPU capacity (see ../cpu/cpu-capacity.txt) in
> diff --git a/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
> new file mode 100644
> index 000000000000..9baeb3d35213
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
> @@ -0,0 +1,77 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/firmware/arm,errata-management.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
Except missing testing...
> +
> +title: Errata Management Firmware Interface
> +
> +maintainers:
> + - James Morse <james.morse@arm.com>
> +
> +description: |+
Do not need '|+'.
> + The SMC-CC has an erratum discovery interface that allows the OS to discover
> + whether a particular CPU is affected by a specific erratum when the
> + configurations affected is only known by firmware. See the specification of
> + the same title on developer.arm.com, document DEN0100.
> + Provide the values that should be used by the interface, either to supplement
> + firmware, or override the values firmware provides.
Why? If you have the discovery interface, don't add stuff to the DT, but
use that interface.
> + Most errata can be detected from CPU id registers. These mechanisms are only
> + needed for the rare cases that external knowledge is needed.
> + The CPU node should hold a phandle that points to the cpu-erratum-list node.
> +
> +properties:
> + compatible:
> + items:
One item, so drop "items".
> + - const: arm,cpu-erratum-list
> +
> + arm,erratum-affected:
> + description: Erratum numbers that this CPU is affected by.
Isn't this explicit from CPU compatible?
> + $ref: /schemas/types.yaml#/definitions/uint32-array
> + minItems: 1
What do the numbers mean?
maxItems?
> +
> + arm,erratum-not-affected:
> + description: Erratum numbers that this CPU is not affected by.
> + $ref: /schemas/types.yaml#/definitions/uint32-array
> + minItems: 1
> +
> + arm,erratum-higher-el-mitigation:
> + description: Erratum numbers that have been mitigated by a higher level
> + of firmware
> + $ref: /schemas/types.yaml#/definitions/uint32-array
> + minItems: 1
> +
> +required:
> + - compatible
Missing blank line
> +anyOf:
> + - required:
> + - 'arm,erratum-affected'
> + - required:
> + - 'arm,erratum-not-affected'
> + - required:
> + - 'arm,erratum-higher-el-mitigation'
Drop quotes
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + firmware {
> + CL1_ERRATA: cluster1-errata {
> + compatible = "arm,cpu-erratum-list";
> + arm,erratum-not-affected = <2701952>;
> + };
> + };
> +
> + cpus {
> + #size-cells = <0>;
> + #address-cells = <1>;
> +
> + cpu@0 {
> + device_type = "cpu";
> + compatible = "arm,cortex-x2";
> + reg = <0x0>;
> + arm,erratum-list = <&CL1_ERRATA>;
> + };
> + };
> +
> +...
Best regards,
Krzysztof
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-03-31 8:29 ` Krzysztof Kozlowski
@ 2023-03-31 16:58 ` James Morse
2023-04-03 9:15 ` Krzysztof Kozlowski
0 siblings, 1 reply; 27+ messages in thread
From: James Morse @ 2023-03-31 16:58 UTC (permalink / raw)
To: Krzysztof Kozlowski, linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, Rob Herring,
Krzysztof Kozlowski, Andre Przywara
Hi Krzysztof
On 31/03/2023 09:29, Krzysztof Kozlowski wrote:
> On 30/03/2023 18:51, James Morse wrote:
>> The Errata Management SMCCC interface allows firmware to advertise whether
>> the OS is affected by an erratum, or if a higher exception level has
>> mitigated the issue. This allows properties of the device that are not
>> discoverable by the OS to be described. e.g. some errata depend on the
>> behaviour of the interconnect, which is not visible to the OS.
>>
>> Deployed devices may find it significantly harder to update EL3
>> firmware than the device tree. Erratum workarounds typically have to
>> fail safe, and assume the platform is affected putting correctness
>> above performance.
>>
>> Instead of adding a device-tree entry for any CPU errata that is
>> relevant (or not) to the platform, allow the device-tree to describe
>> firmware's responses for the SMCCC interface. This could be used as
>> the data source for the firmware interface, or be parsed by the OS if
>> the firmware interface is missing.
>>
>> Most errata can be detected from CPU id registers. These mechanisms
>> are only needed for the rare cases that external knowledge is needed.
>> diff --git a/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
>> new file mode 100644
>> index 000000000000..9baeb3d35213
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
>> @@ -0,0 +1,77 @@
>> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/firmware/arm,errata-management.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>
> Except missing testing...
After a couple of hours of testing this, I went blind and missed that it was still
complaining about spaces.
>> +
>> +title: Errata Management Firmware Interface
>> +
>> +maintainers:
>> + - James Morse <james.morse@arm.com>
>> +
>> +description: |+
>
> Do not need '|+'.
>
>> + The SMC-CC has an erratum discovery interface that allows the OS to discover
>> + whether a particular CPU is affected by a specific erratum when the
>> + configurations affected is only known by firmware. See the specification of
>> + the same title on developer.arm.com, document DEN0100.
>> + Provide the values that should be used by the interface, either to supplement
>> + firmware, or override the values firmware provides.
>
> Why? If you have the discovery interface, don't add stuff to the DT, but
> use that interface.
A DT property was explicitly requested by Marc Z on the RFC:
https://lore.kernel.org/linux-arm-kernel/86mt5dxxbc.wl-maz@kernel.org/
The concern is that platforms where the CPU is affected, but the issue is masked by the
interconnect will never bother with a firmware interface. The kernel can't know this, so
has to enable the workaround at the cost of performance.
Adding a DT property to describe the hardware state of the erratum avoids the need for
separate kernel builds to work around the missing firmware.
>> + - const: arm,cpu-erratum-list
>> +
>> + arm,erratum-affected:
>> + description: Erratum numbers that this CPU is affected by.
>
> Isn't this explicit from CPU compatible?
I'll drop it from the compatible. The concern is arm add erratum in other IP to this
interface, hence shoving 'cpu' in a few places to future proof it against changes in the spec.
>> + $ref: /schemas/types.yaml#/definitions/uint32-array
>> + minItems: 1
> What do the numbers mean?
The numbers are unique identifiers issued by the CPU designer to identify the part
affected and the erratum. See the cover-letter for links to arms documents for all these
CPUs, or Documentation/arm64/silicon-errata.txt for a table of those the kernel works around.
> maxItems?
If there are zero entries, the table can be omitted, hence minItems.
This is the first erratum workaround that needs to know non-discoverable CPU properties,
but there will be others.
Thanks,
James
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-03-31 16:58 ` James Morse
@ 2023-04-03 9:15 ` Krzysztof Kozlowski
2023-04-03 12:05 ` Marc Zyngier
0 siblings, 1 reply; 27+ messages in thread
From: Krzysztof Kozlowski @ 2023-04-03 9:15 UTC (permalink / raw)
To: James Morse, linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, Rob Herring,
Krzysztof Kozlowski, Andre Przywara
On 31/03/2023 18:58, James Morse wrote:
> Hi Krzysztof
>
> On 31/03/2023 09:29, Krzysztof Kozlowski wrote:
>> On 30/03/2023 18:51, James Morse wrote:
>>> The Errata Management SMCCC interface allows firmware to advertise whether
>>> the OS is affected by an erratum, or if a higher exception level has
>>> mitigated the issue. This allows properties of the device that are not
>>> discoverable by the OS to be described. e.g. some errata depend on the
>>> behaviour of the interconnect, which is not visible to the OS.
>>>
>>> Deployed devices may find it significantly harder to update EL3
>>> firmware than the device tree. Erratum workarounds typically have to
>>> fail safe, and assume the platform is affected putting correctness
>>> above performance.
>>>
>>> Instead of adding a device-tree entry for any CPU errata that is
>>> relevant (or not) to the platform, allow the device-tree to describe
>>> firmware's responses for the SMCCC interface. This could be used as
>>> the data source for the firmware interface, or be parsed by the OS if
>>> the firmware interface is missing.
>>>
>>> Most errata can be detected from CPU id registers. These mechanisms
>>> are only needed for the rare cases that external knowledge is needed.
>
>>> diff --git a/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
>>> new file mode 100644
>>> index 000000000000..9baeb3d35213
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
>>> @@ -0,0 +1,77 @@
>>> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
>>> +%YAML 1.2
>>> +---
>>> +$id: http://devicetree.org/schemas/firmware/arm,errata-management.yaml#
>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>
>> Except missing testing...
>
> After a couple of hours of testing this, I went blind and missed that it was still
> complaining about spaces.
>
>
>>> +
>>> +title: Errata Management Firmware Interface
>>> +
>>> +maintainers:
>>> + - James Morse <james.morse@arm.com>
>>> +
>>> +description: |+
>>
>> Do not need '|+'.
>>
>>> + The SMC-CC has an erratum discovery interface that allows the OS to discover
>>> + whether a particular CPU is affected by a specific erratum when the
>>> + configurations affected is only known by firmware. See the specification of
>>> + the same title on developer.arm.com, document DEN0100.
>>> + Provide the values that should be used by the interface, either to supplement
>>> + firmware, or override the values firmware provides.
>>
>> Why? If you have the discovery interface, don't add stuff to the DT, but
>> use that interface.
>
> A DT property was explicitly requested by Marc Z on the RFC:
> https://lore.kernel.org/linux-arm-kernel/86mt5dxxbc.wl-maz@kernel.org/
>
> The concern is that platforms where the CPU is affected, but the issue is masked by the
> interconnect will never bother with a firmware interface. The kernel can't know this, so
> has to enable the workaround at the cost of performance.
It would have to bother DT, so same problem... DT is not optimization
mechanism for SW decisions.
>
> Adding a DT property to describe the hardware state of the erratum avoids the need for
> separate kernel builds to work around the missing firmware.
The purpose of DT is not to avoid separate kernel builds thus this is
not an argument whether property fits DT or it doesn't.
>
>
>>> + - const: arm,cpu-erratum-list
>>> +
>>> + arm,erratum-affected:
>>> + description: Erratum numbers that this CPU is affected by.
>>
>> Isn't this explicit from CPU compatible?
>
> I'll drop it from the compatible. The concern is arm add erratum in other IP to this
> interface, hence shoving 'cpu' in a few places to future proof it against changes in the spec.
>
>
>>> + $ref: /schemas/types.yaml#/definitions/uint32-array
>>> + minItems: 1
>
>> What do the numbers mean?
>
> The numbers are unique identifiers issued by the CPU designer to identify the part
> affected and the erratum. See the cover-letter for links to arms documents for all these
> CPUs, or Documentation/arm64/silicon-errata.txt for a table of those the kernel works around.
The answer should be in description, not in cover letter.
>
>
>> maxItems?
>
> If there are zero entries, the table can be omitted, hence minItems.
>
> This is the first erratum workaround that needs to know non-discoverable CPU properties,
> but there will be others.
Best regards,
Krzysztof
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-04-03 9:15 ` Krzysztof Kozlowski
@ 2023-04-03 12:05 ` Marc Zyngier
0 siblings, 0 replies; 27+ messages in thread
From: Marc Zyngier @ 2023-04-03 12:05 UTC (permalink / raw)
To: Krzysztof Kozlowski
Cc: James Morse, linux-arm-kernel, devicetree, Catalin Marinas,
Will Deacon, Mark Rutland, Lorenzo Pieralisi, Sudeep Holla,
Oliver Upton, Rob Herring, Krzysztof Kozlowski, Andre Przywara
On Mon, 03 Apr 2023 10:15:19 +0100,
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote:
>
> On 31/03/2023 18:58, James Morse wrote:
> > Hi Krzysztof
> >
> > On 31/03/2023 09:29, Krzysztof Kozlowski wrote:
> >> On 30/03/2023 18:51, James Morse wrote:
> >>> The Errata Management SMCCC interface allows firmware to advertise whether
> >>> the OS is affected by an erratum, or if a higher exception level has
> >>> mitigated the issue. This allows properties of the device that are not
> >>> discoverable by the OS to be described. e.g. some errata depend on the
> >>> behaviour of the interconnect, which is not visible to the OS.
> >>>
> >>> Deployed devices may find it significantly harder to update EL3
> >>> firmware than the device tree. Erratum workarounds typically have to
> >>> fail safe, and assume the platform is affected putting correctness
> >>> above performance.
> >>>
> >>> Instead of adding a device-tree entry for any CPU errata that is
> >>> relevant (or not) to the platform, allow the device-tree to describe
> >>> firmware's responses for the SMCCC interface. This could be used as
> >>> the data source for the firmware interface, or be parsed by the OS if
> >>> the firmware interface is missing.
> >>>
> >>> Most errata can be detected from CPU id registers. These mechanisms
> >>> are only needed for the rare cases that external knowledge is needed.
> >
> >>> diff --git a/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
> >>> new file mode 100644
> >>> index 000000000000..9baeb3d35213
> >>> --- /dev/null
> >>> +++ b/Documentation/devicetree/bindings/firmware/arm,errata-management.yaml
> >>> @@ -0,0 +1,77 @@
> >>> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> >>> +%YAML 1.2
> >>> +---
> >>> +$id: http://devicetree.org/schemas/firmware/arm,errata-management.yaml#
> >>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> >>
> >> Except missing testing...
> >
> > After a couple of hours of testing this, I went blind and missed that it was still
> > complaining about spaces.
> >
> >
> >>> +
> >>> +title: Errata Management Firmware Interface
> >>> +
> >>> +maintainers:
> >>> + - James Morse <james.morse@arm.com>
> >>> +
> >>> +description: |+
> >>
> >> Do not need '|+'.
> >>
> >>> + The SMC-CC has an erratum discovery interface that allows the OS to discover
> >>> + whether a particular CPU is affected by a specific erratum when the
> >>> + configurations affected is only known by firmware. See the specification of
> >>> + the same title on developer.arm.com, document DEN0100.
> >>> + Provide the values that should be used by the interface, either to supplement
> >>> + firmware, or override the values firmware provides.
> >>
> >> Why? If you have the discovery interface, don't add stuff to the DT, but
> >> use that interface.
If you read the cover letter, you'll notice that *nobody* implements
the discovery mechanism, and yet we still need a way to identify
broken systems.
> >
> > A DT property was explicitly requested by Marc Z on the RFC:
> > https://lore.kernel.org/linux-arm-kernel/86mt5dxxbc.wl-maz@kernel.org/
> >
> > The concern is that platforms where the CPU is affected, but the issue is masked by the
> > interconnect will never bother with a firmware interface. The kernel can't know this, so
> > has to enable the workaround at the cost of performance.
>
> It would have to bother DT, so same problem... DT is not optimization
> mechanism for SW decisions.
What does SW has to do with this? This describes the state of the
HW. The HW is broken, SW has no way to discover it otherwise, so DT
*is* the place to put it.
In any case, it is far easier to update a DT that your EL3 firmware.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-03-30 16:51 ` [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management James Morse
2023-03-30 20:50 ` Rob Herring
2023-03-31 8:29 ` Krzysztof Kozlowski
@ 2023-03-31 13:46 ` Rob Herring
2023-03-31 16:58 ` James Morse
2 siblings, 1 reply; 27+ messages in thread
From: Rob Herring @ 2023-03-31 13:46 UTC (permalink / raw)
To: James Morse
Cc: linux-arm-kernel, devicetree, Catalin Marinas, Will Deacon,
Mark Rutland, Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier,
Oliver Upton, Krzysztof Kozlowski, Andre Przywara
On Thu, Mar 30, 2023 at 11:52 AM James Morse <james.morse@arm.com> wrote:
>
> The Errata Management SMCCC interface allows firmware to advertise whether
> the OS is affected by an erratum, or if a higher exception level has
> mitigated the issue. This allows properties of the device that are not
> discoverable by the OS to be described. e.g. some errata depend on the
> behaviour of the interconnect, which is not visible to the OS.
>
> Deployed devices may find it significantly harder to update EL3
> firmware than the device tree. Erratum workarounds typically have to
> fail safe, and assume the platform is affected putting correctness
> above performance.
Updating the DT is still harder than updating the kernel. A well
designed binding allows for errata fixes without DT changes. That
generally means specific compatibles up front rather than adding
properties to turn things on/off.
Do we really want to encourage/endorse/support non-updatable firmware?
We've rejected plenty of things with 'fix your firmware'.
> Instead of adding a device-tree entry for any CPU errata that is
> relevant (or not) to the platform, allow the device-tree to describe
> firmware's responses for the SMCCC interface. This could be used as
> the data source for the firmware interface, or be parsed by the OS if
> the firmware interface is missing.
What's to prevent vendors from only using the DT mechanism and never
supporting the SMCCC interface? I'm sure some will love to not have to
make a firmware update when they can just fix it in DT.
The downside to this extendable binding compared to simple properties
is parsing a flat tree is slow and more complicated. So it may be
difficult to support if you need this early in boot.
> Most errata can be detected from CPU id registers. These mechanisms
> are only needed for the rare cases that external knowledge is needed.
And also have significant performance impact. In the end, how many
platforms are there that can't fix these in firmware and need a
mainline/distro kernel optimized to avoid the work-around. I suspect
the number is small enough it could be a list in the kernel.
Rob
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-03-31 13:46 ` Rob Herring
@ 2023-03-31 16:58 ` James Morse
2023-04-03 15:45 ` Rob Herring
0 siblings, 1 reply; 27+ messages in thread
From: James Morse @ 2023-03-31 16:58 UTC (permalink / raw)
To: Rob Herring
Cc: linux-arm-kernel, devicetree, Catalin Marinas, Will Deacon,
Mark Rutland, Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier,
Oliver Upton, Krzysztof Kozlowski, Andre Przywara
Hi Rob,
On 31/03/2023 14:46, Rob Herring wrote:
> On Thu, Mar 30, 2023 at 11:52 AM James Morse <james.morse@arm.com> wrote:
>> The Errata Management SMCCC interface allows firmware to advertise whether
>> the OS is affected by an erratum, or if a higher exception level has
>> mitigated the issue. This allows properties of the device that are not
>> discoverable by the OS to be described. e.g. some errata depend on the
>> behaviour of the interconnect, which is not visible to the OS.
>>
>> Deployed devices may find it significantly harder to update EL3
>> firmware than the device tree. Erratum workarounds typically have to
>> fail safe, and assume the platform is affected putting correctness
>> above performance.
>
> Updating the DT is still harder than updating the kernel. A well
> designed binding allows for errata fixes without DT changes. That
> generally means specific compatibles up front rather than adding
> properties to turn things on/off.
I started with a per-erratum identifier, but there are 8 of them, and its hard to know
where to put it. The CPU side is detectable from the MIDR,its an interconnect property
that we need to know ... but the interconnect isn't described in the DT. (but the obvious
compatible string identifies the PMU)
> Do we really want to encourage/endorse/support non-updatable firmware?
> We've rejected plenty of things with 'fix your firmware'.
A DT property was explicitly requested by Marc Z on the RFC:
https://lore.kernel.org/linux-arm-kernel/86mt5dxxbc.wl-maz@kernel.org/
The concern is that platforms where the CPU is affected, but the issue is masked by the
interconnect will never bother with a firmware interface. The kernel can't know this, so
has to enable the workaround at the cost of performance.
Adding a DT property to describe the hardware state of the erratum avoids the need for
separate kernel builds to work around the missing firmware.
>> Instead of adding a device-tree entry for any CPU errata that is
>> relevant (or not) to the platform, allow the device-tree to describe
>> firmware's responses for the SMCCC interface. This could be used as
>> the data source for the firmware interface, or be parsed by the OS if
>> the firmware interface is missing.
> What's to prevent vendors from only using the DT mechanism and never
> supporting the SMCCC interface? I'm sure some will love to not have to
> make a firmware update when they can just fix it in DT.
The firmware interface has to exist for ACPI systems where EL3 can't influence the ACPI
tables, which typically get replaced if the part is OEM'd.
Ultimately all the interface is describing is a non-discoverable hardware property
relevant to an erratum. These are often configurations the silicon manufacturer chooses.
In this case its the behaviour of the interconnect.
If we didn't have to support ACPI systems, this stuff would only have been in the DT. With
> The downside to this extendable binding compared to simple properties
> is parsing a flat tree is slow and more complicated. So it may be
> difficult to support if you need this early in boot.
I do need this early in the boot, but I'm not worried about the delay as these tables
should be small.
>> Most errata can be detected from CPU id registers. These mechanisms
>> are only needed for the rare cases that external knowledge is needed.
>
> And also have significant performance impact. In the end, how many
> platforms are there that can't fix these in firmware and need a
> mainline/distro kernel optimized to avoid the work-around. I suspect
> the number is small enough it could be a list in the kernel.
At a guess, its all mobile phones produced in the last 2 years that want to run a version
of Android that uses virtualisation. Cortex-A78 is affected, but I don't know when the
first products were shipped.
Thanks,
James
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-03-31 16:58 ` James Morse
@ 2023-04-03 15:45 ` Rob Herring
2023-04-04 15:19 ` James Morse
0 siblings, 1 reply; 27+ messages in thread
From: Rob Herring @ 2023-04-03 15:45 UTC (permalink / raw)
To: James Morse
Cc: linux-arm-kernel, devicetree, Catalin Marinas, Will Deacon,
Mark Rutland, Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier,
Oliver Upton, Krzysztof Kozlowski, Andre Przywara
On Fri, Mar 31, 2023 at 11:59 AM James Morse <james.morse@arm.com> wrote:
>
> Hi Rob,
>
> On 31/03/2023 14:46, Rob Herring wrote:
> > On Thu, Mar 30, 2023 at 11:52 AM James Morse <james.morse@arm.com> wrote:
> >> The Errata Management SMCCC interface allows firmware to advertise whether
> >> the OS is affected by an erratum, or if a higher exception level has
> >> mitigated the issue. This allows properties of the device that are not
> >> discoverable by the OS to be described. e.g. some errata depend on the
> >> behaviour of the interconnect, which is not visible to the OS.
> >>
> >> Deployed devices may find it significantly harder to update EL3
> >> firmware than the device tree. Erratum workarounds typically have to
> >> fail safe, and assume the platform is affected putting correctness
> >> above performance.
> >
> > Updating the DT is still harder than updating the kernel. A well
> > designed binding allows for errata fixes without DT changes. That
> > generally means specific compatibles up front rather than adding
> > properties to turn things on/off.
>
> I started with a per-erratum identifier, but there are 8 of them, and its hard to know
> where to put it.
That's still requiring updating the DT to fix things.
> The CPU side is detectable from the MIDR,its an interconnect property
> that we need to know ... but the interconnect isn't described in the DT. (but the obvious
> compatible string identifies the PMU)
But the interconnect could be described. In fact, there's a binding
for such things already. Surprisingly, it's called 'interconnects'...
Of course, there are lots of interconnects in SoCs and the one you
need may not be described ('cause it is invisible to s/w (until it's
not)). There's a binding going back to the CCI-400 in fact. So it
isn't really that interconnects aren't described, it's that they
aren't consistently described.
If you can add this errata table to the DT, then why not add
describing the interconnect? Then it will be there for the next thing
we need the interconnect for. I imagine some of the interconnects are
already described if not explicitly in bits and pieces (i.e. clocks or
power domains for the interconnect get tossed into some other node).
> > Do we really want to encourage/endorse/support non-updatable firmware?
> > We've rejected plenty of things with 'fix your firmware'.
>
> A DT property was explicitly requested by Marc Z on the RFC:
> https://lore.kernel.org/linux-arm-kernel/86mt5dxxbc.wl-maz@kernel.org/
>
> The concern is that platforms where the CPU is affected, but the issue is masked by the
> interconnect will never bother with a firmware interface. The kernel can't know this, so
> has to enable the workaround at the cost of performance.
Sure it can. Worst case, the kernel knows the exact SoC and board it
is running on because we have root level compatibles for those. It's
just a question of whether using those would scale or not. Whether
that scales or not depends on both how long the lists are and how
distributed the implementation is (e.g. PCI quirks). More on that
below.
> Adding a DT property to describe the hardware state of the erratum avoids the need for
> separate kernel builds to work around the missing firmware.
DT is not the kernel's runtime configuration mechanism. That assumes a
tight coupling of the DT and kernel. That may work for some cases like
Android with kernel and DT updated together, but for upstream we can't
assume that coupling and must treat it as an ABI (not an issue for
this case).
The kernel command line is a runtime config mechanism for the kernel.
So why not use only that? There's certainly some room for improvement
with handling it. For example, it's not well defined with what happens
with 'bootargs' as you go from a dtb to bootloader to kernel in terms
of overriding/prepending/appending, but that could be addressed.
> >> Instead of adding a device-tree entry for any CPU errata that is
> >> relevant (or not) to the platform, allow the device-tree to describe
> >> firmware's responses for the SMCCC interface. This could be used as
> >> the data source for the firmware interface, or be parsed by the OS if
> >> the firmware interface is missing.
>
> > What's to prevent vendors from only using the DT mechanism and never
> > supporting the SMCCC interface? I'm sure some will love to not have to
> > make a firmware update when they can just fix it in DT.
>
> The firmware interface has to exist for ACPI systems where EL3 can't influence the ACPI
> tables, which typically get replaced if the part is OEM'd.
>
> Ultimately all the interface is describing is a non-discoverable hardware property
> relevant to an erratum. These are often configurations the silicon manufacturer chooses.
> In this case its the behaviour of the interconnect.
>
> If we didn't have to support ACPI systems, this stuff would only have been in the DT. With
With...?
I fail to see what ACPI has to do with DT platforms adopting the SMCCC
interface or not.
> > The downside to this extendable binding compared to simple properties
> > is parsing a flat tree is slow and more complicated. So it may be
> > difficult to support if you need this early in boot.
>
> I do need this early in the boot, but I'm not worried about the delay as these tables
> should be small.
>
>
> >> Most errata can be detected from CPU id registers. These mechanisms
> >> are only needed for the rare cases that external knowledge is needed.
> >
> > And also have significant performance impact. In the end, how many
> > platforms are there that can't fix these in firmware and need a
> > mainline/distro kernel optimized to avoid the work-around. I suspect
> > the number is small enough it could be a list in the kernel.
>
> At a guess, its all mobile phones produced in the last 2 years that want to run a version
> of Android that uses virtualisation. Cortex-A78 is affected, but I don't know when the
> first products were shipped.
How many run mainline and run it well enough to even care about the
optimization yet?
Even if we went with the above list, that's 2 years x 4 vendors (QCom,
Mediatek, Samsung, Google) x 1-2 Soc (high and mid tier). Subtract out
any vendors capable of updating their firmware. So a worst case list
of potentially 8-16 SoCs? It shouldn't grow because anything newer is
going to implement the SMCCC interface, right? That's not unmanageable
in my book.
Rob
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management
2023-04-03 15:45 ` Rob Herring
@ 2023-04-04 15:19 ` James Morse
0 siblings, 0 replies; 27+ messages in thread
From: James Morse @ 2023-04-04 15:19 UTC (permalink / raw)
To: Rob Herring
Cc: linux-arm-kernel, devicetree, Catalin Marinas, Will Deacon,
Mark Rutland, Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier,
Oliver Upton, Krzysztof Kozlowski, Andre Przywara
Hi Rob,
On 03/04/2023 16:45, Rob Herring wrote:
> On Fri, Mar 31, 2023 at 11:59 AM James Morse <james.morse@arm.com> wrote:
>> On 31/03/2023 14:46, Rob Herring wrote:
>>> On Thu, Mar 30, 2023 at 11:52 AM James Morse <james.morse@arm.com> wrote:
>>>> The Errata Management SMCCC interface allows firmware to advertise whether
>>>> the OS is affected by an erratum, or if a higher exception level has
>>>> mitigated the issue. This allows properties of the device that are not
>>>> discoverable by the OS to be described. e.g. some errata depend on the
>>>> behaviour of the interconnect, which is not visible to the OS.
>>>>
>>>> Deployed devices may find it significantly harder to update EL3
>>>> firmware than the device tree. Erratum workarounds typically have to
>>>> fail safe, and assume the platform is affected putting correctness
>>>> above performance.
>>>
>>> Updating the DT is still harder than updating the kernel. A well
>>> designed binding allows for errata fixes without DT changes. That
>>> generally means specific compatibles up front rather than adding
>>> properties to turn things on/off.
>>
>> I started with a per-erratum identifier, but there are 8 of them, and its hard to know
>> where to put it.
> That's still requiring updating the DT to fix things.
.... this discussion is about how to add this stuff to DT ...
>> The CPU side is detectable from the MIDR,its an interconnect property
>> that we need to know ... but the interconnect isn't described in the DT. (but the obvious
>> compatible string identifies the PMU)
>
> But the interconnect could be described. In fact, there's a binding
> for such things already. Surprisingly, it's called 'interconnects'...
> Of course, there are lots of interconnects in SoCs and the one you
> need may not be described ('cause it is invisible to s/w (until it's
> not)). There's a binding going back to the CCI-400 in fact. So it
> isn't really that interconnects aren't described, it's that they
> aren't consistently described.
This is the CPU interconnect, but it already has a binding:
Documentation/devicetree/bindings/perf/arm,cmn.yaml
This describes the PMU, which is the only bit of that IP that is interesting to the OS.
I can search for the "arm,cmn-600", "arm,cmn-650", "arm,cmn-700" compatibles, but this is
a list that would have to be added to each time arm, or someone else, create a new
interconnect. It is more manageable than listing the affected platforms.
But if a SoC is not configured to allow normal-world accesses to the interconnect PMU, the
existing binding is of no use.
Should I be looking at adding a compatible "arm,interconnect" to the existing PMU binding?
Presumably I'd need an empty node ... somewhere ... with the "arm,interconnect" property
if the platform doesn't have a usable PMU.
Alternatively I could add a 'non_cacheable_fetch_hits_cacheable_data' property to these
nodes, but it doesn't feel right to search all the nodes for the property. The position in
the tree doesn't help as the CPUs live in their own special node, not in the CPU
interconnect node.
(I asked about a top-level property, and was told its not 2012 anymore!)
> If you can add this errata table to the DT, then why not add
> describing the interconnect? Then it will be there for the next thing
> we need the interconnect for. I imagine some of the interconnects are
> already described if not explicitly in bits and pieces (i.e. clocks or
> power domains for the interconnect get tossed into some other node).
Another example in this space might help:
Cortex-A510 has an erratum #2658417 "BFMMLA or VMMLA instructions might produce incorrect
result"
The configurations affected if the shared VPU datapath (whatever that is) between a pair
of cores is 2x64. Which cores share what doesn't fit with DT's model of CPUs.
It's a darn sight easier to ask "are you affected by #2658417".
Today, the workaround for that erratum nobbles the feature on all CPUs, because the OS
can't know. With either the firmware interface, or this description of it, BF16 can be
re-enabled on those parts that aren't affected.
Improving this falls in to the same trap of people not updating EL3 firmware when they
aren't affected by an erratum.
Another example in the same document is #2022138 "Core in FULL_RET might not have clock
gated" ... fortunately EL3 implements this one, as I've no idea what "if the FULL_RET
power mode has been implemented for logic retention" means.
My point is: these things are nasty. Using the erratum number to identify the conditions
and the workaround gives us one unambiguous source of truth.
>>> Do we really want to encourage/endorse/support non-updatable firmware?
>>> We've rejected plenty of things with 'fix your firmware'.
>>
>> A DT property was explicitly requested by Marc Z on the RFC:
>> https://lore.kernel.org/linux-arm-kernel/86mt5dxxbc.wl-maz@kernel.org/
>>
>> The concern is that platforms where the CPU is affected, but the issue is masked by the
>> interconnect will never bother with a firmware interface. The kernel can't know this, so
>> has to enable the workaround at the cost of performance.
>
> Sure it can. Worst case, the kernel knows the exact SoC and board it
> is running on because we have root level compatibles for those. It's
> just a question of whether using those would scale or not. Whether
> that scales or not depends on both how long the lists are and how
> distributed the implementation is (e.g. PCI quirks). More on that
> below.
My stab in the dark at the length of the list is: every variant of every phone shipped in
the last two years, and every variant using these CPUs for a few more years. We'd never
have a complete list. Moving this to firmware (be that DT or EL3) is the only way to make
this scale.
>> Adding a DT property to describe the hardware state of the erratum avoids the need for
>> separate kernel builds to work around the missing firmware.
> DT is not the kernel's runtime configuration mechanism. That assumes a
> tight coupling of the DT and kernel. That may work for some cases like
> Android with kernel and DT updated together, but for upstream we can't
> assume that coupling and must treat it as an ABI (not an issue for
> this case).
> The kernel command line is a runtime config mechanism for the kernel.
> So why not use only that?
As implemented in patch 6.
> There's certainly some room for improvement
> with handling it. For example, it's not well defined with what happens
> with 'bootargs' as you go from a dtb to bootloader to kernel in terms
> of overriding/prepending/appending, but that could be addressed.
Command-line arguments are horrific for the distributions to work with. Some tool would
need the list of affected parts to insert the argument, we've just moved the problem.
This isn't policy, so the command-line is the wrong tool for the job. Patch 6 is only
included to give people stuck with an affected platform something they can do.
>>>> Instead of adding a device-tree entry for any CPU errata that is
>>>> relevant (or not) to the platform, allow the device-tree to describe
>>>> firmware's responses for the SMCCC interface. This could be used as
>>>> the data source for the firmware interface, or be parsed by the OS if
>>>> the firmware interface is missing.
>>
>>> What's to prevent vendors from only using the DT mechanism and never
>>> supporting the SMCCC interface? I'm sure some will love to not have to
>>> make a firmware update when they can just fix it in DT.
>>
>> The firmware interface has to exist for ACPI systems where EL3 can't influence the ACPI
>> tables, which typically get replaced if the part is OEM'd.
>>
>> Ultimately all the interface is describing is a non-discoverable hardware property
>> relevant to an erratum. These are often configurations the silicon manufacturer chooses.
>> In this case its the behaviour of the interconnect.
>>
>> If we didn't have to support ACPI systems, this stuff would only have been in the DT. With
> With...?
Looks like I got interrupted when writing this.
> I fail to see what ACPI has to do with DT platforms adopting the SMCCC
> interface or not.
My point was the SMCCC interface has to exist because of ACPI systems. Without them, the
available tools to solve the problem would be different.
I don't see a difference between using SMCCC as a source of this information, or the
firmware node of the DT. I have no problem if the manufacturers of DT systems never
implement the SMCCC. The EL3 firmware already contains these tables, including them in the
firmware node of the DT is just making the information visible.
>>> The downside to this extendable binding compared to simple properties
>>> is parsing a flat tree is slow and more complicated. So it may be
>>> difficult to support if you need this early in boot.
>>
>> I do need this early in the boot, but I'm not worried about the delay as these tables
>> should be small.
>>
>>
>>>> Most errata can be detected from CPU id registers. These mechanisms
>>>> are only needed for the rare cases that external knowledge is needed.
>>>
>>> And also have significant performance impact. In the end, how many
>>> platforms are there that can't fix these in firmware and need a
>>> mainline/distro kernel optimized to avoid the work-around. I suspect
>>> the number is small enough it could be a list in the kernel.
>>
>> At a guess, its all mobile phones produced in the last 2 years that want to run a version
>> of Android that uses virtualisation. Cortex-A78 is affected, but I don't know when the
>> first products were shipped.
> How many run mainline and run it well enough to even care about the
> optimization yet?
By the optimization I assume you mean FWB?
This happens when the VM takes a stage2 translation fault because the host's mm code moved
the memory. (due to THP, KSM, compaction, etc). The guest vCPU is blocked until the cache
maintenance completes. The feature was added because of the overhead for memory intensive
workloads.
Affected platforms will be stuck with this, but I anticipate they are in the minority.
> Even if we went with the above list, that's 2 years x 4 vendors (QCom,
> Mediatek, Samsung, Google) x 1-2 Soc (high and mid tier).
If this is the top-level DT machine compatible, wouldn't you see Xiaomi, Oppo, Huawei etc?
Wikipedia lists 175 brands.
If this is SoC-ID ... I've no idea how to start collecting those values.
> Subtract out
> any vendors capable of updating their firmware. So a worst case list
> of potentially 8-16 SoCs? It shouldn't grow because anything newer is
> going to implement the SMCCC interface, right? That's not unmanageable
> in my book.
If we wanted a short list, we could just list the affected platforms.
But I don't think the scaleability or manageability is about the length of the list, its
about knowing you've got the complete list. I've seen VMs running on network switches, I
wouldn't be surprised if there are DT systems in this space.
Pushing this back into the firmware description moves this problem to someone much closer
to the device, who is able to get the SoC manufacturer to answer questions.
Doing it in terms of the erratum number means the question is unambiguous.
Thanks,
James
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/6] firmware: smccc: Add support for erratum discovery API
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
2023-03-30 16:51 ` [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management James Morse
@ 2023-03-30 16:51 ` James Morse
2023-03-30 20:34 ` kernel test robot
2023-03-30 16:51 ` [PATCH 3/6] arm64: cputype: Add new part numbers for Cortex-X3, and Neoverse-V2 James Morse
` (6 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: James Morse @ 2023-03-30 16:51 UTC (permalink / raw)
To: linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, James Morse,
Rob Herring, Krzysztof Kozlowski
It is not always possible for the OS to determine if a CPU is affected by
a particular erratum. For example, it may depend on an integration choice
the chip designer made, or whether firmware has enabled some particular
feature.
Add support for the SMCCC 'Errata Management Firmware Interface' that lets
the OS query firmware for this information.
Link: https://developer.arm.com/documentation/den0100/1-0/?lang=en
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/kernel/cpufeature.c | 7 +++
drivers/firmware/smccc/Kconfig | 8 ++++
drivers/firmware/smccc/Makefile | 1 +
drivers/firmware/smccc/em.c | 78 +++++++++++++++++++++++++++++++++
include/linux/arm-smccc.h | 28 ++++++++++++
include/linux/arm_smccc_em.h | 11 +++++
6 files changed, 133 insertions(+)
create mode 100644 drivers/firmware/smccc/em.c
create mode 100644 include/linux/arm_smccc_em.h
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 2e3e55139777..62f996006783 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -62,6 +62,7 @@
#define pr_fmt(fmt) "CPU features: " fmt
+#include <linux/arm_smccc_em.h>
#include <linux/bsearch.h>
#include <linux/cpumask.h>
#include <linux/crash_dump.h>
@@ -1047,6 +1048,12 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info)
*/
init_cpu_hwcaps_indirect_list();
+ /*
+ * Early erratum workaround may need to be discovered from firmware.
+ */
+ if (IS_ENABLED(CONFIG_ARM_SMCCC_EM))
+ arm_smccc_em_init();
+
/*
* Detect and enable early CPU capabilities based on the boot CPU,
* after we have initialised the CPU feature infrastructure.
diff --git a/drivers/firmware/smccc/Kconfig b/drivers/firmware/smccc/Kconfig
index 15e7466179a6..a10a150d49bb 100644
--- a/drivers/firmware/smccc/Kconfig
+++ b/drivers/firmware/smccc/Kconfig
@@ -23,3 +23,11 @@ config ARM_SMCCC_SOC_ID
help
Include support for the SoC bus on the ARM SMCCC firmware based
platforms providing some sysfs information about the SoC variant.
+
+config ARM_SMCCC_EM
+ bool "Errata discovery by ARM SMCCC"
+ depends on HAVE_ARM_SMCCC_DISCOVERY
+ default y
+ help
+ Include support for querying firmware via SMCCC to determine whether
+ the CPU is affected by a specific erratum.
diff --git a/drivers/firmware/smccc/Makefile b/drivers/firmware/smccc/Makefile
index 40d19144a860..39ed128b59b5 100644
--- a/drivers/firmware/smccc/Makefile
+++ b/drivers/firmware/smccc/Makefile
@@ -2,3 +2,4 @@
#
obj-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY) += smccc.o kvm_guest.o
obj-$(CONFIG_ARM_SMCCC_SOC_ID) += soc_id.o
+obj-$(CONFIG_ARM_SMCCC_EM) += em.o
diff --git a/drivers/firmware/smccc/em.c b/drivers/firmware/smccc/em.c
new file mode 100644
index 000000000000..2c66240d8707
--- /dev/null
+++ b/drivers/firmware/smccc/em.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Arm Errata Management firmware interface.
+ *
+ * This firmware interface advertises support for firmware mitigations for CPU
+ * errata. It can also be used to discover erratum where the 'configurations
+ * affected' depends on the integration.
+ *
+ * Copyright (C) 2022 ARM Limited
+ */
+
+#define pr_fmt(fmt) "arm_smccc_em: " fmt
+
+#include <linux/arm_smccc_em.h>
+#include <linux/arm-smccc.h>
+#include <linux/errno.h>
+#include <linux/printk.h>
+
+#include <asm/alternative.h>
+
+#include <uapi/linux/psci.h>
+
+static u32 supported;
+
+int arm_smccc_em_cpu_features(u32 erratum_id)
+{
+ struct arm_smccc_res res;
+
+ if (!READ_ONCE(supported))
+ return -EOPNOTSUPP;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_EM_CPU_ERRATUM_FEATURES, erratum_id, 0, &res);
+ switch (res.a0) {
+ case SMCCC_RET_NOT_SUPPORTED:
+ return -EOPNOTSUPP;
+ case SMCCC_EM_RET_INVALID_PARAMTER:
+ return -EINVAL;
+ case SMCCC_EM_RET_UNKNOWN:
+ return -ENOENT;
+ case SMCCC_EM_RET_HIGHER_EL_MITIGATION:
+ case SMCCC_EM_RET_NOT_AFFECTED:
+ case SMCCC_EM_RET_AFFECTED:
+ return res.a0;
+ };
+
+ return -EIO;
+}
+
+int __init arm_smccc_em_init(void)
+{
+ u32 major_ver, minor_ver;
+ struct arm_smccc_res res;
+ enum arm_smccc_conduit conduit = arm_smccc_1_1_get_conduit();
+
+ if (conduit == SMCCC_CONDUIT_NONE)
+ return -EOPNOTSUPP;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_EM_VERSION, &res);
+ if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
+ return -EOPNOTSUPP;
+
+ major_ver = PSCI_VERSION_MAJOR(res.a0);
+ minor_ver = PSCI_VERSION_MINOR(res.a0);
+ if (major_ver != 1)
+ return -EIO;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_EM_FEATURES,
+ ARM_SMCCC_EM_CPU_ERRATUM_FEATURES, &res);
+ if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
+ return -EOPNOTSUPP;
+
+ pr_info("SMCCC Errata Management Interface v%d.%d\n",
+ major_ver, minor_ver);
+
+ WRITE_ONCE(supported, 1);
+
+ return 0;
+}
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 220c8c60e021..cc2e38ce8707 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -182,6 +182,25 @@
ARM_SMCCC_OWNER_STANDARD, \
0x53)
+/* Errata Management calls (defined by ARM DEN0100) */
+#define ARM_SMCCC_EM_VERSION \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0xF0)
+
+#define ARM_SMCCC_EM_FEATURES \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0xF1)
+
+#define ARM_SMCCC_EM_CPU_ERRATUM_FEATURES \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_STANDARD, \
+ 0xF2)
+
/*
* Return codes defined in ARM DEN 0070A
* ARM DEN 0070A is now merged/consolidated into ARM DEN 0028 C
@@ -191,6 +210,15 @@
#define SMCCC_RET_NOT_REQUIRED -2
#define SMCCC_RET_INVALID_PARAMETER -3
+/*
+ * Return codes defined in ARM DEN 0100
+ */
+#define SMCCC_EM_RET_HIGHER_EL_MITIGATION 3
+#define SMCCC_EM_RET_NOT_AFFECTED 2
+#define SMCCC_EM_RET_AFFECTED 1
+#define SMCCC_EM_RET_INVALID_PARAMTER -2
+#define SMCCC_EM_RET_UNKNOWN -3
+
#ifndef __ASSEMBLY__
#include <linux/linkage.h>
diff --git a/include/linux/arm_smccc_em.h b/include/linux/arm_smccc_em.h
new file mode 100644
index 000000000000..71293cbbe545
--- /dev/null
+++ b/include/linux/arm_smccc_em.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2023 ARM Limited */
+#ifndef __LINUX_ARM_SMCCC_EM_H
+#define __LINUX_ARM_SMCCC_EM_H
+
+#include <linux/types.h>
+
+int arm_smccc_em_init(void);
+int arm_smccc_em_cpu_features(u32 erratum_id);
+
+#endif /* __LINUX_ARM_SMCCC_EM_H */
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH 2/6] firmware: smccc: Add support for erratum discovery API
2023-03-30 16:51 ` [PATCH 2/6] firmware: smccc: Add support for erratum discovery API James Morse
@ 2023-03-30 20:34 ` kernel test robot
0 siblings, 0 replies; 27+ messages in thread
From: kernel test robot @ 2023-03-30 20:34 UTC (permalink / raw)
To: James Morse, linux-arm-kernel, devicetree
Cc: oe-kbuild-all, Catalin Marinas, Will Deacon, Mark Rutland,
Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier, Oliver Upton,
James Morse, Rob Herring, Krzysztof Kozlowski
Hi James,
I love your patch! Yet something to improve:
[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on robh/for-next arm/for-next arm/fixes kvmarm/next soc/for-next xilinx-xlnx/master linus/master v6.3-rc4 next-20230330]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/James-Morse/dt-bindings-firmware-Add-arm-errata-management/20230331-005505
base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link: https://lore.kernel.org/r/20230330165128.3237939-3-james.morse%40arm.com
patch subject: [PATCH 2/6] firmware: smccc: Add support for erratum discovery API
config: arm-randconfig-r046-20230329 (https://download.01.org/0day-ci/archive/20230331/202303310415.JJnkYNSr-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/d17afdb2a97dfce25c3005a165eb38c4d35d31ed
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review James-Morse/dt-bindings-firmware-Add-arm-errata-management/20230331-005505
git checkout d17afdb2a97dfce25c3005a165eb38c4d35d31ed
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash drivers/firmware/smccc/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303310415.JJnkYNSr-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/firmware/smccc/em.c:19:10: fatal error: asm/alternative.h: No such file or directory
19 | #include <asm/alternative.h>
| ^~~~~~~~~~~~~~~~~~~
compilation terminated.
vim +19 drivers/firmware/smccc/em.c
18
> 19 #include <asm/alternative.h>
20
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 3/6] arm64: cputype: Add new part numbers for Cortex-X3, and Neoverse-V2
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
2023-03-30 16:51 ` [PATCH 1/6] dt-bindings: firmware: Add arm,errata-management James Morse
2023-03-30 16:51 ` [PATCH 2/6] firmware: smccc: Add support for erratum discovery API James Morse
@ 2023-03-30 16:51 ` James Morse
2023-03-30 16:51 ` [PATCH 4/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
` (5 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: James Morse @ 2023-03-30 16:51 UTC (permalink / raw)
To: linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, James Morse,
Rob Herring, Krzysztof Kozlowski
New CPUs have new errata. Add the new partnumbers.
Signed-off-by: James Morse <james.morse@arm.com>
---
Cortex-X3:
https://developer.arm.com/documentation/101593/0102/?lang=en
Neoverse-V2:
https://developer.arm.com/documentation/102375/0002/?lang=en
---
arch/arm64/include/asm/cputype.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 683ca3af4084..1a2c55e172e8 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -84,6 +84,8 @@
#define ARM_CPU_PART_CORTEX_X2 0xD48
#define ARM_CPU_PART_NEOVERSE_N2 0xD49
#define ARM_CPU_PART_CORTEX_A78C 0xD4B
+#define ARM_CPU_PART_CORTEX_X3 0xD4E
+#define ARM_CPU_PART_NEOVERSE_V2 0xD4F
#define APM_CPU_PART_POTENZA 0x000
@@ -149,6 +151,8 @@
#define MIDR_CORTEX_X2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X2)
#define MIDR_NEOVERSE_N2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N2)
#define MIDR_CORTEX_A78C MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A78C)
+#define MIDR_CORTEX_X3 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X3)
+#define MIDR_NEOVERSE_V2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_V2)
#define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
#define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
#define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH 4/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
` (2 preceding siblings ...)
2023-03-30 16:51 ` [PATCH 3/6] arm64: cputype: Add new part numbers for Cortex-X3, and Neoverse-V2 James Morse
@ 2023-03-30 16:51 ` James Morse
2023-03-30 16:51 ` [PATCH 5/6] firmware: smccc: Allow errata management to be overridden by device tree James Morse
` (4 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: James Morse @ 2023-03-30 16:51 UTC (permalink / raw)
To: linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, James Morse,
Rob Herring, Krzysztof Kozlowski
Force Write Back (FWB) allows the hypervisor to force non-cacheable
accesses made by a guest to be cacheable. This saves the hypervisor
from doing cache maintenance on all pages the guest can access, to
ensure the guest doesn't see stale (and possibly sensitive) data when
making a non-cacheable access.
When stage1 translation is disabled, the SCTRL_E1.I bit controls the
attributes used for instruction fetch, one of the options results in a
non-cacheable access. A whole host of CPUs missed the FWB override
in this case, meaning a KVM guest could fetch stale/junk data instead of
instructions.
The workaround is to always do the cache maintenance. These parts don't
have fine-grained-traps, so it isn't feasible to detect the guest
disabling the MMU. Instead, disable FWB on the host.
While the CPUs are affected, this erratum doesn't occur on parts using
Arm's CMN interconnects. Use the Errata Management API to discover whether
this CPU is affected.
Because guest execution is compromised, the workaround is enabled by
default. If the Errata Management API isn't implemented by firmware, the
workaround will be enabled. If a target platform is not affected, and it
isn't possible to add support for the Errata Management API, the erratum
can be disabled in Kconfig.
Signed-off-by: James Morse <james.morse@arm.com>
---
This patch causes the additional output:
| Stage-2 Force Write-Back disabled due to erratum #2701951
| CPU features: detected: ARM erratum 2701951
---
Documentation/arm64/silicon-errata.rst | 18 ++++++
arch/arm64/Kconfig | 27 ++++++++
arch/arm64/include/asm/cpufeature.h | 1 +
arch/arm64/kernel/cpu_errata.c | 86 ++++++++++++++++++++++++++
arch/arm64/kernel/cpufeature.c | 16 ++++-
arch/arm64/tools/cpucaps | 1 +
6 files changed, 148 insertions(+), 1 deletion(-)
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index ec5f889d7681..d6ca86ebc7af 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -106,6 +106,10 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A78 | #2712571 | ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A78C | #2712575,2712572| ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A510 | #2051678 | ARM64_ERRATUM_2051678 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A510 | #2077057 | ARM64_ERRATUM_2077057 |
@@ -120,12 +124,20 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A710 | #2701952 | ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A715 | #2645198 | ARM64_ERRATUM_2645198 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-X1 | #2712571 | ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-X2 | #2119858 | ARM64_ERRATUM_2119858 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-X2 | #2224489 | ARM64_ERRATUM_2224489 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-X2 | #2701952 | ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-X3 | #2701951 | ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1349291 | N/A |
@@ -138,6 +150,12 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Neoverse-N2 | #2728475 | ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM | Neoverse-V1 | #2701953 | ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM | Neoverse-V2 | #2719103 | ARM64_ERRATUM_2701951 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1023e896d46b..0d07ddd15bfb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -990,6 +990,33 @@ config ARM64_ERRATUM_2645198
If unsure, say Y.
+config ARM64_ERRATUM_2701951
+ bool "ARM CPUs: 2701951: disable FWB on affected parts"
+ select ARM_SMCCC_EM
+ default y
+ help
+ This option adds the workaround for multiple ARM errata titled
+ "The core might fetch stale instruction from memory when both Stage 1
+ Translation and Instruction Cache are Disabled with Stage 2 forced
+ Write-Back".
+ This affects Cortex cores: A78, A78C, A710, X1, X2, X3, and Neoverse
+ cores: V1, V2 and N2.
+
+ Affected cores fail to apply the FWB override to instruction fetch
+ when stage1 translation is disabled, and SCTLR_EL1.I is clear. This
+ results in stale data being fetched and executed. Only CPUs that are
+ connected to a non-Arm interconnect will exhibit symptoms due to this
+ errata.
+
+ Work around this problem in the driver by disabling FWB on affected
+ parts. The SMCCC Errata Management API is used to query firmware to
+ learn if the part is affected.
+
+ If the SMCCC Errata Management API is not implemented on a platform
+ with an affected core, the workaround will be applied.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 6bf013fb110d..435e5d1b49ab 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -635,6 +635,7 @@ static inline bool id_aa64pfr1_mte(u64 pfr1)
void __init setup_cpu_features(void);
void check_local_cpu_capabilities(void);
+bool has_stage2_fwb_errata(const struct arm64_cpu_capabilities *entry, int scope);
u64 read_sanitised_ftr_reg(u32 id);
u64 __read_sysreg_by_encoding(u32 sys_id);
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 307faa2b4395..55da9e588b9e 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -6,6 +6,7 @@
*/
#include <linux/arm-smccc.h>
+#include <linux/arm_smccc_em.h>
#include <linux/types.h>
#include <linux/cpu.h>
#include <asm/cpu.h>
@@ -137,6 +138,81 @@ cpu_clear_bf16_from_user_emulation(const struct arm64_cpu_capabilities *__unused
raw_spin_unlock(®_user_mask_modification);
}
+bool has_stage2_fwb_errata(const struct arm64_cpu_capabilities *ignored,
+ int scope)
+{
+ u64 idr;
+ bool has_feature;
+
+ /* List of CPUs which may have broken FWB support. */
+ static const struct midr_range cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2701951
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_X1),
+ MIDR_ALL_VERSIONS(MIDR_CORTEX_X2),
+ MIDR_RANGE(MIDR_CORTEX_X3, 0, 0, 1, 1),
+ MIDR_RANGE(MIDR_NEOVERSE_V1, 0, 0, 1, 1),
+ MIDR_RANGE(MIDR_NEOVERSE_V2, 0, 0, 0, 1),
+ MIDR_RANGE(MIDR_NEOVERSE_N2, 0, 0, 0, 2),
+#endif
+ { /* sentinel */ },
+ };
+
+ if (scope == ARM64_CPUCAP_SCOPE_SYSTEM)
+ return cpus_have_cap(ARM64_WORKAROUND_NO_FWB);
+
+ idr = read_cpuid(ID_AA64MMFR2_EL1);
+ has_feature = FIELD_GET(ID_AA64MMFR2_EL1_FWB, idr);
+ if (!has_feature)
+ return false;
+
+ if (is_midr_in_range_list(read_cpuid_id(), cpus)) {
+ int i;
+ bool fwb_broken = true;
+
+ /*
+ * List of erratum numbers for these CPUs.
+ * It isn't possible to match these to their CPUs, as A78C has
+ * two erratum numbers. The errata management API will return
+ * 'UNKNOWN' for an erratum it doesn't recognise.
+ */
+ static const u32 erratum_nums[] = {
+ 2701951,
+ 2701952,
+ 2701953,
+ 2712571,
+ 2712572,
+ 2712575,
+ 2719103,
+ 2728475,
+ };
+
+ /*
+ * The CPU is affected, but what about this configuration?
+ * Only firmware has the answer. Assume the part is affected,
+ * and query firmware for the set of erratum numbers. If one
+ * returns not-affected, the workaround isn't needed.
+ */
+ for (i = 0; i < ARRAY_SIZE(erratum_nums); i++) {
+ int state = arm_smccc_em_cpu_features(erratum_nums[i]);
+
+ if (state == SMCCC_EM_RET_NOT_AFFECTED) {
+ fwb_broken = false;
+ break;
+ }
+ }
+
+ if (fwb_broken) {
+ pr_info_once("Stage-2 Force Write-Back disabled due to erratum #2701951\n");
+ return true;
+ }
+ }
+
+ return false;
+}
+
#define CAP_MIDR_RANGE(model, v_min, r_min, v_max, r_max) \
.matches = is_affected_midr_range, \
.midr_range = MIDR_RANGE(model, v_min, r_min, v_max, r_max)
@@ -730,6 +806,16 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
.cpu_enable = cpu_clear_bf16_from_user_emulation,
},
#endif
+#ifdef CONFIG_ARM64_ERRATUM_2701951
+ {
+ .desc = "ARM erratum 2701951",
+ .capability = ARM64_WORKAROUND_NO_FWB,
+ .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM,
+ .matches = has_stage2_fwb_errata,
+
+ },
+#endif
+
{
}
};
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 62f996006783..099bf6ad7552 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1586,6 +1586,20 @@ static bool has_cache_dic(const struct arm64_cpu_capabilities *entry,
return ctr & BIT(CTR_EL0_DIC_SHIFT);
}
+static bool has_stage2_fwb(const struct arm64_cpu_capabilities *entry,
+ int scope)
+{
+ bool has_feature = has_cpuid_feature(entry, scope);
+
+ if (!has_feature)
+ return false;
+
+ if (has_stage2_fwb_errata(NULL, scope))
+ return false;
+
+ return has_feature;
+}
+
static bool __maybe_unused
has_useable_cnp(const struct arm64_cpu_capabilities *entry, int scope)
{
@@ -2438,7 +2452,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
.field_pos = ID_AA64MMFR2_EL1_FWB_SHIFT,
.field_width = 4,
.min_field_value = 1,
- .matches = has_cpuid_feature,
+ .matches = has_stage2_fwb,
},
{
.desc = "ARMv8.4 Translation Table Level",
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 37b1340e9646..2e5f70ec6410 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -86,6 +86,7 @@ WORKAROUND_CAVIUM_TX2_219_PRFM
WORKAROUND_CAVIUM_TX2_219_TVM
WORKAROUND_CLEAN_CACHE
WORKAROUND_DEVICE_LOAD_ACQUIRE
+WORKAROUND_NO_FWB
WORKAROUND_NVIDIA_CARMEL_CNP
WORKAROUND_QCOM_FALKOR_E1003
WORKAROUND_REPEAT_TLBI
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread* [PATCH 5/6] firmware: smccc: Allow errata management to be overridden by device tree
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
` (3 preceding siblings ...)
2023-03-30 16:51 ` [PATCH 4/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
@ 2023-03-30 16:51 ` James Morse
2023-03-30 20:44 ` kernel test robot
2023-03-30 16:51 ` [PATCH 6/6] arm64: errata: Add a commandline option to enable/disable #2701951 James Morse
` (3 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: James Morse @ 2023-03-30 16:51 UTC (permalink / raw)
To: linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, James Morse,
Rob Herring, Krzysztof Kozlowski
The Errata Management SMCCC interface allows firmware to advertise whether
the OS is affected by an erratum, or if a higher exception level has
mitigated the issue. This allows properties of the device that are not
discoverable by the OS to be described. e.g. some errata depend on the
behaviour of the interconnect, which is not visible to the OS.
Deployed devices may find it significantly harder to update EL3
firmware than the device tree. Erratum workarounds typically have to
fail safe, and assume the platform is affected putting correctness
above performance.
Instead of adding a device-tree entry for any CPU errata that is
relevant (or not) to the platform, allow the device-tree to provide
the data firmware should provide via the SMCCC interface. This can be
used to provide the value if the firmware is not implemented, or
override the firmware response if the value provided is wrong.
The vast majority of CPU errata solely depend on the CPU, and can
be detected from the CPUs id registers. The number of entries in
these tables is expected to be small.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/firmware/smccc/em.c | 227 ++++++++++++++++++++++++++++++++++--
1 file changed, 217 insertions(+), 10 deletions(-)
diff --git a/drivers/firmware/smccc/em.c b/drivers/firmware/smccc/em.c
index 2c66240d8707..62e05e6b2140 100644
--- a/drivers/firmware/smccc/em.c
+++ b/drivers/firmware/smccc/em.c
@@ -6,14 +6,21 @@
* errata. It can also be used to discover erratum where the 'configurations
* affected' depends on the integration.
*
+ * The interfaces's return codes have negative values. These should always be
+ * converted to linux errno values to avoid confusion.
+ *
* Copyright (C) 2022 ARM Limited
*/
#define pr_fmt(fmt) "arm_smccc_em: " fmt
+#include <linux/acpi.h>
#include <linux/arm_smccc_em.h>
#include <linux/arm-smccc.h>
#include <linux/errno.h>
+#include <linux/memblock.h>
+#include <linux/of.h>
+#include <linux/percpu.h>
#include <linux/printk.h>
#include <asm/alternative.h>
@@ -22,52 +29,252 @@
static u32 supported;
+/*
+ * This driver may be called when the boot CPU needs to detect an erratum and
+ * apply alternatives. This happens before page_alloc_init(), so this driver
+ * cannot allocate memory using slab. Allocate a page instead, and re-use it
+ * for CPUs that share a set of errata.
+ */
+#define MAX_EM_ARRAY_SIZE ((PAGE_SIZE / sizeof(u32) / 3) - 1)
+
+struct arm_em_dt_supplement_array {
+ u32 num;
+ u32 val[MAX_EM_ARRAY_SIZE];
+} __packed;
+
+struct arm_em_dt_supplement {
+ struct arm_em_dt_supplement_array affected;
+ struct arm_em_dt_supplement_array not_affected;
+ struct arm_em_dt_supplement_array higher_el_mitigated;
+} __packed;
+
+static DEFINE_PER_CPU(struct arm_em_dt_supplement *, em_table);
+
+static bool arm_smccc_em_search_table(struct arm_em_dt_supplement_array *entry,
+ u32 erratum_id)
+{
+ int i;
+
+ for (i = 0; i <= entry->num; i++) {
+ if (entry->val[i] == erratum_id)
+ return true;
+ }
+
+ return false;
+}
+
+static int arm_smccc_em_query_dt(u32 erratum_id)
+{
+ struct arm_em_dt_supplement *tbl = *this_cpu_ptr(&em_table);
+
+ if (!tbl)
+ return -ENOENT;
+
+ if (arm_smccc_em_search_table(&tbl->affected, erratum_id))
+ return SMCCC_EM_RET_AFFECTED;
+ if (arm_smccc_em_search_table(&tbl->not_affected, erratum_id))
+ return SMCCC_EM_RET_NOT_AFFECTED;
+ if (arm_smccc_em_search_table(&tbl->higher_el_mitigated, erratum_id))
+ return SMCCC_EM_RET_HIGHER_EL_MITIGATION;
+
+ return -ENOENT;
+}
+
+/* Only call when both values >= 0 */
+static int arm_smccc_em_merge_retval(u32 erratum_id, u32 one, u32 two)
+{
+ if (one == two)
+ return one;
+
+ pr_warn_once("FW/DT mismatch for errataum #%u", erratum_id);
+ pr_warn_once("(Any subsequent mismatch warnings are suppressed)\n");
+
+ if (one == SMCCC_EM_RET_AFFECTED || two == SMCCC_EM_RET_AFFECTED)
+ return SMCCC_EM_RET_AFFECTED;
+
+ if (one == SMCCC_EM_RET_HIGHER_EL_MITIGATION ||
+ two == SMCCC_EM_RET_HIGHER_EL_MITIGATION)
+ return SMCCC_EM_RET_HIGHER_EL_MITIGATION;
+
+ return SMCCC_EM_RET_NOT_AFFECTED;
+}
+
int arm_smccc_em_cpu_features(u32 erratum_id)
{
+ int dt_retval;
struct arm_smccc_res res;
+ bool _supported = READ_ONCE(supported);
- if (!READ_ONCE(supported))
+ dt_retval = arm_smccc_em_query_dt(erratum_id);
+ if (!_supported && dt_retval <= 0)
return -EOPNOTSUPP;
- arm_smccc_1_1_invoke(ARM_SMCCC_EM_CPU_ERRATUM_FEATURES, erratum_id, 0, &res);
+ if (_supported)
+ arm_smccc_1_1_invoke(ARM_SMCCC_EM_CPU_ERRATUM_FEATURES,
+ erratum_id, 0, &res);
+ else
+ res.a0 = SMCCC_RET_NOT_SUPPORTED;
+
switch (res.a0) {
+ /* DT can always override errata firmware doesn't know about */
case SMCCC_RET_NOT_SUPPORTED:
- return -EOPNOTSUPP;
case SMCCC_EM_RET_INVALID_PARAMTER:
- return -EINVAL;
case SMCCC_EM_RET_UNKNOWN:
- return -ENOENT;
+ return dt_retval;
+
+ /*
+ * But if there is a mismatch - print a warning and prefer to enable
+ * the erratum workaround.
+ */
case SMCCC_EM_RET_HIGHER_EL_MITIGATION:
case SMCCC_EM_RET_NOT_AFFECTED:
case SMCCC_EM_RET_AFFECTED:
- return res.a0;
+ if (dt_retval > 0)
+ return arm_smccc_em_merge_retval(erratum_id, res.a0,
+ dt_retval);
+ else
+ return res.a0;
};
return -EIO;
}
+int arm_smccc_em_dt_alloc_tbl_entry(struct device_node *np, const char *name,
+ struct arm_em_dt_supplement_array *entry)
+{
+ int ret = of_property_count_u32_elems(np, name);
+
+ if (ret <= 0)
+ return 0;
+ if (ret > ARRAY_SIZE(entry->val))
+ return -E2BIG;
+
+ entry->num = ret;
+ return of_property_read_u32_array(np, name, entry->val, entry->num);
+}
+
+static struct arm_em_dt_supplement *arm_smccc_em_dt_alloc_tbl(struct device_node *np)
+{
+ struct arm_em_dt_supplement *tbl = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
+
+ BUILD_BUG_ON(sizeof(struct arm_em_dt_supplement) > PAGE_SIZE);
+
+ if (!tbl)
+ return ERR_PTR(-ENOMEM);
+
+ if (arm_smccc_em_dt_alloc_tbl_entry(np, "arm,erratum-affected",
+ &tbl->affected)) {
+ memblock_free(tbl, PAGE_SIZE);
+ return ERR_PTR(-EIO);
+ }
+ if (arm_smccc_em_dt_alloc_tbl_entry(np, "arm,erratum-not-affected",
+ &tbl->not_affected)) {
+ memblock_free(tbl, PAGE_SIZE);
+ return ERR_PTR(-EIO);
+ }
+ if (arm_smccc_em_dt_alloc_tbl_entry(np, "arm,erratum-higher-el-mitigated",
+ &tbl->higher_el_mitigated)) {
+ memblock_free(tbl, PAGE_SIZE);
+ return ERR_PTR(-EIO);
+ }
+
+ return tbl;
+}
+
+static int __init arm_smccc_em_dt_probe(void)
+{
+ int cpu, cpu2;
+ bool one_entry_found;
+ struct arm_em_dt_supplement *tbl;
+ struct device_node *np, *cpu_np, *np2, *cpu_np2;
+
+ for_each_possible_cpu(cpu) {
+ /* Pre-populated? */
+ if (per_cpu(em_table, cpu))
+ continue;
+
+ cpu_np = of_get_cpu_node(cpu, 0);
+ if (!cpu_np)
+ continue;
+
+ np = of_parse_phandle(cpu_np, "arm,erratum-list", 0);
+ if (!np) {
+ of_node_put(cpu_np);
+ continue;
+ }
+
+ tbl = arm_smccc_em_dt_alloc_tbl(np);
+ if (IS_ERR(tbl)) {
+ pr_err_once("Failed to allocate memory for DT supplement\n");
+ of_node_put(cpu_np);
+ break;
+ }
+ per_cpu(em_table, cpu) = tbl;
+
+ /* Pre-populate all CPUs with the same phandle */
+ for_each_possible_cpu(cpu2) {
+ if (cpu2 == cpu)
+ continue;
+
+ /* Pre-populated? */
+ if (per_cpu(em_table, cpu2))
+ continue;
+
+ cpu_np2 = of_get_cpu_node(cpu2, 0);
+ if (!cpu_np2)
+ continue;
+
+ np2 = of_parse_phandle(cpu_np2, "arm,erratum-list", 0);
+ if (!np) {
+ of_node_put(cpu_np2);
+ continue;
+ }
+
+ if (np2 == np)
+ per_cpu(em_table, cpu2) = tbl;
+
+ of_node_put(cpu_np2);
+ of_node_put(np2);
+ }
+
+ of_node_put(cpu_np);
+ of_node_put(np);
+
+ one_entry_found = true;
+ }
+
+ if (one_entry_found)
+ pr_info("Found DT supplements for SMCCC Errata Management Interface\n");
+
+ return one_entry_found ? 0 : -EOPNOTSUPP;
+}
+
int __init arm_smccc_em_init(void)
{
+ int dt_supported = false;
u32 major_ver, minor_ver;
struct arm_smccc_res res;
enum arm_smccc_conduit conduit = arm_smccc_1_1_get_conduit();
+ if (acpi_disabled)
+ dt_supported = arm_smccc_em_dt_probe();
+
if (conduit == SMCCC_CONDUIT_NONE)
- return -EOPNOTSUPP;
+ return dt_supported;
arm_smccc_1_1_invoke(ARM_SMCCC_EM_VERSION, &res);
if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
- return -EOPNOTSUPP;
+ return dt_supported;
major_ver = PSCI_VERSION_MAJOR(res.a0);
minor_ver = PSCI_VERSION_MINOR(res.a0);
if (major_ver != 1)
- return -EIO;
+ return dt_supported;
arm_smccc_1_1_invoke(ARM_SMCCC_EM_FEATURES,
ARM_SMCCC_EM_CPU_ERRATUM_FEATURES, &res);
if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
- return -EOPNOTSUPP;
+ return dt_supported;
pr_info("SMCCC Errata Management Interface v%d.%d\n",
major_ver, minor_ver);
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH 5/6] firmware: smccc: Allow errata management to be overridden by device tree
2023-03-30 16:51 ` [PATCH 5/6] firmware: smccc: Allow errata management to be overridden by device tree James Morse
@ 2023-03-30 20:44 ` kernel test robot
2023-03-31 17:05 ` James Morse
0 siblings, 1 reply; 27+ messages in thread
From: kernel test robot @ 2023-03-30 20:44 UTC (permalink / raw)
To: James Morse, linux-arm-kernel, devicetree
Cc: oe-kbuild-all, Catalin Marinas, Will Deacon, Mark Rutland,
Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier, Oliver Upton,
James Morse, Rob Herring, Krzysztof Kozlowski
Hi James,
I love your patch! Perhaps something to improve:
[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on robh/for-next arm/for-next arm/fixes kvmarm/next soc/for-next linus/master v6.3-rc4 next-20230330]
[cannot apply to xilinx-xlnx/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/James-Morse/dt-bindings-firmware-Add-arm-errata-management/20230331-005505
base: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
patch link: https://lore.kernel.org/r/20230330165128.3237939-6-james.morse%40arm.com
patch subject: [PATCH 5/6] firmware: smccc: Allow errata management to be overridden by device tree
config: arm64-randconfig-r032-20230329 (https://download.01.org/0day-ci/archive/20230331/202303310444.3JHIsByA-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/79e2d2cd6684ebd4e8c4787905486b63301117a9
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review James-Morse/dt-bindings-firmware-Add-arm-errata-management/20230331-005505
git checkout 79e2d2cd6684ebd4e8c4787905486b63301117a9
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/firmware/smccc/
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303310444.3JHIsByA-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/firmware/smccc/em.c:142:5: warning: no previous prototype for 'arm_smccc_em_dt_alloc_tbl_entry' [-Wmissing-prototypes]
142 | int arm_smccc_em_dt_alloc_tbl_entry(struct device_node *np, const char *name,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
vim +/arm_smccc_em_dt_alloc_tbl_entry +142 drivers/firmware/smccc/em.c
141
> 142 int arm_smccc_em_dt_alloc_tbl_entry(struct device_node *np, const char *name,
143 struct arm_em_dt_supplement_array *entry)
144 {
145 int ret = of_property_count_u32_elems(np, name);
146
147 if (ret <= 0)
148 return 0;
149 if (ret > ARRAY_SIZE(entry->val))
150 return -E2BIG;
151
152 entry->num = ret;
153 return of_property_read_u32_array(np, name, entry->val, entry->num);
154 }
155
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 5/6] firmware: smccc: Allow errata management to be overridden by device tree
2023-03-30 20:44 ` kernel test robot
@ 2023-03-31 17:05 ` James Morse
0 siblings, 0 replies; 27+ messages in thread
From: James Morse @ 2023-03-31 17:05 UTC (permalink / raw)
To: kernel test robot, linux-arm-kernel, devicetree
Cc: oe-kbuild-all, Catalin Marinas, Will Deacon, Mark Rutland,
Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier, Oliver Upton,
Rob Herring, Krzysztof Kozlowski
On 30/03/2023 21:44, kernel test robot wrote:
> I love your patch! Perhaps something to improve:
> All warnings (new ones prefixed by >>):
>
>>> drivers/firmware/smccc/em.c:142:5: warning: no previous prototype for 'arm_smccc_em_dt_alloc_tbl_entry' [-Wmissing-prototypes]
> 142 | int arm_smccc_em_dt_alloc_tbl_entry(struct device_node *np, const char *name,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It's just missing a static. Fixed locally.
Thanks,
James
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 6/6] arm64: errata: Add a commandline option to enable/disable #2701951
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
` (4 preceding siblings ...)
2023-03-30 16:51 ` [PATCH 5/6] firmware: smccc: Allow errata management to be overridden by device tree James Morse
@ 2023-03-30 16:51 ` James Morse
2023-03-31 12:57 ` [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects Rob Herring
` (2 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: James Morse @ 2023-03-30 16:51 UTC (permalink / raw)
To: linux-arm-kernel, devicetree
Cc: Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Sudeep Holla, Marc Zyngier, Oliver Upton, James Morse,
Rob Herring, Krzysztof Kozlowski
Erratum #2701951 affects the FWB feature in a number of CPUs, but is
only going to be visible on parts that don't use an arm interconnect.
This is not something the operating system can discover, it has to
be described by platform firmware.
The firmware discovery API is not deployed on existing systems.
Add a commandline option to allow the workaround to override the
value from firmware, or provide a value if the firmware is not
implemented.
The property is named arm64.arm-interconnect, as this is the
description in the 'configurations affected' section of the erratum.
Signed-off-by: James Morse <james.morse@arm.com>
---
.../admin-guide/kernel-parameters.txt | 4 +++
arch/arm64/kernel/cpu_errata.c | 36 +++++++++++++++++++
2 files changed, 40 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6221a1d057dd..5898fde6a9e4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -391,6 +391,10 @@
arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
Format: <io>,<irq>,<nodeID>
+ arm64.arm-interconnect [ARM64]
+ Indicates the FWB erratum can be disabled because this
+ SoC uses an arm interconnect.
+
arm64.nobti [ARM64] Unconditionally disable Branch Target
Identification support
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 55da9e588b9e..c5570904e8b4 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -138,6 +138,32 @@ cpu_clear_bf16_from_user_emulation(const struct arm64_cpu_capabilities *__unused
raw_spin_unlock(®_user_mask_modification);
}
+static enum {
+ FWB_WA_FORCED_ON = 1,
+ FWB_WA_UNKNOWN = 0,
+ FWB_WA_FORCED_OFF = -1,
+} __fwb_workaround_forced;
+#ifdef CONFIG_ARM64_ERRATUM_2701951
+static int __init parse_fwb_workaround_cmdline_override(char *str)
+{
+ bool arm_interconnect;
+ int ret = kstrtobool(str, &arm_interconnect);
+
+ if (ret)
+ return ret;
+
+ /*
+ * Erratum #2701951's "Configurations Affected" says the erratum can
+ * only be seen on SoC's "that do not use Arm interconnect IP."
+ */
+ if (arm_interconnect)
+ __fwb_workaround_forced = FWB_WA_FORCED_OFF;
+ else
+ __fwb_workaround_forced = FWB_WA_FORCED_ON;
+ return 0;
+}
+early_param("arm64.arm-interconnect", parse_fwb_workaround_cmdline_override);
+#endif /* CONFIG_ARM64_ERRATUM_2701951 */
bool has_stage2_fwb_errata(const struct arm64_cpu_capabilities *ignored,
int scope)
{
@@ -205,9 +231,19 @@ bool has_stage2_fwb_errata(const struct arm64_cpu_capabilities *ignored,
}
if (fwb_broken) {
+ if (__fwb_workaround_forced == FWB_WA_FORCED_OFF) {
+ pr_info_once("Workaround for erratum #2701951 disabled by command-line option\n");
+ return false;
+ }
pr_info_once("Stage-2 Force Write-Back disabled due to erratum #2701951\n");
return true;
}
+
+ /* Allow the commandline to override whatever firmware said */
+ if (has_feature && __fwb_workaround_forced == FWB_WA_FORCED_ON) {
+ pr_info_once("Workaround for erratum #2701951 enabled by command-line option\n");
+ return true;
+ }
}
return false;
--
2.39.2
^ permalink raw reply related [flat|nested] 27+ messages in thread* Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
` (5 preceding siblings ...)
2023-03-30 16:51 ` [PATCH 6/6] arm64: errata: Add a commandline option to enable/disable #2701951 James Morse
@ 2023-03-31 12:57 ` Rob Herring
2023-03-31 13:03 ` Rob Herring
2023-05-11 17:15 ` Catalin Marinas
2023-05-23 10:48 ` Will Deacon
8 siblings, 1 reply; 27+ messages in thread
From: Rob Herring @ 2023-03-31 12:57 UTC (permalink / raw)
To: James Morse
Cc: linux-arm-kernel, devicetree, Catalin Marinas, Will Deacon,
Mark Rutland, Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier,
Oliver Upton, Krzysztof Kozlowski
On Thu, Mar 30, 2023 at 11:51 AM James Morse <james.morse@arm.com> wrote:
>
> Hello!
>
> Changes since the RFC?:
> * Added DT support, in a way that means we don't end up with per-erratum
> strings, or bloat in the calling code to check for those strings.
> * Added a commandline argument. (boo)
> * Changes to support errata affecting features on big-little systems properly.
>
> ~
>
> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> attributes used for instruction fetch, one of the options results in a
> non-cacheable access. A whole host of CPUs missed the FWB override
> in this case, meaning a KVM guest could fetch stale/junk data instead of
> instructions.
>
> The workaround is to disable FWB, and do the required cache maintenance
> instead.
What's FWB? I don't see it defined anywhere in the series.
Rob
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-03-31 12:57 ` [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects Rob Herring
@ 2023-03-31 13:03 ` Rob Herring
0 siblings, 0 replies; 27+ messages in thread
From: Rob Herring @ 2023-03-31 13:03 UTC (permalink / raw)
To: James Morse
Cc: linux-arm-kernel, devicetree, Catalin Marinas, Will Deacon,
Mark Rutland, Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier,
Oliver Upton, Krzysztof Kozlowski
On Fri, Mar 31, 2023 at 7:57 AM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Thu, Mar 30, 2023 at 11:51 AM James Morse <james.morse@arm.com> wrote:
> >
> > Hello!
> >
> > Changes since the RFC?:
> > * Added DT support, in a way that means we don't end up with per-erratum
> > strings, or bloat in the calling code to check for those strings.
> > * Added a commandline argument. (boo)
> > * Changes to support errata affecting features on big-little systems properly.
> >
> > ~
> >
> > When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> > attributes used for instruction fetch, one of the options results in a
> > non-cacheable access. A whole host of CPUs missed the FWB override
> > in this case, meaning a KVM guest could fetch stale/junk data instead of
> > instructions.
> >
> > The workaround is to disable FWB, and do the required cache maintenance
> > instead.
>
> What's FWB? I don't see it defined anywhere in the series.
Ah, there it is in patch 1. It wasn't in patch 6, so naturally I went
searching in the cover letter.
Rob
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
` (6 preceding siblings ...)
2023-03-31 12:57 ` [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects Rob Herring
@ 2023-05-11 17:15 ` Catalin Marinas
2023-05-11 18:42 ` Marc Zyngier
2023-05-23 10:48 ` Will Deacon
8 siblings, 1 reply; 27+ messages in thread
From: Catalin Marinas @ 2023-05-11 17:15 UTC (permalink / raw)
To: James Morse
Cc: linux-arm-kernel, devicetree, Will Deacon, Mark Rutland,
Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier, Oliver Upton,
Rob Herring, Krzysztof Kozlowski
Hi James,
On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> attributes used for instruction fetch, one of the options results in a
> non-cacheable access. A whole host of CPUs missed the FWB override
> in this case, meaning a KVM guest could fetch stale/junk data instead of
> instructions.
>
> The workaround is to disable FWB, and do the required cache maintenance
> instead.
I think the workaround can be to only do the required cache maintenance
without disabling FWB. Having FWB on doesn't bring any performance
benefits if we do the cache maintenance anyway but keeping it around may
be useful for other reasons (e.g. KVM device pass-through using
cacheable mappings, though not something KVM supports currently).
> Unfortunately, no-one has firmware that supports this new interface yet,
> and the least surprising thing to do is to enable the workaround by default,
> meaning FWB is disabled on all these cores, even for unaffected platforms.
> ACPI Platforms that are not-affected can either take a firmware-update to
> support the interface, or if the kernel they run will only run on hardware
> that is unaffected, disable the workaround at build time.
Given that we know of more platforms that are _not_ affected and vendors
are pretty vague on whether they need this, I'd rather swap the default
and keep the workaround off with a firmware interface, DT or command
line opt-in.
That said, maybe we can reduce the risk further by doing the
vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
stage 2 exec fault (together with the I-cache invalidation). We can then
ignore any other cache maintenance on S2 faults until someone shouts (we
can maybe recommend forcing FWB off on the command line through the
cpuid override).
--
Catalin
^ permalink raw reply [flat|nested] 27+ messages in thread* Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-05-11 17:15 ` Catalin Marinas
@ 2023-05-11 18:42 ` Marc Zyngier
2023-05-11 21:13 ` Catalin Marinas
0 siblings, 1 reply; 27+ messages in thread
From: Marc Zyngier @ 2023-05-11 18:42 UTC (permalink / raw)
To: Catalin Marinas
Cc: James Morse, linux-arm-kernel, devicetree, Will Deacon,
Mark Rutland, Lorenzo Pieralisi, Sudeep Holla, Oliver Upton,
Rob Herring, Krzysztof Kozlowski
On Thu, 11 May 2023 18:15:15 +0100,
Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> Hi James,
>
> On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
> > When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> > attributes used for instruction fetch, one of the options results in a
> > non-cacheable access. A whole host of CPUs missed the FWB override
> > in this case, meaning a KVM guest could fetch stale/junk data instead of
> > instructions.
> >
> > The workaround is to disable FWB, and do the required cache maintenance
> > instead.
>
> I think the workaround can be to only do the required cache maintenance
> without disabling FWB. Having FWB on doesn't bring any performance
> benefits if we do the cache maintenance anyway but keeping it around may
> be useful for other reasons (e.g. KVM device pass-through using
> cacheable mappings, though not something KVM supports currently).
But you'd also rely on the guest doing its own cache maintenance for
instructions it writes, right?
Which probably means exposing a different CLIDR_EL1 so that
LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
if keeping FWB set has the potential to change the semantics of the
CMOs (the spec seems silent on that front).
> > Unfortunately, no-one has firmware that supports this new interface yet,
> > and the least surprising thing to do is to enable the workaround by default,
> > meaning FWB is disabled on all these cores, even for unaffected platforms.
> > ACPI Platforms that are not-affected can either take a firmware-update to
> > support the interface, or if the kernel they run will only run on hardware
> > that is unaffected, disable the workaround at build time.
>
> Given that we know of more platforms that are _not_ affected and vendors
> are pretty vague on whether they need this, I'd rather swap the default
> and keep the workaround off with a firmware interface, DT or command
> line opt-in.
That'd be my preferred way. I really dislike putting the onus on
working systems to declare themselves safe (although there are some
Spectre-shaped precedents to that).
> That said, maybe we can reduce the risk further by doing the
> vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
> stage 2 exec fault (together with the I-cache invalidation). We can then
> ignore any other cache maintenance on S2 faults until someone shouts (we
> can maybe recommend forcing FWB off on the command line through the
> cpuid override).
You lost me here with your vcpu_has_run_once().
Keeping the CMOs on exec fault is definitely manageable. But is that
enough?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-05-11 18:42 ` Marc Zyngier
@ 2023-05-11 21:13 ` Catalin Marinas
2023-05-16 16:29 ` James Morse
0 siblings, 1 reply; 27+ messages in thread
From: Catalin Marinas @ 2023-05-11 21:13 UTC (permalink / raw)
To: Marc Zyngier
Cc: James Morse, linux-arm-kernel, devicetree, Will Deacon,
Mark Rutland, Lorenzo Pieralisi, Sudeep Holla, Oliver Upton,
Rob Herring, Krzysztof Kozlowski
On Thu, May 11, 2023 at 07:42:58PM +0100, Marc Zyngier wrote:
> On Thu, 11 May 2023 18:15:15 +0100,
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
> > > When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> > > attributes used for instruction fetch, one of the options results in a
> > > non-cacheable access. A whole host of CPUs missed the FWB override
> > > in this case, meaning a KVM guest could fetch stale/junk data instead of
> > > instructions.
> > >
> > > The workaround is to disable FWB, and do the required cache maintenance
> > > instead.
> >
> > I think the workaround can be to only do the required cache maintenance
> > without disabling FWB. Having FWB on doesn't bring any performance
> > benefits if we do the cache maintenance anyway but keeping it around may
> > be useful for other reasons (e.g. KVM device pass-through using
> > cacheable mappings, though not something KVM supports currently).
>
> But you'd also rely on the guest doing its own cache maintenance for
> instructions it writes, right?
Ah, you are right. It looks like I only considered the host writing
instructions. If the guest disabled stage 1 and wrote some instructions
with FWB on, they'd not necessarily reach the PoC while the instructions
are fetched from PoC with this bug. Even with SCTLR_EL1.I==0, the guest
is supposed to do an IC IVAU if it wrote instructions but that's not
sufficient (hint to the micro-architects, add a chicken bit to upgrade
IC IVAU to also do a DC CVAC ;))
> Which probably means exposing a different CLIDR_EL1 so that
> LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
> if keeping FWB set has the potential to change the semantics of the
> CMOs (the spec seems silent on that front).
Not sure about CMOs, I'd expect them to behave in the same way. However,
I don't see how faking CLIDR_EL1 can trick the guest into doing DC CVAC
when its MMU is off.
> > That said, maybe we can reduce the risk further by doing the
> > vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
> > stage 2 exec fault (together with the I-cache invalidation). We can then
> > ignore any other cache maintenance on S2 faults until someone shouts (we
> > can maybe recommend forcing FWB off on the command line through the
> > cpuid override).
>
> You lost me here with your vcpu_has_run_once().
Most likely I lost myself in the code. So the tricks we used in the past
tracking the guest MMU off/on was only for the D side. If (we hope that)
the guest only wrote instructions to a page once before executing them
(and never writing instructions again), we could trap a subsequent exec
fault and do the D-cache clean to PoC again.
> Keeping the CMOs on exec fault is definitely manageable. But is that
> enough?
Yeah, not sure it's enough if the guest keeps writing instructions to
the same page with the MMU off.
--
Catalin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-05-11 21:13 ` Catalin Marinas
@ 2023-05-16 16:29 ` James Morse
2023-05-23 12:24 ` Robin Murphy
0 siblings, 1 reply; 27+ messages in thread
From: James Morse @ 2023-05-16 16:29 UTC (permalink / raw)
To: Catalin Marinas, Marc Zyngier
Cc: linux-arm-kernel, devicetree, Will Deacon, Mark Rutland,
Lorenzo Pieralisi, Sudeep Holla, Oliver Upton, Rob Herring,
Krzysztof Kozlowski
Hi Catalin, Marc,
On 11/05/2023 22:13, Catalin Marinas wrote:
> On Thu, May 11, 2023 at 07:42:58PM +0100, Marc Zyngier wrote:
>> On Thu, 11 May 2023 18:15:15 +0100,
>> Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
>>>> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
>>>> attributes used for instruction fetch, one of the options results in a
>>>> non-cacheable access. A whole host of CPUs missed the FWB override
>>>> in this case, meaning a KVM guest could fetch stale/junk data instead of
>>>> instructions.
>>>>
>>>> The workaround is to disable FWB, and do the required cache maintenance
>>>> instead.
>>> I think the workaround can be to only do the required cache maintenance
>>> without disabling FWB. Having FWB on doesn't bring any performance
>>> benefits if we do the cache maintenance anyway but keeping it around may
>>> be useful for other reasons (e.g. KVM device pass-through using
>>> cacheable mappings, though not something KVM supports currently).
>>
>> But you'd also rely on the guest doing its own cache maintenance for
>> instructions it writes, right?
>
> Ah, you are right. It looks like I only considered the host writing
> instructions. If the guest disabled stage 1 and wrote some instructions
> with FWB on, they'd not necessarily reach the PoC while the instructions
> are fetched from PoC with this bug. Even with SCTLR_EL1.I==0, the guest
> is supposed to do an IC IVAU if it wrote instructions but that's not
> sufficient (hint to the micro-architects, add a chicken bit to upgrade
> IC IVAU to also do a DC CVAC ;))
>
>> Which probably means exposing a different CLIDR_EL1 so that
>> LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
>> if keeping FWB set has the potential to change the semantics of the
>> CMOs (the spec seems silent on that front).
>
> Not sure about CMOs, I'd expect them to behave in the same way. However,
> I don't see how faking CLIDR_EL1 can trick the guest into doing DC CVAC
> when its MMU is off.
I think the request is to keep the FWB feature, but to disable it for all host memory
the guest can execute from. I presume this 'device pass-through using cacheable mappings'
would mark that address range as XN at stage2, ( ... it's special right?).
If this is for something like CXL: it can't set XN, and the guest would still be exposed
to the erratum if it executes from theses addresses with the MMU off.
Does this need doing now? It wouldn't need backporting to older kernels...
>>> That said, maybe we can reduce the risk further by doing the
>>> vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
>>> stage 2 exec fault (together with the I-cache invalidation). We can then
>>> ignore any other cache maintenance on S2 faults until someone shouts (we
>>> can maybe recommend forcing FWB off on the command line through the
>>> cpuid override).
>>
>> You lost me here with your vcpu_has_run_once().
>
> Most likely I lost myself in the code. So the tricks we used in the past
> tracking the guest MMU off/on was only for the D side. If (we hope that)
> the guest only wrote instructions to a page once before executing them
> (and never writing instructions again), we could trap a subsequent exec
> fault and do the D-cache clean to PoC again.
>
>> Keeping the CMOs on exec fault is definitely manageable. But is that
>> enough?
>
> Yeah, not sure it's enough if the guest keeps writing instructions to
> the same page with the MMU off.
The difference between FWB and IDC/DIC still does my head in: My reading is FWB implies
IDC, (but the CTR_EL0.IDC bit might not be set). This doesn't help if the wrong attributes
are being used for instruction fetch.
This is cache-maintenance that wasn't needed before, so there are no tricks with the id
registers we can pull to make the guest do it.
v2 of this will flip the polarity, and also detect based on an 'arm,interconnect'
compatible, or the existing compatible the PMU driver uses.
Thanks,
James
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-05-16 16:29 ` James Morse
@ 2023-05-23 12:24 ` Robin Murphy
0 siblings, 0 replies; 27+ messages in thread
From: Robin Murphy @ 2023-05-23 12:24 UTC (permalink / raw)
To: James Morse, Catalin Marinas, Marc Zyngier
Cc: linux-arm-kernel, devicetree, Will Deacon, Mark Rutland,
Lorenzo Pieralisi, Sudeep Holla, Oliver Upton, Rob Herring,
Krzysztof Kozlowski
On 2023-05-16 17:29, James Morse wrote:
> Hi Catalin, Marc,
>
> On 11/05/2023 22:13, Catalin Marinas wrote:
>> On Thu, May 11, 2023 at 07:42:58PM +0100, Marc Zyngier wrote:
>>> On Thu, 11 May 2023 18:15:15 +0100,
>>> Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>> On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
>>>>> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
>>>>> attributes used for instruction fetch, one of the options results in a
>>>>> non-cacheable access. A whole host of CPUs missed the FWB override
>>>>> in this case, meaning a KVM guest could fetch stale/junk data instead of
>>>>> instructions.
>>>>>
>>>>> The workaround is to disable FWB, and do the required cache maintenance
>>>>> instead.
>
>>>> I think the workaround can be to only do the required cache maintenance
>>>> without disabling FWB. Having FWB on doesn't bring any performance
>>>> benefits if we do the cache maintenance anyway but keeping it around may
>>>> be useful for other reasons (e.g. KVM device pass-through using
>>>> cacheable mappings, though not something KVM supports currently).
>>>
>>> But you'd also rely on the guest doing its own cache maintenance for
>>> instructions it writes, right?
>>
>> Ah, you are right. It looks like I only considered the host writing
>> instructions. If the guest disabled stage 1 and wrote some instructions
>> with FWB on, they'd not necessarily reach the PoC while the instructions
>> are fetched from PoC with this bug. Even with SCTLR_EL1.I==0, the guest
>> is supposed to do an IC IVAU if it wrote instructions but that's not
>> sufficient (hint to the micro-architects, add a chicken bit to upgrade
>> IC IVAU to also do a DC CVAC ;))
>>
>>> Which probably means exposing a different CLIDR_EL1 so that
>>> LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
>>> if keeping FWB set has the potential to change the semantics of the
>>> CMOs (the spec seems silent on that front).
>>
>> Not sure about CMOs, I'd expect them to behave in the same way. However,
>> I don't see how faking CLIDR_EL1 can trick the guest into doing DC CVAC
>> when its MMU is off.
>
> I think the request is to keep the FWB feature, but to disable it for all host memory
> the guest can execute from. I presume this 'device pass-through using cacheable mappings'
> would mark that address range as XN at stage2, ( ... it's special right?).
>
> If this is for something like CXL: it can't set XN, and the guest would still be exposed
> to the erratum if it executes from theses addresses with the MMU off.
>
> Does this need doing now? It wouldn't need backporting to older kernels...
>
>
>>>> That said, maybe we can reduce the risk further by doing the
>>>> vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
>>>> stage 2 exec fault (together with the I-cache invalidation). We can then
>>>> ignore any other cache maintenance on S2 faults until someone shouts (we
>>>> can maybe recommend forcing FWB off on the command line through the
>>>> cpuid override).
>>>
>>> You lost me here with your vcpu_has_run_once().
>>
>> Most likely I lost myself in the code. So the tricks we used in the past
>> tracking the guest MMU off/on was only for the D side. If (we hope that)
>> the guest only wrote instructions to a page once before executing them
>> (and never writing instructions again), we could trap a subsequent exec
>> fault and do the D-cache clean to PoC again.
>>
>>> Keeping the CMOs on exec fault is definitely manageable. But is that
>>> enough?
>>
>> Yeah, not sure it's enough if the guest keeps writing instructions to
>> the same page with the MMU off.
>
> The difference between FWB and IDC/DIC still does my head in: My reading is FWB implies
> IDC, (but the CTR_EL0.IDC bit might not be set). This doesn't help if the wrong attributes
> are being used for instruction fetch.
> This is cache-maintenance that wasn't needed before, so there are no tricks with the id
> registers we can pull to make the guest do it.
>
>
> v2 of this will flip the polarity, and also detect based on an 'arm,interconnect'
> compatible, or the existing compatible the PMU driver uses.
Unfortunately I don't think PMUs are going to be a meaningful indicator
in general since they don't imply anything about topology - you may
infer that, say, an Arm CMN exists *somewhere* in the system, but it
could conceivably be on some I/O chiplet connected via CCIX/CXL to a
different interconnect on the CPU die which *does* still need the
mitigation.
Thanks,
Robin.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects
2023-03-30 16:51 [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects James Morse
` (7 preceding siblings ...)
2023-05-11 17:15 ` Catalin Marinas
@ 2023-05-23 10:48 ` Will Deacon
8 siblings, 0 replies; 27+ messages in thread
From: Will Deacon @ 2023-05-23 10:48 UTC (permalink / raw)
To: James Morse
Cc: linux-arm-kernel, devicetree, Catalin Marinas, Mark Rutland,
Lorenzo Pieralisi, Sudeep Holla, Marc Zyngier, Oliver Upton,
Rob Herring, Krzysztof Kozlowski
Hi James,
On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
> Changes since the RFC?:
> * Added DT support, in a way that means we don't end up with per-erratum
> strings, or bloat in the calling code to check for those strings.
> * Added a commandline argument. (boo)
> * Changes to support errata affecting features on big-little systems properly.
>
> ~
>
> When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> attributes used for instruction fetch, one of the options results in a
> non-cacheable access. A whole host of CPUs missed the FWB override
> in this case, meaning a KVM guest could fetch stale/junk data instead of
> instructions.
>
> The workaround is to disable FWB, and do the required cache maintenance
> instead.
>
> The good news is, this isn't a problem for systems using Arm's
> interconnect IP. The bad news is: linux can't know this. Arm knows of
> at least one platform that is affected by this erratum.
>
>
> This series adds support for the 'Errata Management Firmware Interface', [0]
> and queries that to determine if the CPU is affected or not. DT support is
> added so that the firmware interface values can be queried directly from the
> DT. This can be used as a fallback for platforms that don't yet support the
> interface.
It occurs to me that, when a device is assigned to a VM, there are a
whole bunch of non-probeable reasons why FWB cannot be used and perhaps
we should be looking to advertise that from firmware without pulling in
a reliance on this errata management interface.
Right now, I don't think KVM or VFIO do anything to prevent the assignment
of a device capable of non-coherent DMA traffic and FWB is used if the CPUs
support it. This, in theory, allows the guest to read stale data from host
memory pages using non-coherent DMA and I think we should gate usage of FWB
for a given VM on whether or not that VM has such a device assigned.
The problem is that I don't think you can probe this property reliably. It's
not enough to check for "dma-coherent"; rather we also need to know details
about the IOMMU and device-specific properties such as the ability to
generate NoSnoop transactions. I think firmware is probably the only option
here, so since you're proposing something similar, please can we make it
general enough to be used outside of errata scenarios?
Will
^ permalink raw reply [flat|nested] 27+ messages in thread