* [PATCH] of, numa: Validate some distance map rules
@ 2018-11-06 12:39 John Garry
2018-11-07 15:44 ` Will Deacon
0 siblings, 1 reply; 4+ messages in thread
From: John Garry @ 2018-11-06 12:39 UTC (permalink / raw)
To: robh+dt, frowand.list
Cc: devicetree, linux-kernel, linuxarm, will.deacon,
anshuman.khandual, peterz, John Garry

Currently the NUMA distance map parsing does not validate the distance
table for the distance-matrix rules 1-2 in [1].

However, the arch NUMA code may enforce some of these rules, but not all.
Such is the case for the arm64 port, which does not enforce the rule that
the distance between separate nodes cannot equal LOCAL_DISTANCE.

The patch adds the following rules validation:
- distance of node to self equals LOCAL_DISTANCE
- distance of separate nodes > LOCAL_DISTANCE
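
For illustration only (this is not part of the patch): the two rules above
can be expressed as a single predicate. The helper name distance_is_valid()
is hypothetical, and LOCAL_DISTANCE is redefined here with the kernel's
default value of 10 so that the sketch compiles outside the kernel tree.

```c
#include <stdbool.h>

/* Assumption: mirrors the kernel's default LOCAL_DISTANCE value. */
#define LOCAL_DISTANCE 10

/*
 * Hypothetical helper, not part of the patch: returns true when a
 * distance-matrix entry satisfies the two rules listed above.
 */
static bool distance_is_valid(unsigned int nodea, unsigned int nodeb,
			      unsigned int distance)
{
	/* Rule 1: the distance of a node to itself equals LOCAL_DISTANCE. */
	if (nodea == nodeb)
		return distance == LOCAL_DISTANCE;

	/* Rule 2: the distance between separate nodes exceeds LOCAL_DISTANCE. */
	return distance > LOCAL_DISTANCE;
}
```

With these definitions, an entry such as [0, 1] = 10 is rejected, which is
the case the commit message says the arm64 port does not catch.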

A note on dealing with symmetrical distances between nodes:

Validating symmetrical distances between nodes is difficult. If it were
mandated in the bindings that every distance must be recorded in the
table, validating symmetrical distances would be straightforward. However,
it isn't.

In addition to this, it is also possible to record [b, a] distance only
(and not [a, b]). So, when processing the table for [b, a], we cannot
treat a current [a, b] distance that differs from [b, a] as invalid, as
the [a, b] distance may not be present in the table, in which case the
current distance would be the default, REMOTE_DISTANCE.

As such, we maintain the policy that we overwrite distance [a, b] = [b, a]
for b > a. This policy is different to kernel ACPI SLIT validation, which
allows non-symmetrical distances (ACPI spec SLIT rules allow it). However,
the debug message is dropped as it may be misleading (for a distance which
is later overwritten).
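
To make the overwrite policy concrete, here is a minimal userspace sketch
(not kernel code). The table size MAX_NODES and the names dist[],
init_distances() and record_entry() are hypothetical; LOCAL_DISTANCE and
REMOTE_DISTANCE are redefined with the kernel's default values of 10 and 20.

```c
#define MAX_NODES	4	/* hypothetical table size for this sketch */
#define LOCAL_DISTANCE	10	/* assumption: kernel default value */
#define REMOTE_DISTANCE	20	/* assumption: kernel default value */

static int dist[MAX_NODES][MAX_NODES];

/* Start from the defaults: local on the diagonal, remote elsewhere. */
static void init_distances(void)
{
	for (int i = 0; i < MAX_NODES; i++)
		for (int j = 0; j < MAX_NODES; j++)
			dist[i][j] = (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}

/*
 * Record one table entry [a, b]. When b > a, the reverse direction is
 * overwritten as well; an entry with b < a leaves [b, a] untouched.
 */
static void record_entry(int a, int b, int d)
{
	dist[a][b] = d;
	if (b > a)
		dist[b][a] = d;
}
```

If a table lists only [1, 0] = 25, then dist[0][1] stays at
REMOTE_DISTANCE, so comparing the two directions at that point would flag a
mismatch that the table never stated. If [0, 1] = 15 is processed
afterwards, both directions end up as 15, which is why a debug print of the
earlier [1, 0] value could mislead.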

Some final notes on semantics:

- It is implied that it is the responsibility of the arch NUMA code to
  reset the NUMA distance map for an error in distance map parsing.

- It is the responsibility of the FW NUMA topology parsing (whether OF or
  ACPI) to enforce NUMA distance rules, and not arch NUMA code.
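
The split of responsibilities described above could look roughly like the
following userspace sketch. It makes no claim about how any arch code is
actually written: struct entry, table[], reset_table(), parse_entries() and
init_topology() are all hypothetical names, the node-range check is an
addition made for the sketch, and the distance constants are redefined with
the kernel's default values.

```c
#include <errno.h>
#include <stddef.h>

#define NR_NODES	2	/* hypothetical node count for this sketch */
#define LOCAL_DISTANCE	10	/* assumption: kernel default value */
#define REMOTE_DISTANCE	20	/* assumption: kernel default value */

struct entry {
	unsigned int a, b, distance;
};

static unsigned int table[NR_NODES][NR_NODES];

static void reset_table(void)
{
	for (size_t i = 0; i < NR_NODES; i++)
		for (size_t j = 0; j < NR_NODES; j++)
			table[i][j] = (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}

/* "Firmware parsing" side: enforces the distance rules, stops on error. */
static int parse_entries(const struct entry *e, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		/* Range check added for the sketch only. */
		if (e[i].a >= NR_NODES || e[i].b >= NR_NODES)
			return -EINVAL;
		if ((e[i].a == e[i].b && e[i].distance != LOCAL_DISTANCE) ||
		    (e[i].a != e[i].b && e[i].distance <= LOCAL_DISTANCE))
			return -EINVAL;

		table[e[i].a][e[i].b] = e[i].distance;
		if (e[i].b > e[i].a)
			table[e[i].b][e[i].a] = e[i].distance;
	}
	return 0;
}

/* "Arch" side: discards a partially filled map when parsing fails. */
static int init_topology(const struct entry *e, size_t n)
{
	int ret;

	reset_table();
	ret = parse_entries(e, n);
	if (ret)
		reset_table();
	return ret;
}
```

The point of the sketch is that a parse error can leave earlier, valid
entries in the table, so the caller resets the whole map rather than
keeping a half-populated one.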

[1] Documentation/devicetree/bindings/numa.txt

Signed-off-by: John Garry <john.garry@huawei.com>
diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
index 35c64a4295e0..fe6b13608e51 100644
--- a/drivers/of/of_numa.c
+++ b/drivers/of/of_numa.c
@@ -104,9 +104,14 @@ static int __init of_numa_parse_distance_map_v1(struct device_node *map)
 		distance = of_read_number(matrix, 1);
 		matrix++;
 
+		if ((nodea == nodeb && distance != LOCAL_DISTANCE) ||
+		    (nodea != nodeb && distance <= LOCAL_DISTANCE)) {
+			pr_err("Invalid distance[node%d -> node%d] = %d\n",
+			       nodea, nodeb, distance);
+			return -EINVAL;
+		}
+
 		numa_set_distance(nodea, nodeb, distance);
-		pr_debug("distance[node%d -> node%d] = %d\n",
-			 nodea, nodeb, distance);
 
 		/* Set default distance of node B->A same as A->B */
 		if (nodeb > nodea)
--
1.9.1
* Re: [PATCH] of, numa: Validate some distance map rules
2018-11-06 12:39 [PATCH] of, numa: Validate some distance map rules John Garry
@ 2018-11-07 15:44 ` Will Deacon
2018-11-07 15:55 ` Rob Herring
0 siblings, 1 reply; 4+ messages in thread
From: Will Deacon @ 2018-11-07 15:44 UTC (permalink / raw)
To: John Garry
Cc: robh+dt, frowand.list, devicetree, linux-kernel, linuxarm,
anshuman.khandual, peterz
Hi John,
On Tue, Nov 06, 2018 at 08:39:33PM +0800, John Garry wrote:
> Currently the NUMA distance map parsing does not validate the distance
> table for the distance-matrix rules 1-2 in [1].
>
> However the arch NUMA code may enforce some of these rules, but not all.
> Such is the case for the arm64 port, which does not enforce the rule that
> the distance between separate nodes cannot equal LOCAL_DISTANCE.
>
> The patch adds the following rules validation:
> - distance of node to self equals LOCAL_DISTANCE
> - distance of separate nodes > LOCAL_DISTANCE
>
> A note on dealing with symmetrical distances between nodes:
>
> Validating symmetrical distances between nodes is difficult. If it were
> mandated in the bindings that every distance must be recorded in the
> table, validating symmetrical distances would be straightforward. However,
> it isn't.
>
> In addition to this, it is also possible to record [b, a] distance only
> (and not [a, b]). So, when processing the table for [b, a], we cannot
> assert that current distance of [a, b] != [b, a] as invalid, as [a, b]
> distance may not be present in the table and current distance would be
> default at REMOTE_DISTANCE.
>
> As such, we maintain the policy that we overwrite distance [a, b] = [b, a]
> for b > a. This policy is different to kernel ACPI SLIT validation, which
> allows non-symmetrical distances (ACPI spec SLIT rules allow it). However,
> the debug message is dropped as it may be misleading (for a distance which
> is later overwritten).
>
> Some final notes on semantics:
>
> - It is implied that it is the responsibility of the arch NUMA code to
> reset the NUMA distance map for an error in distance map parsing.
>
> - It is the responsibility of the FW NUMA topology parsing (whether OF or
> ACPI) to enforce NUMA distance rules, and not arch NUMA code.
>
> [1] Documentation/devicetree/bindings/numa.txt
>
> Signed-off-by: John Garry <john.garry@huawei.com>
Is it worth mentioning that the lack of this check was leading to a kernel
crash with a malformed DT entry?
> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
> index 35c64a4295e0..fe6b13608e51 100644
> --- a/drivers/of/of_numa.c
> +++ b/drivers/of/of_numa.c
> @@ -104,9 +104,14 @@ static int __init of_numa_parse_distance_map_v1(struct device_node *map)
> distance = of_read_number(matrix, 1);
> matrix++;
>
> + if ((nodea == nodeb && distance != LOCAL_DISTANCE) ||
> + (nodea != nodeb && distance <= LOCAL_DISTANCE)) {
> + pr_err("Invalid distance[node%d -> node%d] = %d\n",
> + nodea, nodeb, distance);
> + return -EINVAL;
> + }
> +
> numa_set_distance(nodea, nodeb, distance);
> - pr_debug("distance[node%d -> node%d] = %d\n",
> - nodea, nodeb, distance);
Looks good to me, although I'm not sure which tree this should go through.
Acked-by: Will Deacon <will.deacon@arm.com>
Will
* Re: [PATCH] of, numa: Validate some distance map rules
2018-11-07 15:44 ` Will Deacon
@ 2018-11-07 15:55 ` Rob Herring
2018-11-07 16:24 ` John Garry
0 siblings, 1 reply; 4+ messages in thread
From: Rob Herring @ 2018-11-07 15:55 UTC (permalink / raw)
To: Will Deacon, John Garry
Cc: frowand.list, devicetree, linux-kernel, linuxarm,
anshuman.khandual, peterz
On Wed, Nov 07, 2018 at 03:44:31PM +0000, Will Deacon wrote:
> Hi John,
>
> On Tue, Nov 06, 2018 at 08:39:33PM +0800, John Garry wrote:
> > Currently the NUMA distance map parsing does not validate the distance
> > table for the distance-matrix rules 1-2 in [1].
> >
> > However the arch NUMA code may enforce some of these rules, but not all.
> > Such is the case for the arm64 port, which does not enforce the rule that
> > the distance between separate nodes cannot equal LOCAL_DISTANCE.
> >
> > The patch adds the following rules validation:
> > - distance of node to self equals LOCAL_DISTANCE
> > - distance of separate nodes > LOCAL_DISTANCE
> >
> > A note on dealing with symmetrical distances between nodes:
> >
> > Validating symmetrical distances between nodes is difficult. If it were
> > mandated in the bindings that every distance must be recorded in the
> > table, validating symmetrical distances would be straightforward. However,
> > it isn't.
> >
> > In addition to this, it is also possible to record [b, a] distance only
> > (and not [a, b]). So, when processing the table for [b, a], we cannot
> > assert that current distance of [a, b] != [b, a] as invalid, as [a, b]
> > distance may not be present in the table and current distance would be
> > default at REMOTE_DISTANCE.
> >
> > As such, we maintain the policy that we overwrite distance [a, b] = [b, a]
> > for b > a. This policy is different to kernel ACPI SLIT validation, which
> > allows non-symmetrical distances (ACPI spec SLIT rules allow it). However,
> > the debug message is dropped as it may be misleading (for a distance which
> > is later overwritten).
> >
> > Some final notes on semantics:
> >
> > - It is implied that it is the responsibility of the arch NUMA code to
> > reset the NUMA distance map for an error in distance map parsing.
> >
> > - It is the responsibility of the FW NUMA topology parsing (whether OF or
> > ACPI) to enforce NUMA distance rules, and not arch NUMA code.
> >
> > [1] Documentation/devicetree/bindings/numa.txt
> >
> > Signed-off-by: John Garry <john.garry@huawei.com>
>
> Is it worth mentioning that the lack of this check was leading to a kernel
> crash with a malformed DT entry?
So should be marked for stable too?
>
> > diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
> > index 35c64a4295e0..fe6b13608e51 100644
> > --- a/drivers/of/of_numa.c
> > +++ b/drivers/of/of_numa.c
> > @@ -104,9 +104,14 @@ static int __init of_numa_parse_distance_map_v1(struct device_node *map)
> > distance = of_read_number(matrix, 1);
> > matrix++;
> >
> > + if ((nodea == nodeb && distance != LOCAL_DISTANCE) ||
> > + (nodea != nodeb && distance <= LOCAL_DISTANCE)) {
> > + pr_err("Invalid distance[node%d -> node%d] = %d\n",
> > + nodea, nodeb, distance);
> > + return -EINVAL;
> > + }
> > +
> > numa_set_distance(nodea, nodeb, distance);
> > - pr_debug("distance[node%d -> node%d] = %d\n",
> > - nodea, nodeb, distance);
>
> Looks good to me, although I'm not sure which tree this should go through.
>
> Acked-by: Will Deacon <will.deacon@arm.com>
I'll take it. Please resend with the comment Will asked for.
Rob
* Re: [PATCH] of, numa: Validate some distance map rules
2018-11-07 15:55 ` Rob Herring
@ 2018-11-07 16:24 ` John Garry
0 siblings, 0 replies; 4+ messages in thread
From: John Garry @ 2018-11-07 16:24 UTC (permalink / raw)
To: Rob Herring, Will Deacon
Cc: frowand.list, devicetree, linux-kernel, linuxarm,
anshuman.khandual, peterz
On 07/11/2018 15:55, Rob Herring wrote:
> On Wed, Nov 07, 2018 at 03:44:31PM +0000, Will Deacon wrote:
>> Hi John,
>>
>> On Tue, Nov 06, 2018 at 08:39:33PM +0800, John Garry wrote:
>>> Currently the NUMA distance map parsing does not validate the distance
>>> table for the distance-matrix rules 1-2 in [1].
>>>
>>> However the arch NUMA code may enforce some of these rules, but not all.
>>> Such is the case for the arm64 port, which does not enforce the rule that
>>> the distance between separate nodes cannot equal LOCAL_DISTANCE.
>>>
>>> The patch adds the following rules validation:
>>> - distance of node to self equals LOCAL_DISTANCE
>>> - distance of separate nodes > LOCAL_DISTANCE
>>>
>>> A note on dealing with symmetrical distances between nodes:
>>>
>>> Validating symmetrical distances between nodes is difficult. If it were
>>> mandated in the bindings that every distance must be recorded in the
>>> table, validating symmetrical distances would be straightforward. However,
>>> it isn't.
>>>
>>> In addition to this, it is also possible to record [b, a] distance only
>>> (and not [a, b]). So, when processing the table for [b, a], we cannot
>>> assert that current distance of [a, b] != [b, a] as invalid, as [a, b]
>>> distance may not be present in the table and current distance would be
>>> default at REMOTE_DISTANCE.
>>>
>>> As such, we maintain the policy that we overwrite distance [a, b] = [b, a]
>>> for b > a. This policy is different to kernel ACPI SLIT validation, which
>>> allows non-symmetrical distances (ACPI spec SLIT rules allow it). However,
>>> the debug message is dropped as it may be misleading (for a distance which
>>> is later overwritten).
>>>
>>> Some final notes on semantics:
>>>
>>> - It is implied that it is the responsibility of the arch NUMA code to
>>> reset the NUMA distance map for an error in distance map parsing.
>>>
>>> - It is the responsibility of the FW NUMA topology parsing (whether OF or
>>> ACPI) to enforce NUMA distance rules, and not arch NUMA code.
>>>
>>> [1] Documentation/devicetree/bindings/numa.txt
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>
>> Is it worth mentioning that the lack of this check was leading to a kernel
>> crash with a malformed DT entry?
Yeah, I was thinking in hindsight that I should have mentioned the
yet-unresolved crash we avoid.
>
> So should be marked for stable too?
Probably. So this patch is masking a crash I have observed, which may be
good enough reason on its own.
In addition, I would still say that failing to validate the distance map
falls into the "oh, that's not good" category of stable rules.
>
>>
>>> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
>>> index 35c64a4295e0..fe6b13608e51 100644
>>> --- a/drivers/of/of_numa.c
>>> +++ b/drivers/of/of_numa.c
>>> @@ -104,9 +104,14 @@ static int __init of_numa_parse_distance_map_v1(struct device_node *map)
>>> distance = of_read_number(matrix, 1);
>>> matrix++;
>>>
>>> + if ((nodea == nodeb && distance != LOCAL_DISTANCE) ||
>>> + (nodea != nodeb && distance <= LOCAL_DISTANCE)) {
>>> + pr_err("Invalid distance[node%d -> node%d] = %d\n",
>>> + nodea, nodeb, distance);
>>> + return -EINVAL;
>>> + }
>>> +
>>> numa_set_distance(nodea, nodeb, distance);
>>> - pr_debug("distance[node%d -> node%d] = %d\n",
>>> - nodea, nodeb, distance);
>>
>> Looks good to me, although I'm not sure which tree this should go through.
>>
>> Acked-by: Will Deacon <will.deacon@arm.com>
>
Thanks Will.
> I'll take it. Please resend with the comment Will asked for.
>
OK, I'll repost an updated version.
> Rob
>
Cheers,
john
> .
>