* [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
@ 2022-08-12 6:20 ` Dan Carpenter
0 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2022-08-12 6:20 UTC (permalink / raw)
To: Felix Kuehling, Ramesh Errabolu
Cc: David Airlie, Pan, Xinhui, kernel-janitors, amd-gfx,
Daniel Vetter, Alex Deucher, Christian König
This code has two bugs. If kfd_topology_device_by_proximity_domain()
failed on the first iteration through the loop then "cpu_link" is
uninitialized and should not be dereferenced.
The second bug is that we cannot dereference a list iterator when it
points to the list head. In other words, if we exit the
list_for_each_entry() loop exits without hitting a break then "cpu_link"
is not a valid pointer and should not be dereferenced.
Fix both of these problems by setting "cpu_link" to NULL when it is invalid
and non-NULL when it is valid. That makes it easier to test for
valid vs invalid.
Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
I reported these in June but never heard back.
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 25990bec600d..3f0a4a415907 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
{
+ struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
struct kfd_iolink_properties *props = NULL, *props2 = NULL;
- struct kfd_iolink_properties *gpu_link, *cpu_link;
struct kfd_topology_device *cpu_dev;
int ret = 0;
int i, num_cpu;
@@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
continue;
/* find CPU <--> CPU links */
+ cpu_link = NULL;
cpu_dev = kfd_topology_device_by_proximity_domain(i);
if (cpu_dev) {
- list_for_each_entry(cpu_link,
+ list_for_each_entry(tmp_link,
&cpu_dev->io_link_props, list) {
- if (cpu_link->node_to == gpu_link->node_to)
+ if (tmp_link->node_to == gpu_link->node_to) {
+ cpu_link = tmp_link;
break;
+ }
}
}
- if (cpu_link->node_to != gpu_link->node_to)
+ if (!cpu_link)
return -ENOMEM;
/* CPU <--> CPU <--> GPU, GPU node*/
--
2.35.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
@ 2022-08-12 6:20 ` Dan Carpenter
0 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2022-08-12 6:20 UTC (permalink / raw)
To: Felix Kuehling, Ramesh Errabolu
Cc: Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
Daniel Vetter, amd-gfx, kernel-janitors
This code has two bugs. If kfd_topology_device_by_proximity_domain()
failed on the first iteration through the loop then "cpu_link" is
uninitialized and should not be dereferenced.
The second bug is that we cannot dereference a list iterator when it
points to the list head. In other words, if we exit the
list_for_each_entry() loop exits without hitting a break then "cpu_link"
is not a valid pointer and should not be dereferenced.
Fix both of these problems by setting "cpu_link" to NULL when it is invalid
and non-NULL when it is valid. That makes it easier to test for
valid vs invalid.
Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
I reported these in June but never heard back.
drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 25990bec600d..3f0a4a415907 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
{
+ struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
struct kfd_iolink_properties *props = NULL, *props2 = NULL;
- struct kfd_iolink_properties *gpu_link, *cpu_link;
struct kfd_topology_device *cpu_dev;
int ret = 0;
int i, num_cpu;
@@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
continue;
/* find CPU <--> CPU links */
+ cpu_link = NULL;
cpu_dev = kfd_topology_device_by_proximity_domain(i);
if (cpu_dev) {
- list_for_each_entry(cpu_link,
+ list_for_each_entry(tmp_link,
&cpu_dev->io_link_props, list) {
- if (cpu_link->node_to == gpu_link->node_to)
+ if (tmp_link->node_to == gpu_link->node_to) {
+ cpu_link = tmp_link;
break;
+ }
}
}
- if (cpu_link->node_to != gpu_link->node_to)
+ if (!cpu_link)
return -ENOMEM;
/* CPU <--> CPU <--> GPU, GPU node*/
--
2.35.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
2022-08-12 6:20 ` Dan Carpenter
@ 2022-08-12 21:10 ` Felix Kuehling
-1 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2022-08-12 21:10 UTC (permalink / raw)
To: Dan Carpenter, Ramesh Errabolu
Cc: David Airlie, Pan, Xinhui, kernel-janitors, amd-gfx,
Daniel Vetter, Alex Deucher, Christian König
On 2022-08-12 02:20, Dan Carpenter wrote:
> This code has two bugs. If kfd_topology_device_by_proximity_domain()
> failed on the first iteration through the loop then "cpu_link" is
> uninitialized and should not be dereferenced.
>
> The second bug is that we cannot dereference a list iterator when it
> points to the list head. In other words, if we exit the
> list_for_each_entry() loop exits without hitting a break then "cpu_link"
> is not a valid pointer and should not be dereferenced.
>
> Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> and non-NULL when it is valid. That makes it easier to test for
> valid vs invalid.
>
> Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
> I reported these in June but never heard back.
I thought Ramesh implemented a fix for this:
https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@amd.com/
You commented on a version of his patch:
https://lore.kernel.org/all/20220629161241.GM11460@kadam/
Did this get lost somehow? Anyway, your patch looks good to me and I'm
going to apply it to amd-staging-drm-next now.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Thanks,
Felix
>
> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 25990bec600d..3f0a4a415907 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
>
> static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
> {
> + struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
> struct kfd_iolink_properties *props = NULL, *props2 = NULL;
> - struct kfd_iolink_properties *gpu_link, *cpu_link;
> struct kfd_topology_device *cpu_dev;
> int ret = 0;
> int i, num_cpu;
> @@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
> continue;
>
> /* find CPU <--> CPU links */
> + cpu_link = NULL;
> cpu_dev = kfd_topology_device_by_proximity_domain(i);
> if (cpu_dev) {
> - list_for_each_entry(cpu_link,
> + list_for_each_entry(tmp_link,
> &cpu_dev->io_link_props, list) {
> - if (cpu_link->node_to == gpu_link->node_to)
> + if (tmp_link->node_to == gpu_link->node_to) {
> + cpu_link = tmp_link;
> break;
> + }
> }
> }
>
> - if (cpu_link->node_to != gpu_link->node_to)
> + if (!cpu_link)
> return -ENOMEM;
>
> /* CPU <--> CPU <--> GPU, GPU node*/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
@ 2022-08-12 21:10 ` Felix Kuehling
0 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2022-08-12 21:10 UTC (permalink / raw)
To: Dan Carpenter, Ramesh Errabolu
Cc: Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
Daniel Vetter, amd-gfx, kernel-janitors
On 2022-08-12 02:20, Dan Carpenter wrote:
> This code has two bugs. If kfd_topology_device_by_proximity_domain()
> failed on the first iteration through the loop then "cpu_link" is
> uninitialized and should not be dereferenced.
>
> The second bug is that we cannot dereference a list iterator when it
> points to the list head. In other words, if we exit the
> list_for_each_entry() loop exits without hitting a break then "cpu_link"
> is not a valid pointer and should not be dereferenced.
>
> Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> and non-NULL when it is valid. That makes it easier to test for
> valid vs invalid.
>
> Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
> I reported these in June but never heard back.
I thought Ramesh implemented a fix for this:
https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@amd.com/
You commented on a version of his patch:
https://lore.kernel.org/all/20220629161241.GM11460@kadam/
Did this get lost somehow? Anyway, your patch looks good to me and I'm
going to apply it to amd-staging-drm-next now.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Thanks,
Felix
>
> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 25990bec600d..3f0a4a415907 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
>
> static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
> {
> + struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
> struct kfd_iolink_properties *props = NULL, *props2 = NULL;
> - struct kfd_iolink_properties *gpu_link, *cpu_link;
> struct kfd_topology_device *cpu_dev;
> int ret = 0;
> int i, num_cpu;
> @@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
> continue;
>
> /* find CPU <--> CPU links */
> + cpu_link = NULL;
> cpu_dev = kfd_topology_device_by_proximity_domain(i);
> if (cpu_dev) {
> - list_for_each_entry(cpu_link,
> + list_for_each_entry(tmp_link,
> &cpu_dev->io_link_props, list) {
> - if (cpu_link->node_to == gpu_link->node_to)
> + if (tmp_link->node_to == gpu_link->node_to) {
> + cpu_link = tmp_link;
> break;
> + }
> }
> }
>
> - if (cpu_link->node_to != gpu_link->node_to)
> + if (!cpu_link)
> return -ENOMEM;
>
> /* CPU <--> CPU <--> GPU, GPU node*/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
2022-08-12 21:10 ` Felix Kuehling
@ 2022-08-15 11:23 ` Dan Carpenter
-1 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2022-08-15 11:23 UTC (permalink / raw)
To: Felix Kuehling
Cc: David Airlie, Pan, Xinhui, kernel-janitors, amd-gfx,
Ramesh Errabolu, Daniel Vetter, Alex Deucher,
Christian König
On Fri, Aug 12, 2022 at 05:10:51PM -0400, Felix Kuehling wrote:
> On 2022-08-12 02:20, Dan Carpenter wrote:
> > This code has two bugs. If kfd_topology_device_by_proximity_domain()
> > failed on the first iteration through the loop then "cpu_link" is
> > uninitialized and should not be dereferenced.
> >
> > The second bug is that we cannot dereference a list iterator when it
> > points to the list head. In other words, if we exit the
> > list_for_each_entry() loop exits without hitting a break then "cpu_link"
> > is not a valid pointer and should not be dereferenced.
> >
> > Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> > and non-NULL when it is valid. That makes it easier to test for
> > valid vs invalid.
> >
> > Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> > I reported these in June but never heard back.
>
> I thought Ramesh implemented a fix for this:
> https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@amd.com/
>
> You commented on a version of his patch:
> https://lore.kernel.org/all/20220629161241.GM11460@kadam/
Oh, Sorry! I appologize for forgetting that.
>
> Did this get lost somehow? Anyway, your patch looks good to me and I'm going
> to apply it to amd-staging-drm-next now.
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Looking at Ramesh's patch now, he added a continue if
kfd_topology_device_by_proximity_domain() failed. That is a behavior
change, but it might also be a bug fix.
My patch does not change the behavior except for eliminating the crash
so I stand by my patch, but adding the continue might be a good thing to
do as a separate step. I don't know the code well enough to say one way
or the other.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
@ 2022-08-15 11:23 ` Dan Carpenter
0 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2022-08-15 11:23 UTC (permalink / raw)
To: Felix Kuehling
Cc: Ramesh Errabolu, Alex Deucher, Christian König, Pan, Xinhui,
David Airlie, Daniel Vetter, amd-gfx, kernel-janitors
On Fri, Aug 12, 2022 at 05:10:51PM -0400, Felix Kuehling wrote:
> On 2022-08-12 02:20, Dan Carpenter wrote:
> > This code has two bugs. If kfd_topology_device_by_proximity_domain()
> > failed on the first iteration through the loop then "cpu_link" is
> > uninitialized and should not be dereferenced.
> >
> > The second bug is that we cannot dereference a list iterator when it
> > points to the list head. In other words, if we exit the
> > list_for_each_entry() loop exits without hitting a break then "cpu_link"
> > is not a valid pointer and should not be dereferenced.
> >
> > Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> > and non-NULL when it is valid. That makes it easier to test for
> > valid vs invalid.
> >
> > Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> > I reported these in June but never heard back.
>
> I thought Ramesh implemented a fix for this:
> https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@amd.com/
>
> You commented on a version of his patch:
> https://lore.kernel.org/all/20220629161241.GM11460@kadam/
Oh, Sorry! I appologize for forgetting that.
>
> Did this get lost somehow? Anyway, your patch looks good to me and I'm going
> to apply it to amd-staging-drm-next now.
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Looking at Ramesh's patch now, he added a continue if
kfd_topology_device_by_proximity_domain() failed. That is a behavior
change, but it might also be a bug fix.
My patch does not change the behavior except for eliminating the crash
so I stand by my patch, but adding the continue might be a good thing to
do as a separate step. I don't know the code well enough to say one way
or the other.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-08-15 11:25 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-12 6:20 [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop() Dan Carpenter
2022-08-12 6:20 ` Dan Carpenter
2022-08-12 21:10 ` Felix Kuehling
2022-08-12 21:10 ` Felix Kuehling
2022-08-15 11:23 ` Dan Carpenter
2022-08-15 11:23 ` Dan Carpenter
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.