All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
@ 2022-08-12  6:20 ` Dan Carpenter
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2022-08-12  6:20 UTC (permalink / raw)
  To: Felix Kuehling, Ramesh Errabolu
  Cc: David Airlie, Pan, Xinhui, kernel-janitors, amd-gfx,
	Daniel Vetter, Alex Deucher, Christian König

This code has two bugs.  If kfd_topology_device_by_proximity_domain()
failed on the first iteration through the loop then "cpu_link" is
uninitialized and should not be dereferenced.

The second bug is that we cannot dereference a list iterator when it
points to the list head.  In other words, if we exit the
list_for_each_entry() loop exits without hitting a break then "cpu_link"
is not a valid pointer and should not be dereferenced.

Fix both of these problems by setting "cpu_link" to NULL when it is invalid
and non-NULL when it is valid.  That makes it easier to test for
valid vs invalid.

Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
I reported these in June but never heard back.

 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 25990bec600d..3f0a4a415907 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
 
 static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
 {
+	struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
 	struct kfd_iolink_properties *props = NULL, *props2 = NULL;
-	struct kfd_iolink_properties *gpu_link, *cpu_link;
 	struct kfd_topology_device *cpu_dev;
 	int ret = 0;
 	int i, num_cpu;
@@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
 			continue;
 
 		/* find CPU <-->  CPU links */
+		cpu_link = NULL;
 		cpu_dev = kfd_topology_device_by_proximity_domain(i);
 		if (cpu_dev) {
-			list_for_each_entry(cpu_link,
+			list_for_each_entry(tmp_link,
 					&cpu_dev->io_link_props, list) {
-				if (cpu_link->node_to == gpu_link->node_to)
+				if (tmp_link->node_to == gpu_link->node_to) {
+					cpu_link = tmp_link;
 					break;
+				}
 			}
 		}
 
-		if (cpu_link->node_to != gpu_link->node_to)
+		if (!cpu_link)
 			return -ENOMEM;
 
 		/* CPU <--> CPU <--> GPU, GPU node*/
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
@ 2022-08-12  6:20 ` Dan Carpenter
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2022-08-12  6:20 UTC (permalink / raw)
  To: Felix Kuehling, Ramesh Errabolu
  Cc: Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, amd-gfx, kernel-janitors

This code has two bugs.  If kfd_topology_device_by_proximity_domain()
failed on the first iteration through the loop then "cpu_link" is
uninitialized and should not be dereferenced.

The second bug is that we cannot dereference a list iterator when it
points to the list head.  In other words, if we exit the
list_for_each_entry() loop exits without hitting a break then "cpu_link"
is not a valid pointer and should not be dereferenced.

Fix both of these problems by setting "cpu_link" to NULL when it is invalid
and non-NULL when it is valid.  That makes it easier to test for
valid vs invalid.

Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
I reported these in June but never heard back.

 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 25990bec600d..3f0a4a415907 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
 
 static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
 {
+	struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
 	struct kfd_iolink_properties *props = NULL, *props2 = NULL;
-	struct kfd_iolink_properties *gpu_link, *cpu_link;
 	struct kfd_topology_device *cpu_dev;
 	int ret = 0;
 	int i, num_cpu;
@@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
 			continue;
 
 		/* find CPU <-->  CPU links */
+		cpu_link = NULL;
 		cpu_dev = kfd_topology_device_by_proximity_domain(i);
 		if (cpu_dev) {
-			list_for_each_entry(cpu_link,
+			list_for_each_entry(tmp_link,
 					&cpu_dev->io_link_props, list) {
-				if (cpu_link->node_to == gpu_link->node_to)
+				if (tmp_link->node_to == gpu_link->node_to) {
+					cpu_link = tmp_link;
 					break;
+				}
 			}
 		}
 
-		if (cpu_link->node_to != gpu_link->node_to)
+		if (!cpu_link)
 			return -ENOMEM;
 
 		/* CPU <--> CPU <--> GPU, GPU node*/
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
  2022-08-12  6:20 ` Dan Carpenter
@ 2022-08-12 21:10   ` Felix Kuehling
  -1 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2022-08-12 21:10 UTC (permalink / raw)
  To: Dan Carpenter, Ramesh Errabolu
  Cc: David Airlie, Pan, Xinhui, kernel-janitors, amd-gfx,
	Daniel Vetter, Alex Deucher, Christian König

On 2022-08-12 02:20, Dan Carpenter wrote:
> This code has two bugs.  If kfd_topology_device_by_proximity_domain()
> failed on the first iteration through the loop then "cpu_link" is
> uninitialized and should not be dereferenced.
>
> The second bug is that we cannot dereference a list iterator when it
> points to the list head.  In other words, if we exit the
> list_for_each_entry() loop exits without hitting a break then "cpu_link"
> is not a valid pointer and should not be dereferenced.
>
> Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> and non-NULL when it is valid.  That makes it easier to test for
> valid vs invalid.
>
> Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
> I reported these in June but never heard back.

I thought Ramesh implemented a fix for this: 
https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@amd.com/

You commented on a version of his patch: 
https://lore.kernel.org/all/20220629161241.GM11460@kadam/

Did this get lost somehow? Anyway, your patch looks good to me and I'm 
going to apply it to amd-staging-drm-next now.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Thanks,
   Felix


>
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 25990bec600d..3f0a4a415907 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
>   
>   static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
>   {
> +	struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
>   	struct kfd_iolink_properties *props = NULL, *props2 = NULL;
> -	struct kfd_iolink_properties *gpu_link, *cpu_link;
>   	struct kfd_topology_device *cpu_dev;
>   	int ret = 0;
>   	int i, num_cpu;
> @@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
>   			continue;
>   
>   		/* find CPU <-->  CPU links */
> +		cpu_link = NULL;
>   		cpu_dev = kfd_topology_device_by_proximity_domain(i);
>   		if (cpu_dev) {
> -			list_for_each_entry(cpu_link,
> +			list_for_each_entry(tmp_link,
>   					&cpu_dev->io_link_props, list) {
> -				if (cpu_link->node_to == gpu_link->node_to)
> +				if (tmp_link->node_to == gpu_link->node_to) {
> +					cpu_link = tmp_link;
>   					break;
> +				}
>   			}
>   		}
>   
> -		if (cpu_link->node_to != gpu_link->node_to)
> +		if (!cpu_link)
>   			return -ENOMEM;
>   
>   		/* CPU <--> CPU <--> GPU, GPU node*/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
@ 2022-08-12 21:10   ` Felix Kuehling
  0 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2022-08-12 21:10 UTC (permalink / raw)
  To: Dan Carpenter, Ramesh Errabolu
  Cc: Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, amd-gfx, kernel-janitors

On 2022-08-12 02:20, Dan Carpenter wrote:
> This code has two bugs.  If kfd_topology_device_by_proximity_domain()
> failed on the first iteration through the loop then "cpu_link" is
> uninitialized and should not be dereferenced.
>
> The second bug is that we cannot dereference a list iterator when it
> points to the list head.  In other words, if we exit the
> list_for_each_entry() loop exits without hitting a break then "cpu_link"
> is not a valid pointer and should not be dereferenced.
>
> Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> and non-NULL when it is valid.  That makes it easier to test for
> valid vs invalid.
>
> Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
> I reported these in June but never heard back.

I thought Ramesh implemented a fix for this: 
https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@amd.com/

You commented on a version of his patch: 
https://lore.kernel.org/all/20220629161241.GM11460@kadam/

Did this get lost somehow? Anyway, your patch looks good to me and I'm 
going to apply it to amd-staging-drm-next now.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Thanks,
   Felix


>
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 25990bec600d..3f0a4a415907 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1392,8 +1392,8 @@ static int kfd_build_p2p_node_entry(struct kfd_topology_device *dev,
>   
>   static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int gpu_node)
>   {
> +	struct kfd_iolink_properties *gpu_link, *tmp_link, *cpu_link;
>   	struct kfd_iolink_properties *props = NULL, *props2 = NULL;
> -	struct kfd_iolink_properties *gpu_link, *cpu_link;
>   	struct kfd_topology_device *cpu_dev;
>   	int ret = 0;
>   	int i, num_cpu;
> @@ -1416,16 +1416,19 @@ static int kfd_create_indirect_link_prop(struct kfd_topology_device *kdev, int g
>   			continue;
>   
>   		/* find CPU <-->  CPU links */
> +		cpu_link = NULL;
>   		cpu_dev = kfd_topology_device_by_proximity_domain(i);
>   		if (cpu_dev) {
> -			list_for_each_entry(cpu_link,
> +			list_for_each_entry(tmp_link,
>   					&cpu_dev->io_link_props, list) {
> -				if (cpu_link->node_to == gpu_link->node_to)
> +				if (tmp_link->node_to == gpu_link->node_to) {
> +					cpu_link = tmp_link;
>   					break;
> +				}
>   			}
>   		}
>   
> -		if (cpu_link->node_to != gpu_link->node_to)
> +		if (!cpu_link)
>   			return -ENOMEM;
>   
>   		/* CPU <--> CPU <--> GPU, GPU node*/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
  2022-08-12 21:10   ` Felix Kuehling
@ 2022-08-15 11:23     ` Dan Carpenter
  -1 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2022-08-15 11:23 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: David Airlie, Pan, Xinhui, kernel-janitors, amd-gfx,
	Ramesh Errabolu, Daniel Vetter, Alex Deucher,
	Christian König

On Fri, Aug 12, 2022 at 05:10:51PM -0400, Felix Kuehling wrote:
> On 2022-08-12 02:20, Dan Carpenter wrote:
> > This code has two bugs.  If kfd_topology_device_by_proximity_domain()
> > failed on the first iteration through the loop then "cpu_link" is
> > uninitialized and should not be dereferenced.
> > 
> > The second bug is that we cannot dereference a list iterator when it
> > points to the list head.  In other words, if we exit the
> > list_for_each_entry() loop exits without hitting a break then "cpu_link"
> > is not a valid pointer and should not be dereferenced.
> > 
> > Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> > and non-NULL when it is valid.  That makes it easier to test for
> > valid vs invalid.
> > 
> > Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> > I reported these in June but never heard back.
> 
> I thought Ramesh implemented a fix for this:
> https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@amd.com/
> 
> You commented on a version of his patch:
> https://lore.kernel.org/all/20220629161241.GM11460@kadam/

Oh, Sorry!  I appologize for forgetting that.

> 
> Did this get lost somehow? Anyway, your patch looks good to me and I'm going
> to apply it to amd-staging-drm-next now.
> 
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Looking at Ramesh's patch now, he added a continue if
kfd_topology_device_by_proximity_domain() failed.  That is a behavior
change, but it might also be a bug fix.

My patch does not change the behavior except for eliminating the crash
so I stand by my patch, but adding the continue might be a good thing to
do as a separate step.  I don't know the code well enough to say one way
or the other.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop()
@ 2022-08-15 11:23     ` Dan Carpenter
  0 siblings, 0 replies; 6+ messages in thread
From: Dan Carpenter @ 2022-08-15 11:23 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Ramesh Errabolu, Alex Deucher, Christian König, Pan, Xinhui,
	David Airlie, Daniel Vetter, amd-gfx, kernel-janitors

On Fri, Aug 12, 2022 at 05:10:51PM -0400, Felix Kuehling wrote:
> On 2022-08-12 02:20, Dan Carpenter wrote:
> > This code has two bugs.  If kfd_topology_device_by_proximity_domain()
> > failed on the first iteration through the loop then "cpu_link" is
> > uninitialized and should not be dereferenced.
> > 
> > The second bug is that we cannot dereference a list iterator when it
> > points to the list head.  In other words, if we exit the
> > list_for_each_entry() loop exits without hitting a break then "cpu_link"
> > is not a valid pointer and should not be dereferenced.
> > 
> > Fix both of these problems by setting "cpu_link" to NULL when it is invalid
> > and non-NULL when it is valid.  That makes it easier to test for
> > valid vs invalid.
> > 
> > Fixes: 0f28cca87e9a ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> > I reported these in June but never heard back.
> 
> I thought Ramesh implemented a fix for this:
> https://lore.kernel.org/all/20220706183302.1719795-1-Ramesh.Errabolu@amd.com/
> 
> You commented on a version of his patch:
> https://lore.kernel.org/all/20220629161241.GM11460@kadam/

Oh, Sorry!  I appologize for forgetting that.

> 
> Did this get lost somehow? Anyway, your patch looks good to me and I'm going
> to apply it to amd-staging-drm-next now.
> 
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Looking at Ramesh's patch now, he added a continue if
kfd_topology_device_by_proximity_domain() failed.  That is a behavior
change, but it might also be a bug fix.

My patch does not change the behavior except for eliminating the crash
so I stand by my patch, but adding the continue might be a good thing to
do as a separate step.  I don't know the code well enough to say one way
or the other.

regards,
dan carpenter


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-08-15 11:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-12  6:20 [PATCH] drm/amdkfd: potential crash in kfd_create_indirect_link_prop() Dan Carpenter
2022-08-12  6:20 ` Dan Carpenter
2022-08-12 21:10 ` Felix Kuehling
2022-08-12 21:10   ` Felix Kuehling
2022-08-15 11:23   ` Dan Carpenter
2022-08-15 11:23     ` Dan Carpenter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.