* [PATCH v2 02/08] drm/amdgpu: screen freeze and userq driver crash
2024-09-25 19:59 [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Arunpravin Paneer Selvam
@ 2024-09-25 19:59 ` Arunpravin Paneer Selvam
2024-09-26 11:38 ` Christian König
2024-09-25 19:59 ` [PATCH v2 03/08] drm/amdgpu: Add wait IOCTL timeline syncobj support Arunpravin Paneer Selvam
` (6 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Arunpravin Paneer Selvam @ 2024-09-25 19:59 UTC (permalink / raw)
To: amd-gfx; +Cc: christian.koenig, alexander.deucher, Arunpravin Paneer Selvam
Screen freeze and userq fence driver crash while playing Xonotic
v2: (Christian)
- There is change that fence might signal in between testing
and grabbing the lock. Hence we can move the lock above the
if..else check and use the dma_fence_is_signaled_locked().
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 96d1caf4c815..97b1af574407 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -185,6 +185,7 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
struct amdgpu_userq_fence_driver *fence_drv;
struct amdgpu_userq_fence *userq_fence;
struct dma_fence *fence;
+ unsigned long flags;
fence_drv = userq->fence_drv;
if (!fence_drv)
@@ -232,14 +233,14 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
userq_fence->fence_drv_array_count = 0;
}
- spin_lock(&fence_drv->fence_list_lock);
/* Check if hardware has already processed the job */
- if (!dma_fence_is_signaled(fence))
+ spin_lock_irqsave(&fence_drv->fence_list_lock, flags);
+ if (!dma_fence_is_signaled_locked(fence))
list_add_tail(&userq_fence->link, &fence_drv->fences);
else
dma_fence_put(fence);
- spin_unlock(&fence_drv->fence_list_lock);
+ spin_unlock_irqrestore(&fence_drv->fence_list_lock, flags);
*f = fence;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v2 02/08] drm/amdgpu: screen freeze and userq driver crash
2024-09-25 19:59 ` [PATCH v2 02/08] drm/amdgpu: screen freeze and userq driver crash Arunpravin Paneer Selvam
@ 2024-09-26 11:38 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2024-09-26 11:38 UTC (permalink / raw)
To: Arunpravin Paneer Selvam, amd-gfx; +Cc: christian.koenig, alexander.deucher
Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
> Screen freeze and userq fence driver crash while playing Xonotic
>
> v2: (Christian)
> - There is change that fence might signal in between testing
> and grabbing the lock. Hence we can move the lock above the
> if..else check and use the dma_fence_is_signaled_locked().
>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 96d1caf4c815..97b1af574407 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -185,6 +185,7 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
> struct amdgpu_userq_fence_driver *fence_drv;
> struct amdgpu_userq_fence *userq_fence;
> struct dma_fence *fence;
> + unsigned long flags;
>
> fence_drv = userq->fence_drv;
> if (!fence_drv)
> @@ -232,14 +233,14 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
> userq_fence->fence_drv_array_count = 0;
> }
>
> - spin_lock(&fence_drv->fence_list_lock);
> /* Check if hardware has already processed the job */
> - if (!dma_fence_is_signaled(fence))
> + spin_lock_irqsave(&fence_drv->fence_list_lock, flags);
> + if (!dma_fence_is_signaled_locked(fence))
> list_add_tail(&userq_fence->link, &fence_drv->fences);
> else
> dma_fence_put(fence);
>
> - spin_unlock(&fence_drv->fence_list_lock);
> + spin_unlock_irqrestore(&fence_drv->fence_list_lock, flags);
>
> *f = fence;
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 03/08] drm/amdgpu: Add wait IOCTL timeline syncobj support
2024-09-25 19:59 [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Arunpravin Paneer Selvam
2024-09-25 19:59 ` [PATCH v2 02/08] drm/amdgpu: screen freeze and userq driver crash Arunpravin Paneer Selvam
@ 2024-09-25 19:59 ` Arunpravin Paneer Selvam
2024-09-26 11:49 ` Christian König
2024-09-25 19:59 ` [PATCH v2 04/08] drm/amdgpu: Enable userq fence interrupt support Arunpravin Paneer Selvam
` (5 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Arunpravin Paneer Selvam @ 2024-09-25 19:59 UTC (permalink / raw)
To: amd-gfx; +Cc: christian.koenig, alexander.deucher, Arunpravin Paneer Selvam
Add user fence wait IOCTL timeline syncobj support.
v2:(Christian)
- handle dma_fence_wait() return value.
- shorten the variable name syncobj_timeline_points a bit.
- move num_points up to avoid padding issues.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 82 +++++++++++++++++--
include/uapi/drm/amdgpu_drm.h | 16 +++-
2 files changed, 88 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 97b1af574407..2465ca307644 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -474,11 +474,11 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
struct drm_file *filp)
{
+ u32 *syncobj_handles, *timeline_points, *timeline_handles, *bo_handles;
+ u32 num_syncobj, num_bo_handles, num_points;
struct drm_amdgpu_userq_fence_info *fence_info = NULL;
struct drm_amdgpu_userq_wait *wait_info = data;
- u32 *syncobj_handles, *bo_handles;
struct dma_fence **fences = NULL;
- u32 num_syncobj, num_bo_handles;
struct drm_gem_object **gobj;
struct drm_exec exec;
int r, i, entry, cnt;
@@ -498,11 +498,26 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
goto free_bo_handles;
}
+ num_points = wait_info->num_points;
+ timeline_handles = memdup_user(u64_to_user_ptr(wait_info->syncobj_timeline_handles),
+ sizeof(u32) * num_points);
+ if (IS_ERR(timeline_handles)) {
+ r = PTR_ERR(timeline_handles);
+ goto free_syncobj_handles;
+ }
+
+ timeline_points = memdup_user(u64_to_user_ptr(wait_info->syncobj_timeline_points),
+ sizeof(u32) * num_points);
+ if (IS_ERR(timeline_points)) {
+ r = PTR_ERR(timeline_points);
+ goto free_timeline_handles;
+ }
+
/* Array of GEM object handles */
gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
if (!gobj) {
r = -ENOMEM;
- goto free_syncobj_handles;
+ goto free_timeline_points;
}
for (entry = 0; entry < num_bo_handles; entry++) {
@@ -524,17 +539,34 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
}
if (!wait_info->num_fences) {
+ if (num_points) {
+ struct dma_fence *fence;
+
+ for (i = 0; i < num_points; i++) {
+ r = drm_syncobj_find_fence(filp, timeline_handles[i],
+ timeline_points[i],
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+ &fence);
+ if (r || !fence)
+ continue;
+
+ dma_fence_put(fence);
+ num_fences++;
+ }
+ }
+
/* Count syncobj's fence */
for (i = 0; i < num_syncobj; i++) {
struct dma_fence *fence;
r = drm_syncobj_find_fence(filp, syncobj_handles[i],
- 0, 0, &fence);
- dma_fence_put(fence);
-
+ 0,
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+ &fence);
if (r || !fence)
continue;
+ dma_fence_put(fence);
num_fences++;
}
@@ -589,12 +621,34 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
}
}
+ if (num_points) {
+ struct dma_fence *fence;
+
+ for (i = 0; i < num_points; i++) {
+ r = drm_syncobj_find_fence(filp, timeline_handles[i],
+ timeline_points[i],
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+ &fence);
+ if (r || !fence)
+ continue;
+
+ if (WARN_ON_ONCE(num_fences >= wait_info->num_fences)) {
+ r = -EINVAL;
+ goto free_fences;
+ }
+
+ fences[num_fences++] = fence;
+ }
+ }
+
/* Retrieve syncobj's fence */
for (i = 0; i < num_syncobj; i++) {
struct dma_fence *fence;
r = drm_syncobj_find_fence(filp, syncobj_handles[i],
- 0, 0, &fence);
+ 0,
+ DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
+ &fence);
if (r || !fence)
continue;
@@ -617,9 +671,13 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
* Just waiting on other driver fences should
* be good for now
*/
- dma_fence_wait(fences[i], false);
- dma_fence_put(fences[i]);
+ r = dma_fence_wait(fences[i], true);
+ if (r) {
+ dma_fence_put(fences[i]);
+ goto free_fences;
+ }
+ dma_fence_put(fences[i]);
continue;
}
@@ -665,6 +723,8 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
drm_gem_object_put(gobj[i]);
kfree(gobj);
+ kfree(timeline_points);
+ kfree(timeline_handles);
kfree(syncobj_handles);
kfree(bo_handles);
@@ -682,6 +742,10 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
while (entry-- > 0)
drm_gem_object_put(gobj[entry]);
kfree(gobj);
+free_timeline_points:
+ kfree(timeline_points);
+free_timeline_handles:
+ kfree(timeline_handles);
free_syncobj_handles:
kfree(syncobj_handles);
free_bo_handles:
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index af42798e901d..3b24e0cb1b51 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -521,12 +521,26 @@ struct drm_amdgpu_userq_wait {
* matching fence wait info pair in @userq_fence_info.
*/
__u32 bo_wait_flags;
- __u32 pad;
+ /**
+ * @num_points: A count that represents the number of timeline syncobj handles in
+ * syncobj_handles_array.
+ */
+ __u32 num_points;
/**
* @syncobj_handles_array: An array of syncobj handles defined to get the
* fence wait information of every syncobj handles in the array.
*/
__u64 syncobj_handles_array;
+ /**
+ * @syncobj_timeline_handles: An array of timeline syncobj handles defined to get the
+ * fence wait information of every timeline syncobj handles in the array.
+ */
+ __u64 syncobj_timeline_handles;
+ /**
+ * @syncobj_timeline_points: An array of timeline syncobj points defined to get the
+ * fence wait points of every timeline syncobj handles in the syncobj_handles_array.
+ */
+ __u64 syncobj_timeline_points;
/**
* @bo_handles_array: An array of GEM BO handles defined to fetch the fence
* wait information of every BO handles in the array.
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v2 03/08] drm/amdgpu: Add wait IOCTL timeline syncobj support
2024-09-25 19:59 ` [PATCH v2 03/08] drm/amdgpu: Add wait IOCTL timeline syncobj support Arunpravin Paneer Selvam
@ 2024-09-26 11:49 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2024-09-26 11:49 UTC (permalink / raw)
To: Arunpravin Paneer Selvam, amd-gfx; +Cc: christian.koenig, alexander.deucher
Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
> Add user fence wait IOCTL timeline syncobj support.
>
> v2:(Christian)
> - handle dma_fence_wait() return value.
> - shorten the variable name syncobj_timeline_points a bit.
> - move num_points up to avoid padding issues.
>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 82 +++++++++++++++++--
> include/uapi/drm/amdgpu_drm.h | 16 +++-
> 2 files changed, 88 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 97b1af574407..2465ca307644 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -474,11 +474,11 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp)
> {
> + u32 *syncobj_handles, *timeline_points, *timeline_handles, *bo_handles;
> + u32 num_syncobj, num_bo_handles, num_points;
> struct drm_amdgpu_userq_fence_info *fence_info = NULL;
> struct drm_amdgpu_userq_wait *wait_info = data;
> - u32 *syncobj_handles, *bo_handles;
> struct dma_fence **fences = NULL;
> - u32 num_syncobj, num_bo_handles;
> struct drm_gem_object **gobj;
> struct drm_exec exec;
> int r, i, entry, cnt;
> @@ -498,11 +498,26 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> goto free_bo_handles;
> }
>
> + num_points = wait_info->num_points;
> + timeline_handles = memdup_user(u64_to_user_ptr(wait_info->syncobj_timeline_handles),
> + sizeof(u32) * num_points);
> + if (IS_ERR(timeline_handles)) {
> + r = PTR_ERR(timeline_handles);
> + goto free_syncobj_handles;
> + }
> +
> + timeline_points = memdup_user(u64_to_user_ptr(wait_info->syncobj_timeline_points),
> + sizeof(u32) * num_points);
> + if (IS_ERR(timeline_points)) {
> + r = PTR_ERR(timeline_points);
> + goto free_timeline_handles;
> + }
> +
> /* Array of GEM object handles */
> gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
> if (!gobj) {
> r = -ENOMEM;
> - goto free_syncobj_handles;
> + goto free_timeline_points;
> }
>
> for (entry = 0; entry < num_bo_handles; entry++) {
> @@ -524,17 +539,34 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> }
>
> if (!wait_info->num_fences) {
> + if (num_points) {
> + struct dma_fence *fence;
> ++
> + for (i = 0; i < num_points; i++) {
> + r = drm_syncobj_find_fence(filp, timeline_handles[i],
> + timeline_points[i],
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
> + &fence);
> + if (r || !fence)
You can't simply ignore errors here. That needs some goto error handling
and cleanup.
> + continue;
> +
> + dma_fence_put(fence);
> + num_fences++;
There can be more than one fence in the timeline we need to wait for.
You should probably use dma_fence_unwrap_for_each() here.
> + }
> + }
> +
> /* Count syncobj's fence */
> for (i = 0; i < num_syncobj; i++) {
> struct dma_fence *fence;
>
> r = drm_syncobj_find_fence(filp, syncobj_handles[i],
> - 0, 0, &fence);
> - dma_fence_put(fence);
> -
> + 0,
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
> + &fence);
> if (r || !fence)
> continue;
>
> + dma_fence_put(fence);
> num_fences++;
> }
>
> @@ -589,12 +621,34 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> + if (num_points) {
> + struct dma_fence *fence;
> +
> + for (i = 0; i < num_points; i++) {
> + r = drm_syncobj_find_fence(filp, timeline_handles[i],
> + timeline_points[i],
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
> + &fence);
> + if (r || !fence)
Same here, just ignoring errors is a no-go.
Regards,
Christian.
> + continue;
> +
> + if (WARN_ON_ONCE(num_fences >= wait_info->num_fences)) {
> + r = -EINVAL;
> + goto free_fences;
> + }
> +
> + fences[num_fences++] = fence;
> + }
> + }
> +
> /* Retrieve syncobj's fence */
> for (i = 0; i < num_syncobj; i++) {
> struct dma_fence *fence;
>
> r = drm_syncobj_find_fence(filp, syncobj_handles[i],
> - 0, 0, &fence);
> + 0,
> + DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
> + &fence);
> if (r || !fence)
> continue;
>
> @@ -617,9 +671,13 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> * Just waiting on other driver fences should
> * be good for now
> */
> - dma_fence_wait(fences[i], false);
> - dma_fence_put(fences[i]);
> + r = dma_fence_wait(fences[i], true);
> + if (r) {
> + dma_fence_put(fences[i]);
> + goto free_fences;
> + }
>
> + dma_fence_put(fences[i]);
> continue;
> }
>
> @@ -665,6 +723,8 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> drm_gem_object_put(gobj[i]);
> kfree(gobj);
>
> + kfree(timeline_points);
> + kfree(timeline_handles);
> kfree(syncobj_handles);
> kfree(bo_handles);
>
> @@ -682,6 +742,10 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> while (entry-- > 0)
> drm_gem_object_put(gobj[entry]);
> kfree(gobj);
> +free_timeline_points:
> + kfree(timeline_points);
> +free_timeline_handles:
> + kfree(timeline_handles);
> free_syncobj_handles:
> kfree(syncobj_handles);
> free_bo_handles:
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index af42798e901d..3b24e0cb1b51 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -521,12 +521,26 @@ struct drm_amdgpu_userq_wait {
> * matching fence wait info pair in @userq_fence_info.
> */
> __u32 bo_wait_flags;
> - __u32 pad;
> + /**
> + * @num_points: A count that represents the number of timeline syncobj handles in
> + * syncobj_handles_array.
> + */
> + __u32 num_points;
> /**
> * @syncobj_handles_array: An array of syncobj handles defined to get the
> * fence wait information of every syncobj handles in the array.
> */
> __u64 syncobj_handles_array;
> + /**
> + * @syncobj_timeline_handles: An array of timeline syncobj handles defined to get the
> + * fence wait information of every timeline syncobj handles in the array.
> + */
> + __u64 syncobj_timeline_handles;
> + /**
> + * @syncobj_timeline_points: An array of timeline syncobj points defined to get the
> + * fence wait points of every timeline syncobj handles in the syncobj_handles_array.
> + */
> + __u64 syncobj_timeline_points;
> /**
> * @bo_handles_array: An array of GEM BO handles defined to fetch the fence
> * wait information of every BO handles in the array.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 04/08] drm/amdgpu: Enable userq fence interrupt support
2024-09-25 19:59 [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Arunpravin Paneer Selvam
2024-09-25 19:59 ` [PATCH v2 02/08] drm/amdgpu: screen freeze and userq driver crash Arunpravin Paneer Selvam
2024-09-25 19:59 ` [PATCH v2 03/08] drm/amdgpu: Add wait IOCTL timeline syncobj support Arunpravin Paneer Selvam
@ 2024-09-25 19:59 ` Arunpravin Paneer Selvam
2024-09-25 19:59 ` [PATCH v2 05/08] drm/amdgpu: Remove the MES self test Arunpravin Paneer Selvam
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Arunpravin Paneer Selvam @ 2024-09-25 19:59 UTC (permalink / raw)
To: amd-gfx; +Cc: christian.koenig, alexander.deucher, Arunpravin Paneer Selvam
Add support to handle the userqueue protected fence signal hardware
interrupt.
Create a xarray which maps the doorbell index to the fence driver address.
This would help to retrieve the fence driver information when an userq fence
interrupt is triggered. Firmware sends the doorbell offset value and
this info is compared with the queue's mqd doorbell offset value.
If they are same, we process the userq fence interrupt.
v1:(Christian):
- use xa_load to extract the fence driver.
- move the amdgpu_userq_fence_driver_process call within the xa_lock
as there is a chance that fence_drv might be freed.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 +++++
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 25 +++++++++----------
3 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 94bdb5fa6ebc..2da895f91e4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3967,6 +3967,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
spin_lock_init(&adev->audio_endpt_idx_lock);
spin_lock_init(&adev->mm_stats.lock);
+ xa_init_flags(&adev->userq_xa, XA_FLAGS_LOCK_IRQ);
+
INIT_LIST_HEAD(&adev->shadow_list);
mutex_init(&adev->shadow_list_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 2465ca307644..f1689a8b5723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -70,6 +70,7 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
struct amdgpu_usermode_queue *userq)
{
struct amdgpu_userq_fence_driver *fence_drv;
+ unsigned long flags;
int r;
fence_drv = kzalloc(sizeof(*fence_drv), GFP_KERNEL);
@@ -97,6 +98,11 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
fence_drv->context = dma_fence_context_alloc(1);
get_task_comm(fence_drv->timeline_name, current);
+ xa_lock_irqsave(&adev->userq_xa, flags);
+ __xa_store(&adev->userq_xa, userq->doorbell_index,
+ fence_drv, GFP_KERNEL);
+ xa_unlock_irqrestore(&adev->userq_xa, flags);
+
userq->fence_drv = fence_drv;
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 508e0acebb0d..68f908cafb6f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -50,6 +50,7 @@
#include "nbio_v4_3.h"
#include "mes_v11_0.h"
#include "mes_v11_0_userqueue.h"
+#include "amdgpu_userq_fence.h"
#define GFX11_NUM_GFX_RINGS 1
#define GFX11_MEC_HPD_SIZE 2048
@@ -5865,25 +5866,23 @@ static int gfx_v11_0_eop_irq(struct amdgpu_device *adev,
struct amdgpu_irq_src *source,
struct amdgpu_iv_entry *entry)
{
- int i;
+ u32 doorbell_offset = entry->src_data[0];
u8 me_id, pipe_id, queue_id;
struct amdgpu_ring *ring;
- uint32_t mes_queue_id = entry->src_data[0];
+ int i;
DRM_DEBUG("IH: CP EOP\n");
- if (adev->enable_mes && (mes_queue_id & AMDGPU_FENCE_MES_QUEUE_FLAG)) {
- struct amdgpu_mes_queue *queue;
+ if (adev->enable_mes && doorbell_offset) {
+ struct amdgpu_userq_fence_driver *fence_drv = NULL;
+ struct xarray *xa = &adev->userq_xa;
+ unsigned long flags;
- mes_queue_id &= AMDGPU_FENCE_MES_QUEUE_ID_MASK;
-
- spin_lock(&adev->mes.queue_id_lock);
- queue = idr_find(&adev->mes.queue_id_idr, mes_queue_id);
- if (queue) {
- DRM_DEBUG("process mes queue id = %d\n", mes_queue_id);
- amdgpu_fence_process(queue->ring);
- }
- spin_unlock(&adev->mes.queue_id_lock);
+ xa_lock_irqsave(xa, flags);
+ fence_drv = xa_load(xa, doorbell_offset);
+ if (fence_drv)
+ amdgpu_userq_fence_driver_process(fence_drv);
+ xa_unlock_irqrestore(xa, flags);
} else {
me_id = (entry->ring_id & 0x0c) >> 2;
pipe_id = (entry->ring_id & 0x03) >> 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH v2 05/08] drm/amdgpu: Remove the MES self test
2024-09-25 19:59 [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Arunpravin Paneer Selvam
` (2 preceding siblings ...)
2024-09-25 19:59 ` [PATCH v2 04/08] drm/amdgpu: Enable userq fence interrupt support Arunpravin Paneer Selvam
@ 2024-09-25 19:59 ` Arunpravin Paneer Selvam
2024-09-25 19:59 ` [PATCH v2 06/08] drm/amdgpu: Add few optimizations to userq fence driver Arunpravin Paneer Selvam
` (3 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Arunpravin Paneer Selvam @ 2024-09-25 19:59 UTC (permalink / raw)
To: amd-gfx; +Cc: christian.koenig, alexander.deucher, Arunpravin Paneer Selvam
Remove MES self test as this conflicts the userqueue fence
interrupts.
v2:(Christian)
- remove the amdgpu_mes_self_test() function and any now unused code.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 -
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 169 ---------------------
drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 2 -
drivers/gpu/drm/amd/amdgpu/mes_v10_1.c | 12 +-
drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 14 +-
5 files changed, 2 insertions(+), 198 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2da895f91e4d..5ec6cb237c81 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4700,9 +4700,6 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
}
adev->in_suspend = false;
- if (adev->enable_mes)
- amdgpu_mes_self_test(adev);
-
if (amdgpu_acpi_smart_shift_update(dev, AMDGPU_SS_DEV_D0))
DRM_WARN("smart shift update failed\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index da48b6da0107..dbe7cf4b322d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1297,175 +1297,6 @@ int amdgpu_mes_ctx_unmap_meta_data(struct amdgpu_device *adev,
return r;
}
-static int amdgpu_mes_test_create_gang_and_queues(struct amdgpu_device *adev,
- int pasid, int *gang_id,
- int queue_type, int num_queue,
- struct amdgpu_ring **added_rings,
- struct amdgpu_mes_ctx_data *ctx_data)
-{
- struct amdgpu_ring *ring;
- struct amdgpu_mes_gang_properties gprops = {0};
- int r, j;
-
- /* create a gang for the process */
- gprops.priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
- gprops.gang_quantum = adev->mes.default_gang_quantum;
- gprops.inprocess_gang_priority = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
- gprops.priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
- gprops.global_priority_level = AMDGPU_MES_PRIORITY_LEVEL_NORMAL;
-
- r = amdgpu_mes_add_gang(adev, pasid, &gprops, gang_id);
- if (r) {
- DRM_ERROR("failed to add gang\n");
- return r;
- }
-
- /* create queues for the gang */
- for (j = 0; j < num_queue; j++) {
- r = amdgpu_mes_add_ring(adev, *gang_id, queue_type, j,
- ctx_data, &ring);
- if (r) {
- DRM_ERROR("failed to add ring\n");
- break;
- }
-
- DRM_INFO("ring %s was added\n", ring->name);
- added_rings[j] = ring;
- }
-
- return 0;
-}
-
-static int amdgpu_mes_test_queues(struct amdgpu_ring **added_rings)
-{
- struct amdgpu_ring *ring;
- int i, r;
-
- for (i = 0; i < AMDGPU_MES_CTX_MAX_RINGS; i++) {
- ring = added_rings[i];
- if (!ring)
- continue;
-
- r = amdgpu_ring_test_helper(ring);
- if (r)
- return r;
-
- r = amdgpu_ring_test_ib(ring, 1000 * 10);
- if (r) {
- DRM_DEV_ERROR(ring->adev->dev,
- "ring %s ib test failed (%d)\n",
- ring->name, r);
- return r;
- } else
- DRM_INFO("ring %s ib test pass\n", ring->name);
- }
-
- return 0;
-}
-
-int amdgpu_mes_self_test(struct amdgpu_device *adev)
-{
- struct amdgpu_vm *vm = NULL;
- struct amdgpu_mes_ctx_data ctx_data = {0};
- struct amdgpu_ring *added_rings[AMDGPU_MES_CTX_MAX_RINGS] = { NULL };
- int gang_ids[3] = {0};
- int queue_types[][2] = { { AMDGPU_RING_TYPE_GFX, 1 },
- { AMDGPU_RING_TYPE_COMPUTE, 1 },
- { AMDGPU_RING_TYPE_SDMA, 1} };
- int i, r, pasid, k = 0;
-
- pasid = amdgpu_pasid_alloc(16);
- if (pasid < 0) {
- dev_warn(adev->dev, "No more PASIDs available!");
- pasid = 0;
- }
-
- vm = kzalloc(sizeof(*vm), GFP_KERNEL);
- if (!vm) {
- r = -ENOMEM;
- goto error_pasid;
- }
-
- r = amdgpu_vm_init(adev, vm, -1);
- if (r) {
- DRM_ERROR("failed to initialize vm\n");
- goto error_pasid;
- }
-
- r = amdgpu_mes_ctx_alloc_meta_data(adev, &ctx_data);
- if (r) {
- DRM_ERROR("failed to alloc ctx meta data\n");
- goto error_fini;
- }
-
- ctx_data.meta_data_gpu_addr = AMDGPU_VA_RESERVED_SIZE;
- r = amdgpu_mes_ctx_map_meta_data(adev, vm, &ctx_data);
- if (r) {
- DRM_ERROR("failed to map ctx meta data\n");
- goto error_vm;
- }
-
- r = amdgpu_mes_create_process(adev, pasid, vm);
- if (r) {
- DRM_ERROR("failed to create MES process\n");
- goto error_vm;
- }
-
- for (i = 0; i < ARRAY_SIZE(queue_types); i++) {
- /* On GFX v10.3, fw hasn't supported to map sdma queue. */
- if (amdgpu_ip_version(adev, GC_HWIP, 0) >=
- IP_VERSION(10, 3, 0) &&
- amdgpu_ip_version(adev, GC_HWIP, 0) <
- IP_VERSION(11, 0, 0) &&
- queue_types[i][0] == AMDGPU_RING_TYPE_SDMA)
- continue;
-
- r = amdgpu_mes_test_create_gang_and_queues(adev, pasid,
- &gang_ids[i],
- queue_types[i][0],
- queue_types[i][1],
- &added_rings[k],
- &ctx_data);
- if (r)
- goto error_queues;
-
- k += queue_types[i][1];
- }
-
- /* start ring test and ib test for MES queues */
- amdgpu_mes_test_queues(added_rings);
-
-error_queues:
- /* remove all queues */
- for (i = 0; i < ARRAY_SIZE(added_rings); i++) {
- if (!added_rings[i])
- continue;
- amdgpu_mes_remove_ring(adev, added_rings[i]);
- }
-
- for (i = 0; i < ARRAY_SIZE(gang_ids); i++) {
- if (!gang_ids[i])
- continue;
- amdgpu_mes_remove_gang(adev, gang_ids[i]);
- }
-
- amdgpu_mes_destroy_process(adev, pasid);
-
-error_vm:
- amdgpu_mes_ctx_unmap_meta_data(adev, &ctx_data);
-
-error_fini:
- amdgpu_vm_fini(adev, vm);
-
-error_pasid:
- if (pasid)
- amdgpu_pasid_free(pasid);
-
- amdgpu_mes_ctx_free_meta_data(&ctx_data);
- kfree(vm);
- return 0;
-}
-
int amdgpu_mes_init_microcode(struct amdgpu_device *adev, int pipe)
{
const struct mes_firmware_header_v1_0 *mes_hdr;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 7d4f93fea937..e7af28834766 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -396,8 +396,6 @@ int amdgpu_mes_ctx_map_meta_data(struct amdgpu_device *adev,
int amdgpu_mes_ctx_unmap_meta_data(struct amdgpu_device *adev,
struct amdgpu_mes_ctx_data *ctx_data);
-int amdgpu_mes_self_test(struct amdgpu_device *adev);
-
int amdgpu_mes_doorbell_process_slice(struct amdgpu_device *adev);
/*
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
index 1e5ad1e08d2a..1b2745670d4c 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v10_1.c
@@ -1156,20 +1156,10 @@ static int mes_v10_0_early_init(void *handle)
return 0;
}
-static int mes_v10_0_late_init(void *handle)
-{
- struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-
- if (!amdgpu_in_reset(adev))
- amdgpu_mes_self_test(adev);
-
- return 0;
-}
-
static const struct amd_ip_funcs mes_v10_1_ip_funcs = {
.name = "mes_v10_1",
.early_init = mes_v10_0_early_init,
- .late_init = mes_v10_0_late_init,
+ .late_init = NULL,
.sw_init = mes_v10_1_sw_init,
.sw_fini = mes_v10_1_sw_fini,
.hw_init = mes_v10_1_hw_init,
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 26d71a22395d..38e35c0a2876 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -1257,22 +1257,10 @@ static int mes_v11_0_early_init(void *handle)
return 0;
}
-static int mes_v11_0_late_init(void *handle)
-{
- struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-
- /* it's only intended for use in mes_self_test case, not for s0ix and reset */
- if (!amdgpu_in_reset(adev) && !adev->in_s0ix && !adev->in_suspend &&
- (amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(11, 0, 3)))
- amdgpu_mes_self_test(adev);
-
- return 0;
-}
-
static const struct amd_ip_funcs mes_v11_0_ip_funcs = {
.name = "mes_v11_0",
.early_init = mes_v11_0_early_init,
- .late_init = mes_v11_0_late_init,
+ .late_init = NULL,
.sw_init = mes_v11_0_sw_init,
.sw_fini = mes_v11_0_sw_fini,
.hw_init = mes_v11_0_hw_init,
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* [PATCH v2 06/08] drm/amdgpu: Add few optimizations to userq fence driver
2024-09-25 19:59 [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Arunpravin Paneer Selvam
` (3 preceding siblings ...)
2024-09-25 19:59 ` [PATCH v2 05/08] drm/amdgpu: Remove the MES self test Arunpravin Paneer Selvam
@ 2024-09-25 19:59 ` Arunpravin Paneer Selvam
2024-09-26 12:28 ` Christian König
2024-09-25 19:59 ` [PATCH v2 07/08] drm/amdgpu: Add the missing error handling for xa_store() call Arunpravin Paneer Selvam
` (2 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Arunpravin Paneer Selvam @ 2024-09-25 19:59 UTC (permalink / raw)
To: amd-gfx; +Cc: christian.koenig, alexander.deucher, Arunpravin Paneer Selvam
Add few optimizations to userq fence driver.
v1:(Christian):
- Remove unnecessary comments.
- In drm_exec_init call give num_bo_handles as last parameter it would
making allocation of the array more efficient
- Handle return value of __xa_store() and improve the error handling of
amdgpu_userq_fence_driver_alloc().
v2:(Christian):
- Revert userq_xa xarray init to XA_FLAGS_LOCK_IRQ.
- move the xa_unlock before the error check of the call xa_err(__xa_store())
and moved this change to a separate patch as this is adding a missing error
handling.
- Removed the unnecessary comments.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 44 ++++++++++++-------
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 6 +--
.../gpu/drm/amd/include/amdgpu_userqueue.h | 2 +-
4 files changed, 32 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 8d2a0331cd96..f3576c31428c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -76,7 +76,8 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
fence_drv = kzalloc(sizeof(*fence_drv), GFP_KERNEL);
if (!fence_drv) {
DRM_ERROR("Failed to allocate memory for fence driver\n");
- return -ENOMEM;
+ r = -ENOMEM;
+ goto free_fence_drv;
}
/* Acquire seq64 memory */
@@ -84,7 +85,8 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
&fence_drv->cpu_addr);
if (r) {
kfree(fence_drv);
- return -ENOMEM;
+ r = -ENOMEM;
+ goto free_seq64;
}
memset(fence_drv->cpu_addr, 0, sizeof(u64));
@@ -94,7 +96,7 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
spin_lock_init(&fence_drv->fence_list_lock);
fence_drv->adev = adev;
- fence_drv->uq_fence_drv_xa_ref = &userq->uq_fence_drv_xa;
+ fence_drv->fence_drv_xa_ptr = &userq->fence_drv_xa;
fence_drv->context = dma_fence_context_alloc(1);
get_task_comm(fence_drv->timeline_name, current);
@@ -106,6 +108,13 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
userq->fence_drv = fence_drv;
return 0;
+
+free_seq64:
+ amdgpu_seq64_free(adev, fence_drv->gpu_addr);
+free_fence_drv:
+ kfree(fence_drv);
+
+ return r;
}
void amdgpu_userq_fence_driver_process(struct amdgpu_userq_fence_driver *fence_drv)
@@ -147,7 +156,7 @@ void amdgpu_userq_fence_driver_destroy(struct kref *ref)
struct amdgpu_device *adev = fence_drv->adev;
struct amdgpu_userq_fence *fence, *tmp;
struct xarray *xa = &adev->userq_xa;
- unsigned long index;
+ unsigned long index, flags;
struct dma_fence *f;
spin_lock(&fence_drv->fence_list_lock);
@@ -164,11 +173,11 @@ void amdgpu_userq_fence_driver_destroy(struct kref *ref)
}
spin_unlock(&fence_drv->fence_list_lock);
- xa_lock(xa);
+ xa_lock_irqsave(xa, flags);
xa_for_each(xa, index, xa_fence_drv)
if (xa_fence_drv == fence_drv)
__xa_erase(xa, index);
- xa_unlock(xa);
+ xa_unlock_irqrestore(xa, flags);
/* Free seq64 memory */
amdgpu_seq64_free(adev, fence_drv->gpu_addr);
@@ -212,12 +221,12 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
amdgpu_userq_fence_driver_get(fence_drv);
dma_fence_get(fence);
- if (!xa_empty(&userq->uq_fence_drv_xa)) {
+ if (!xa_empty(&userq->fence_drv_xa)) {
struct amdgpu_userq_fence_driver *stored_fence_drv;
unsigned long index, count = 0;
int i = 0;
- xa_for_each(&userq->uq_fence_drv_xa, index, stored_fence_drv)
+ xa_for_each(&userq->fence_drv_xa, index, stored_fence_drv)
count++;
userq_fence->fence_drv_array =
@@ -226,9 +235,9 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
GFP_KERNEL);
if (userq_fence->fence_drv_array) {
- xa_for_each(&userq->uq_fence_drv_xa, index, stored_fence_drv) {
+ xa_for_each(&userq->fence_drv_xa, index, stored_fence_drv) {
userq_fence->fence_drv_array[i] = stored_fence_drv;
- xa_erase(&userq->uq_fence_drv_xa, index);
+ xa_erase(&userq->fence_drv_xa, index);
i++;
}
}
@@ -378,7 +387,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
struct drm_exec exec;
u64 wptr;
- /* Array of syncobj handles */
num_syncobj_handles = args->num_syncobj_handles;
syncobj_handles = memdup_user(u64_to_user_ptr(args->syncobj_handles_array),
sizeof(u32) * num_syncobj_handles);
@@ -400,7 +408,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
}
}
- /* Array of bo handles */
num_bo_handles = args->num_bo_handles;
bo_handles = memdup_user(u64_to_user_ptr(args->bo_handles_array),
sizeof(u32) * num_bo_handles);
@@ -422,7 +429,9 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
}
}
- drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+ drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, num_bo_handles);
+
+ /* Lock all BOs with retry handling */
drm_exec_until_all_locked(&exec) {
r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 1);
drm_exec_retry_on_contention(&exec);
@@ -527,7 +536,6 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
goto free_timeline_handles;
}
- /* Array of GEM object handles */
gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
if (!gobj) {
r = -ENOMEM;
@@ -542,7 +550,9 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
}
}
- drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
+ drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, num_bo_handles);
+
+ /* Lock all BOs with retry handling */
drm_exec_until_all_locked(&exec) {
r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 0);
drm_exec_retry_on_contention(&exec);
@@ -702,8 +712,8 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
* Otherwise, we would gather those references until we don't
* have any more space left and crash.
*/
- if (fence_drv->uq_fence_drv_xa_ref) {
- r = xa_alloc(fence_drv->uq_fence_drv_xa_ref, &index, fence_drv,
+ if (fence_drv->fence_drv_xa_ptr) {
+ r = xa_alloc(fence_drv->fence_drv_xa_ptr, &index, fence_drv,
xa_limit_32b, GFP_KERNEL);
if (r)
goto free_fences;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
index f72424248cc5..89c82ba38b50 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
@@ -54,7 +54,7 @@ struct amdgpu_userq_fence_driver {
spinlock_t fence_list_lock;
struct list_head fences;
struct amdgpu_device *adev;
- struct xarray *uq_fence_drv_xa_ref;
+ struct xarray *fence_drv_xa_ptr;
char timeline_name[TASK_COMM_LEN];
};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
index e7f7354e0c0e..9b24218f7ad2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -48,8 +48,8 @@ static void amdgpu_userq_walk_and_drop_fence_drv(struct xarray *xa)
static void
amdgpu_userq_fence_driver_free(struct amdgpu_usermode_queue *userq)
{
- amdgpu_userq_walk_and_drop_fence_drv(&userq->uq_fence_drv_xa);
- xa_destroy(&userq->uq_fence_drv_xa);
+ amdgpu_userq_walk_and_drop_fence_drv(&userq->fence_drv_xa);
+ xa_destroy(&userq->fence_drv_xa);
/* Drop the fence_drv reference held by user queue */
amdgpu_userq_fence_driver_put(userq->fence_drv);
}
@@ -348,7 +348,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
}
queue->doorbell_index = index;
- xa_init_flags(&queue->uq_fence_drv_xa, XA_FLAGS_ALLOC);
+ xa_init_flags(&queue->fence_drv_xa, XA_FLAGS_ALLOC);
r = amdgpu_userq_fence_driver_alloc(adev, queue);
if (r) {
DRM_ERROR("Failed to alloc fence driver\n");
diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
index d035b5c2b14b..6eb094a54f4b 100644
--- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
+++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
@@ -52,7 +52,7 @@ struct amdgpu_usermode_queue {
struct amdgpu_userq_obj db_obj;
struct amdgpu_userq_obj fw_obj;
struct amdgpu_userq_obj wptr_obj;
- struct xarray uq_fence_drv_xa;
+ struct xarray fence_drv_xa;
struct amdgpu_userq_fence_driver *fence_drv;
struct dma_fence *last_fence;
};
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v2 06/08] drm/amdgpu: Add few optimizations to userq fence driver
2024-09-25 19:59 ` [PATCH v2 06/08] drm/amdgpu: Add few optimizations to userq fence driver Arunpravin Paneer Selvam
@ 2024-09-26 12:28 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2024-09-26 12:28 UTC (permalink / raw)
To: Arunpravin Paneer Selvam, amd-gfx; +Cc: christian.koenig, alexander.deucher
Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
> Add few optimizations to userq fence driver.
"Few optimization and fixes for userq fence driver".
>
> v1:(Christian):
> - Remove unnecessary comments.
> - In drm_exec_init call give num_bo_handles as last parameter it would
> making allocation of the array more efficient
> - Handle return value of __xa_store() and improve the error handling of
> amdgpu_userq_fence_driver_alloc().
>
> v2:(Christian):
> - Revert userq_xa xarray init to XA_FLAGS_LOCK_IRQ.
> - move the xa_unlock before the error check of the call xa_err(__xa_store())
> and moved this change to a separate patch as this is adding a missing error
> handling.
> - Removed the unnecessary comments.
>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 44 ++++++++++++-------
> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 6 +--
> .../gpu/drm/amd/include/amdgpu_userqueue.h | 2 +-
> 4 files changed, 32 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 8d2a0331cd96..f3576c31428c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -76,7 +76,8 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
> fence_drv = kzalloc(sizeof(*fence_drv), GFP_KERNEL);
> if (!fence_drv) {
> DRM_ERROR("Failed to allocate memory for fence driver\n");
> - return -ENOMEM;
> + r = -ENOMEM;
> + goto free_fence_drv;
> }
>
> /* Acquire seq64 memory */
> @@ -84,7 +85,8 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
> &fence_drv->cpu_addr);
> if (r) {
> kfree(fence_drv);
> - return -ENOMEM;
> + r = -ENOMEM;
> + goto free_seq64;
> }
>
> memset(fence_drv->cpu_addr, 0, sizeof(u64));
> @@ -94,7 +96,7 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
> spin_lock_init(&fence_drv->fence_list_lock);
>
> fence_drv->adev = adev;
> - fence_drv->uq_fence_drv_xa_ref = &userq->uq_fence_drv_xa;
> + fence_drv->fence_drv_xa_ptr = &userq->fence_drv_xa;
> fence_drv->context = dma_fence_context_alloc(1);
> get_task_comm(fence_drv->timeline_name, current);
>
> @@ -106,6 +108,13 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
> userq->fence_drv = fence_drv;
>
> return 0;
> +
> +free_seq64:
> + amdgpu_seq64_free(adev, fence_drv->gpu_addr);
> +free_fence_drv:
> + kfree(fence_drv);
> +
> + return r;
> }
>
> void amdgpu_userq_fence_driver_process(struct amdgpu_userq_fence_driver *fence_drv)
> @@ -147,7 +156,7 @@ void amdgpu_userq_fence_driver_destroy(struct kref *ref)
> struct amdgpu_device *adev = fence_drv->adev;
> struct amdgpu_userq_fence *fence, *tmp;
> struct xarray *xa = &adev->userq_xa;
> - unsigned long index;
> + unsigned long index, flags;
> struct dma_fence *f;
>
> spin_lock(&fence_drv->fence_list_lock);
> @@ -164,11 +173,11 @@ void amdgpu_userq_fence_driver_destroy(struct kref *ref)
> }
> spin_unlock(&fence_drv->fence_list_lock);
>
> - xa_lock(xa);
> + xa_lock_irqsave(xa, flags);
> xa_for_each(xa, index, xa_fence_drv)
> if (xa_fence_drv == fence_drv)
> __xa_erase(xa, index);
> - xa_unlock(xa);
> + xa_unlock_irqrestore(xa, flags);
>
> /* Free seq64 memory */
> amdgpu_seq64_free(adev, fence_drv->gpu_addr);
> @@ -212,12 +221,12 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
> amdgpu_userq_fence_driver_get(fence_drv);
> dma_fence_get(fence);
>
> - if (!xa_empty(&userq->uq_fence_drv_xa)) {
> + if (!xa_empty(&userq->fence_drv_xa)) {
> struct amdgpu_userq_fence_driver *stored_fence_drv;
> unsigned long index, count = 0;
> int i = 0;
>
> - xa_for_each(&userq->uq_fence_drv_xa, index, stored_fence_drv)
> + xa_for_each(&userq->fence_drv_xa, index, stored_fence_drv)
> count++;
>
> userq_fence->fence_drv_array =
> @@ -226,9 +235,9 @@ int amdgpu_userq_fence_create(struct amdgpu_usermode_queue *userq,
> GFP_KERNEL);
>
> if (userq_fence->fence_drv_array) {
> - xa_for_each(&userq->uq_fence_drv_xa, index, stored_fence_drv) {
> + xa_for_each(&userq->fence_drv_xa, index, stored_fence_drv) {
> userq_fence->fence_drv_array[i] = stored_fence_drv;
> - xa_erase(&userq->uq_fence_drv_xa, index);
> + xa_erase(&userq->fence_drv_xa, index);
> i++;
> }
> }
> @@ -378,7 +387,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> struct drm_exec exec;
> u64 wptr;
>
> - /* Array of syncobj handles */
> num_syncobj_handles = args->num_syncobj_handles;
> syncobj_handles = memdup_user(u64_to_user_ptr(args->syncobj_handles_array),
> sizeof(u32) * num_syncobj_handles);
> @@ -400,7 +408,6 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - /* Array of bo handles */
> num_bo_handles = args->num_bo_handles;
> bo_handles = memdup_user(u64_to_user_ptr(args->bo_handles_array),
> sizeof(u32) * num_bo_handles);
> @@ -422,7 +429,9 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, num_bo_handles);
> +
> + /* Lock all BOs with retry handling */
> drm_exec_until_all_locked(&exec) {
> r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 1);
> drm_exec_retry_on_contention(&exec);
> @@ -527,7 +536,6 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> goto free_timeline_handles;
> }
>
> - /* Array of GEM object handles */
> gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
> if (!gobj) {
> r = -ENOMEM;
> @@ -542,7 +550,9 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, num_bo_handles);
> +
> + /* Lock all BOs with retry handling */
> drm_exec_until_all_locked(&exec) {
> r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 0);
> drm_exec_retry_on_contention(&exec);
> @@ -702,8 +712,8 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> * Otherwise, we would gather those references until we don't
> * have any more space left and crash.
> */
> - if (fence_drv->uq_fence_drv_xa_ref) {
> - r = xa_alloc(fence_drv->uq_fence_drv_xa_ref, &index, fence_drv,
> + if (fence_drv->fence_drv_xa_ptr) {
> + r = xa_alloc(fence_drv->fence_drv_xa_ptr, &index, fence_drv,
> xa_limit_32b, GFP_KERNEL);
> if (r)
> goto free_fences;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> index f72424248cc5..89c82ba38b50 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.h
> @@ -54,7 +54,7 @@ struct amdgpu_userq_fence_driver {
> spinlock_t fence_list_lock;
> struct list_head fences;
> struct amdgpu_device *adev;
> - struct xarray *uq_fence_drv_xa_ref;
> + struct xarray *fence_drv_xa_ptr;
> char timeline_name[TASK_COMM_LEN];
> };
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> index e7f7354e0c0e..9b24218f7ad2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> @@ -48,8 +48,8 @@ static void amdgpu_userq_walk_and_drop_fence_drv(struct xarray *xa)
> static void
> amdgpu_userq_fence_driver_free(struct amdgpu_usermode_queue *userq)
> {
> - amdgpu_userq_walk_and_drop_fence_drv(&userq->uq_fence_drv_xa);
> - xa_destroy(&userq->uq_fence_drv_xa);
> + amdgpu_userq_walk_and_drop_fence_drv(&userq->fence_drv_xa);
> + xa_destroy(&userq->fence_drv_xa);
> /* Drop the fence_drv reference held by user queue */
> amdgpu_userq_fence_driver_put(userq->fence_drv);
> }
> @@ -348,7 +348,7 @@ amdgpu_userqueue_create(struct drm_file *filp, union drm_amdgpu_userq *args)
> }
> queue->doorbell_index = index;
>
> - xa_init_flags(&queue->uq_fence_drv_xa, XA_FLAGS_ALLOC);
> + xa_init_flags(&queue->fence_drv_xa, XA_FLAGS_ALLOC);
> r = amdgpu_userq_fence_driver_alloc(adev, queue);
> if (r) {
> DRM_ERROR("Failed to alloc fence driver\n");
> diff --git a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> index d035b5c2b14b..6eb094a54f4b 100644
> --- a/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> +++ b/drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> @@ -52,7 +52,7 @@ struct amdgpu_usermode_queue {
> struct amdgpu_userq_obj db_obj;
> struct amdgpu_userq_obj fw_obj;
> struct amdgpu_userq_obj wptr_obj;
> - struct xarray uq_fence_drv_xa;
> + struct xarray fence_drv_xa;
> struct amdgpu_userq_fence_driver *fence_drv;
> struct dma_fence *last_fence;
> };
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 07/08] drm/amdgpu: Add the missing error handling for xa_store() call
2024-09-25 19:59 [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Arunpravin Paneer Selvam
` (4 preceding siblings ...)
2024-09-25 19:59 ` [PATCH v2 06/08] drm/amdgpu: Add few optimizations to userq fence driver Arunpravin Paneer Selvam
@ 2024-09-25 19:59 ` Arunpravin Paneer Selvam
2024-09-26 12:29 ` Christian König
2024-09-25 19:59 ` [PATCH v2 08/08] drm/amdgpu: add vm root BO lock before accessing the vm Arunpravin Paneer Selvam
2024-09-26 9:27 ` [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Christian König
7 siblings, 1 reply; 18+ messages in thread
From: Arunpravin Paneer Selvam @ 2024-09-25 19:59 UTC (permalink / raw)
To: amd-gfx; +Cc: christian.koenig, alexander.deucher, Arunpravin Paneer Selvam
Add the missing error handling for xa_store() call in the function
amdgpu_userq_fence_driver_alloc().
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index f3576c31428c..43429661f62d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -101,9 +101,11 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
get_task_comm(fence_drv->timeline_name, current);
xa_lock_irqsave(&adev->userq_xa, flags);
- __xa_store(&adev->userq_xa, userq->doorbell_index,
- fence_drv, GFP_KERNEL);
+ r = xa_err(__xa_store(&adev->userq_xa, userq->doorbell_index,
+ fence_drv, GFP_KERNEL));
xa_unlock_irqrestore(&adev->userq_xa, flags);
+ if (r)
+ goto free_seq64;
userq->fence_drv = fence_drv;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v2 07/08] drm/amdgpu: Add the missing error handling for xa_store() call
2024-09-25 19:59 ` [PATCH v2 07/08] drm/amdgpu: Add the missing error handling for xa_store() call Arunpravin Paneer Selvam
@ 2024-09-26 12:29 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2024-09-26 12:29 UTC (permalink / raw)
To: Arunpravin Paneer Selvam, amd-gfx; +Cc: christian.koenig, alexander.deucher
Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
> Add the missing error handling for xa_store() call in the function
> amdgpu_userq_fence_driver_alloc().
>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index f3576c31428c..43429661f62d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -101,9 +101,11 @@ int amdgpu_userq_fence_driver_alloc(struct amdgpu_device *adev,
> get_task_comm(fence_drv->timeline_name, current);
>
> xa_lock_irqsave(&adev->userq_xa, flags);
> - __xa_store(&adev->userq_xa, userq->doorbell_index,
> - fence_drv, GFP_KERNEL);
> + r = xa_err(__xa_store(&adev->userq_xa, userq->doorbell_index,
> + fence_drv, GFP_KERNEL));
> xa_unlock_irqrestore(&adev->userq_xa, flags);
> + if (r)
> + goto free_seq64;
>
> userq->fence_drv = fence_drv;
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 08/08] drm/amdgpu: add vm root BO lock before accessing the vm
2024-09-25 19:59 [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Arunpravin Paneer Selvam
` (5 preceding siblings ...)
2024-09-25 19:59 ` [PATCH v2 07/08] drm/amdgpu: Add the missing error handling for xa_store() call Arunpravin Paneer Selvam
@ 2024-09-25 19:59 ` Arunpravin Paneer Selvam
2024-09-26 12:32 ` Christian König
2024-09-26 9:27 ` [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Christian König
7 siblings, 1 reply; 18+ messages in thread
From: Arunpravin Paneer Selvam @ 2024-09-25 19:59 UTC (permalink / raw)
To: amd-gfx; +Cc: christian.koenig, alexander.deucher, Arunpravin Paneer Selvam
Add a vm root BO lock before accessing the userqueue VM.
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 43429661f62d..52722b738316 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -320,7 +320,6 @@ static const struct dma_fence_ops amdgpu_userq_fence_ops = {
/**
* amdgpu_userq_fence_read_wptr - Read the userq wptr value
*
- * @filp: drm file private data structure
* @queue: user mode queue structure pointer
* @wptr: write pointer value
*
@@ -330,23 +329,27 @@ static const struct dma_fence_ops amdgpu_userq_fence_ops = {
*
* Returns wptr value on success, error on failure.
*/
-static int amdgpu_userq_fence_read_wptr(struct drm_file *filp,
- struct amdgpu_usermode_queue *queue,
+static int amdgpu_userq_fence_read_wptr(struct amdgpu_usermode_queue *queue,
u64 *wptr)
{
- struct amdgpu_fpriv *fpriv = filp->driver_priv;
struct amdgpu_bo_va_mapping *mapping;
- struct amdgpu_vm *vm = &fpriv->vm;
struct amdgpu_bo *bo;
u64 addr, *ptr;
int r;
+ r = amdgpu_bo_reserve(queue->vm->root.bo, false);
+ if (r)
+ return r;
+
addr = queue->userq_prop->wptr_gpu_addr;
addr &= AMDGPU_GMC_HOLE_MASK;
- mapping = amdgpu_vm_bo_lookup_mapping(vm, addr >> PAGE_SHIFT);
- if (!mapping)
+ mapping = amdgpu_vm_bo_lookup_mapping(queue->vm, addr >> PAGE_SHIFT);
+ amdgpu_bo_unreserve(queue->vm->root.bo);
+ if (!mapping) {
+ DRM_ERROR("Failed to lookup amdgpu_bo_va_mapping\n");
return -EINVAL;
+ }
bo = mapping->bo_va->base.bo;
r = amdgpu_bo_reserve(bo, true);
@@ -448,7 +451,7 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
goto exec_fini;
}
- r = amdgpu_userq_fence_read_wptr(filp, queue, &wptr);
+ r = amdgpu_userq_fence_read_wptr(queue, &wptr);
if (r)
goto exec_fini;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread* Re: [PATCH v2 08/08] drm/amdgpu: add vm root BO lock before accessing the vm
2024-09-25 19:59 ` [PATCH v2 08/08] drm/amdgpu: add vm root BO lock before accessing the vm Arunpravin Paneer Selvam
@ 2024-09-26 12:32 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2024-09-26 12:32 UTC (permalink / raw)
To: Arunpravin Paneer Selvam, amd-gfx; +Cc: christian.koenig, alexander.deucher
Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
> Add a vm root BO lock before accessing the userqueue VM.
>
> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> index 43429661f62d..52722b738316 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
> @@ -320,7 +320,6 @@ static const struct dma_fence_ops amdgpu_userq_fence_ops = {
> /**
> * amdgpu_userq_fence_read_wptr - Read the userq wptr value
> *
> - * @filp: drm file private data structure
> * @queue: user mode queue structure pointer
> * @wptr: write pointer value
> *
> @@ -330,23 +329,27 @@ static const struct dma_fence_ops amdgpu_userq_fence_ops = {
> *
> * Returns wptr value on success, error on failure.
> */
> -static int amdgpu_userq_fence_read_wptr(struct drm_file *filp,
> - struct amdgpu_usermode_queue *queue,
> +static int amdgpu_userq_fence_read_wptr(struct amdgpu_usermode_queue *queue,
> u64 *wptr)
> {
> - struct amdgpu_fpriv *fpriv = filp->driver_priv;
> struct amdgpu_bo_va_mapping *mapping;
> - struct amdgpu_vm *vm = &fpriv->vm;
> struct amdgpu_bo *bo;
> u64 addr, *ptr;
> int r;
>
> + r = amdgpu_bo_reserve(queue->vm->root.bo, false);
> + if (r)
> + return r;
> +
> addr = queue->userq_prop->wptr_gpu_addr;
> addr &= AMDGPU_GMC_HOLE_MASK;
>
> - mapping = amdgpu_vm_bo_lookup_mapping(vm, addr >> PAGE_SHIFT);
> - if (!mapping)
> + mapping = amdgpu_vm_bo_lookup_mapping(queue->vm, addr >> PAGE_SHIFT);
> + amdgpu_bo_unreserve(queue->vm->root.bo);
You need to keep the VM locked until you are done with the mapping.
Otherwise the mapping could be released at any time.
Regards,
Christian.
> + if (!mapping) {
> + DRM_ERROR("Failed to lookup amdgpu_bo_va_mapping\n");
> return -EINVAL;
> + }
>
> bo = mapping->bo_va->base.bo;
If you only need the BO then grab a temporary BO reference here, drop
the VM lock and acquire the BO.
When you are done with everything just drop the BO lock and then the
temporary BO reference.
Regards,
Christian.
> r = amdgpu_bo_reserve(bo, true);
> @@ -448,7 +451,7 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
> goto exec_fini;
> }
>
> - r = amdgpu_userq_fence_read_wptr(filp, queue, &wptr);
> + r = amdgpu_userq_fence_read_wptr(queue, &wptr);
> if (r)
> goto exec_fini;
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL
2024-09-25 19:59 [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Arunpravin Paneer Selvam
` (6 preceding siblings ...)
2024-09-25 19:59 ` [PATCH v2 08/08] drm/amdgpu: add vm root BO lock before accessing the vm Arunpravin Paneer Selvam
@ 2024-09-26 9:27 ` Christian König
2024-09-26 9:31 ` Paneer Selvam, Arunpravin
7 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2024-09-26 9:27 UTC (permalink / raw)
To: Arunpravin Paneer Selvam, amd-gfx; +Cc: christian.koenig, alexander.deucher
Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
> [SNIP]
> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
> + struct drm_file *filp)
> +{
> + struct drm_amdgpu_userq_fence_info *fence_info = NULL;
> + struct drm_amdgpu_userq_wait *wait_info = data;
> + u32 *syncobj_handles, *bo_handles;
> + struct dma_fence **fences = NULL;
> + u32 num_syncobj, num_bo_handles;
> + struct drm_gem_object **gobj;
> + struct drm_exec exec;
> + int r, i, entry, cnt;
> + u64 num_fences = 0;
> +
> + num_bo_handles = wait_info->num_bo_handles;
> + bo_handles = memdup_user(u64_to_user_ptr(wait_info->bo_handles_array),
> + sizeof(u32) * num_bo_handles);
> + if (IS_ERR(bo_handles))
> + return PTR_ERR(bo_handles);
> +
> + num_syncobj = wait_info->num_syncobj_handles;
> + syncobj_handles = memdup_user(u64_to_user_ptr(wait_info->syncobj_handles_array),
> + sizeof(u32) * num_syncobj);
> + if (IS_ERR(syncobj_handles)) {
> + r = PTR_ERR(syncobj_handles);
> + goto free_bo_handles;
> + }
> +
> + /* Array of GEM object handles */
> + gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
> + if (!gobj) {
> + r = -ENOMEM;
> + goto free_syncobj_handles;
> + }
> +
> + for (entry = 0; entry < num_bo_handles; entry++) {
> + gobj[entry] = drm_gem_object_lookup(filp, bo_handles[entry]);
> + if (!gobj[entry]) {
> + r = -ENOENT;
> + goto put_gobj;
> + }
> + }
> +
> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
> + drm_exec_until_all_locked(&exec) {
> + r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 0);
> + drm_exec_retry_on_contention(&exec);
> + if (r) {
> + drm_exec_fini(&exec);
> + goto put_gobj;
> + }
> + }
> +
> + if (!wait_info->num_fences) {
> + /* Count syncobj's fence */
> + for (i = 0; i < num_syncobj; i++) {
> + struct dma_fence *fence;
> +
> + r = drm_syncobj_find_fence(filp, syncobj_handles[i],
> + 0, 0, &fence);
> + dma_fence_put(fence);
> +
> + if (r || !fence)
> + continue;
> +
> + num_fences++;
> + }
> +
> + /* Count GEM objects fence */
> + for (i = 0; i < num_bo_handles; i++) {
> + struct dma_resv_iter resv_cursor;
> + struct dma_fence *fence;
> +
> + dma_resv_for_each_fence(&resv_cursor, gobj[i]->resv,
> + dma_resv_usage_rw(wait_info->bo_wait_flags &
> + AMDGPU_USERQ_BO_WRITE), fence)
> + num_fences++;
We should probably adjust the UAPI here once more.
The problem is that we only provide the AMDGPU_USERQ_BO_WRITE for the
whole IOCTL instead of per BO.
So the best approach would probably be to drop the AMDGPU_USERQ_BO_WRITE
flag and split up the array of BOs into readers and writers.
Can you work on that Arun? Shouldn't be more than a bit typing exercise.
Thanks,
Christian.
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL
2024-09-26 9:27 ` [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL Christian König
@ 2024-09-26 9:31 ` Paneer Selvam, Arunpravin
2024-09-26 9:34 ` Christian König
0 siblings, 1 reply; 18+ messages in thread
From: Paneer Selvam, Arunpravin @ 2024-09-26 9:31 UTC (permalink / raw)
To: Christian König, amd-gfx; +Cc: christian.koenig, alexander.deucher
Hi Christian,
On 9/26/2024 2:57 PM, Christian König wrote:
> Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
>> [SNIP]
>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>> + struct drm_file *filp)
>> +{
>> + struct drm_amdgpu_userq_fence_info *fence_info = NULL;
>> + struct drm_amdgpu_userq_wait *wait_info = data;
>> + u32 *syncobj_handles, *bo_handles;
>> + struct dma_fence **fences = NULL;
>> + u32 num_syncobj, num_bo_handles;
>> + struct drm_gem_object **gobj;
>> + struct drm_exec exec;
>> + int r, i, entry, cnt;
>> + u64 num_fences = 0;
>> +
>> + num_bo_handles = wait_info->num_bo_handles;
>> + bo_handles =
>> memdup_user(u64_to_user_ptr(wait_info->bo_handles_array),
>> + sizeof(u32) * num_bo_handles);
>> + if (IS_ERR(bo_handles))
>> + return PTR_ERR(bo_handles);
>> +
>> + num_syncobj = wait_info->num_syncobj_handles;
>> + syncobj_handles =
>> memdup_user(u64_to_user_ptr(wait_info->syncobj_handles_array),
>> + sizeof(u32) * num_syncobj);
>> + if (IS_ERR(syncobj_handles)) {
>> + r = PTR_ERR(syncobj_handles);
>> + goto free_bo_handles;
>> + }
>> +
>> + /* Array of GEM object handles */
>> + gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>> + if (!gobj) {
>> + r = -ENOMEM;
>> + goto free_syncobj_handles;
>> + }
>> +
>> + for (entry = 0; entry < num_bo_handles; entry++) {
>> + gobj[entry] = drm_gem_object_lookup(filp, bo_handles[entry]);
>> + if (!gobj[entry]) {
>> + r = -ENOENT;
>> + goto put_gobj;
>> + }
>> + }
>> +
>> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
>> + drm_exec_until_all_locked(&exec) {
>> + r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 0);
>> + drm_exec_retry_on_contention(&exec);
>> + if (r) {
>> + drm_exec_fini(&exec);
>> + goto put_gobj;
>> + }
>> + }
>> +
>> + if (!wait_info->num_fences) {
>> + /* Count syncobj's fence */
>> + for (i = 0; i < num_syncobj; i++) {
>> + struct dma_fence *fence;
>> +
>> + r = drm_syncobj_find_fence(filp, syncobj_handles[i],
>> + 0, 0, &fence);
>> + dma_fence_put(fence);
>> +
>> + if (r || !fence)
>> + continue;
>> +
>> + num_fences++;
>> + }
>> +
>> + /* Count GEM objects fence */
>> + for (i = 0; i < num_bo_handles; i++) {
>> + struct dma_resv_iter resv_cursor;
>> + struct dma_fence *fence;
>> +
>> + dma_resv_for_each_fence(&resv_cursor, gobj[i]->resv,
>> + dma_resv_usage_rw(wait_info->bo_wait_flags &
>> + AMDGPU_USERQ_BO_WRITE), fence)
>> + num_fences++;
>
> We should probably adjust the UAPI here once more.
>
> The problem is that we only provide the AMDGPU_USERQ_BO_WRITE for the
> whole IOCTL instead of per BO.
>
> So the best approach would probably be to drop the
> AMDGPU_USERQ_BO_WRITE flag and split up the array of BOs into readers
> and writers.
>
> Can you work on that Arun? Shouldn't be more than a bit typing exercise.
Sure, I will modify and send the next version of this file.
Thanks,
Arun.
>
> Thanks,
> Christian.
>
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL
2024-09-26 9:31 ` Paneer Selvam, Arunpravin
@ 2024-09-26 9:34 ` Christian König
2024-09-26 10:26 ` Paneer Selvam, Arunpravin
0 siblings, 1 reply; 18+ messages in thread
From: Christian König @ 2024-09-26 9:34 UTC (permalink / raw)
To: Paneer Selvam, Arunpravin, Christian König, amd-gfx
Cc: alexander.deucher
Am 26.09.24 um 11:31 schrieb Paneer Selvam, Arunpravin:
> Hi Christian,
>
> On 9/26/2024 2:57 PM, Christian König wrote:
>> Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
>>> [SNIP]
>>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>>> + struct drm_file *filp)
>>> +{
>>> + struct drm_amdgpu_userq_fence_info *fence_info = NULL;
>>> + struct drm_amdgpu_userq_wait *wait_info = data;
>>> + u32 *syncobj_handles, *bo_handles;
>>> + struct dma_fence **fences = NULL;
>>> + u32 num_syncobj, num_bo_handles;
>>> + struct drm_gem_object **gobj;
>>> + struct drm_exec exec;
>>> + int r, i, entry, cnt;
>>> + u64 num_fences = 0;
>>> +
>>> + num_bo_handles = wait_info->num_bo_handles;
>>> + bo_handles =
>>> memdup_user(u64_to_user_ptr(wait_info->bo_handles_array),
>>> + sizeof(u32) * num_bo_handles);
>>> + if (IS_ERR(bo_handles))
>>> + return PTR_ERR(bo_handles);
>>> +
>>> + num_syncobj = wait_info->num_syncobj_handles;
>>> + syncobj_handles =
>>> memdup_user(u64_to_user_ptr(wait_info->syncobj_handles_array),
>>> + sizeof(u32) * num_syncobj);
>>> + if (IS_ERR(syncobj_handles)) {
>>> + r = PTR_ERR(syncobj_handles);
>>> + goto free_bo_handles;
>>> + }
>>> +
>>> + /* Array of GEM object handles */
>>> + gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>>> + if (!gobj) {
>>> + r = -ENOMEM;
>>> + goto free_syncobj_handles;
>>> + }
>>> +
>>> + for (entry = 0; entry < num_bo_handles; entry++) {
>>> + gobj[entry] = drm_gem_object_lookup(filp, bo_handles[entry]);
>>> + if (!gobj[entry]) {
>>> + r = -ENOENT;
>>> + goto put_gobj;
>>> + }
>>> + }
>>> +
>>> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
>>> + drm_exec_until_all_locked(&exec) {
>>> + r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 0);
>>> + drm_exec_retry_on_contention(&exec);
>>> + if (r) {
>>> + drm_exec_fini(&exec);
>>> + goto put_gobj;
>>> + }
>>> + }
>>> +
>>> + if (!wait_info->num_fences) {
>>> + /* Count syncobj's fence */
>>> + for (i = 0; i < num_syncobj; i++) {
>>> + struct dma_fence *fence;
>>> +
>>> + r = drm_syncobj_find_fence(filp, syncobj_handles[i],
>>> + 0, 0, &fence);
>>> + dma_fence_put(fence);
>>> +
>>> + if (r || !fence)
>>> + continue;
>>> +
>>> + num_fences++;
>>> + }
>>> +
>>> + /* Count GEM objects fence */
>>> + for (i = 0; i < num_bo_handles; i++) {
>>> + struct dma_resv_iter resv_cursor;
>>> + struct dma_fence *fence;
>>> +
>>> + dma_resv_for_each_fence(&resv_cursor, gobj[i]->resv,
>>> + dma_resv_usage_rw(wait_info->bo_wait_flags &
>>> + AMDGPU_USERQ_BO_WRITE), fence)
>>> + num_fences++;
>>
>> We should probably adjust the UAPI here once more.
>>
>> The problem is that we only provide the AMDGPU_USERQ_BO_WRITE for the
>> whole IOCTL instead of per BO.
>>
>> So the best approach would probably be to drop the
>> AMDGPU_USERQ_BO_WRITE flag and split up the array of BOs into readers
>> and writers.
>>
>> Can you work on that Arun? Shouldn't be more than a bit typing exercise.
> Sure, I will modify and send the next version of this file.
Thanks.
In the meantime I'm going to review the rest of the series, so there
could be more comments. But please update the UAPI first.
Regards,
Christian.
>
> Thanks,
> Arun.
>>
>> Thanks,
>> Christian.
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL
2024-09-26 9:34 ` Christian König
@ 2024-09-26 10:26 ` Paneer Selvam, Arunpravin
2024-09-26 11:16 ` Christian König
0 siblings, 1 reply; 18+ messages in thread
From: Paneer Selvam, Arunpravin @ 2024-09-26 10:26 UTC (permalink / raw)
To: Christian König, Christian König, amd-gfx; +Cc: alexander.deucher
Hi Christian,
On 9/26/2024 3:04 PM, Christian König wrote:
> Am 26.09.24 um 11:31 schrieb Paneer Selvam, Arunpravin:
>> Hi Christian,
>>
>> On 9/26/2024 2:57 PM, Christian König wrote:
>>> Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
>>>> [SNIP]
>>>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>>>> + struct drm_file *filp)
>>>> +{
>>>> + struct drm_amdgpu_userq_fence_info *fence_info = NULL;
>>>> + struct drm_amdgpu_userq_wait *wait_info = data;
>>>> + u32 *syncobj_handles, *bo_handles;
>>>> + struct dma_fence **fences = NULL;
>>>> + u32 num_syncobj, num_bo_handles;
>>>> + struct drm_gem_object **gobj;
>>>> + struct drm_exec exec;
>>>> + int r, i, entry, cnt;
>>>> + u64 num_fences = 0;
>>>> +
>>>> + num_bo_handles = wait_info->num_bo_handles;
>>>> + bo_handles =
>>>> memdup_user(u64_to_user_ptr(wait_info->bo_handles_array),
>>>> + sizeof(u32) * num_bo_handles);
>>>> + if (IS_ERR(bo_handles))
>>>> + return PTR_ERR(bo_handles);
>>>> +
>>>> + num_syncobj = wait_info->num_syncobj_handles;
>>>> + syncobj_handles =
>>>> memdup_user(u64_to_user_ptr(wait_info->syncobj_handles_array),
>>>> + sizeof(u32) * num_syncobj);
>>>> + if (IS_ERR(syncobj_handles)) {
>>>> + r = PTR_ERR(syncobj_handles);
>>>> + goto free_bo_handles;
>>>> + }
>>>> +
>>>> + /* Array of GEM object handles */
>>>> + gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>>>> + if (!gobj) {
>>>> + r = -ENOMEM;
>>>> + goto free_syncobj_handles;
>>>> + }
>>>> +
>>>> + for (entry = 0; entry < num_bo_handles; entry++) {
>>>> + gobj[entry] = drm_gem_object_lookup(filp, bo_handles[entry]);
>>>> + if (!gobj[entry]) {
>>>> + r = -ENOENT;
>>>> + goto put_gobj;
>>>> + }
>>>> + }
>>>> +
>>>> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
>>>> + drm_exec_until_all_locked(&exec) {
>>>> + r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 0);
>>>> + drm_exec_retry_on_contention(&exec);
>>>> + if (r) {
>>>> + drm_exec_fini(&exec);
>>>> + goto put_gobj;
>>>> + }
>>>> + }
>>>> +
>>>> + if (!wait_info->num_fences) {
>>>> + /* Count syncobj's fence */
>>>> + for (i = 0; i < num_syncobj; i++) {
>>>> + struct dma_fence *fence;
>>>> +
>>>> + r = drm_syncobj_find_fence(filp, syncobj_handles[i],
>>>> + 0, 0, &fence);
>>>> + dma_fence_put(fence);
>>>> +
>>>> + if (r || !fence)
>>>> + continue;
>>>> +
>>>> + num_fences++;
>>>> + }
>>>> +
>>>> + /* Count GEM objects fence */
>>>> + for (i = 0; i < num_bo_handles; i++) {
>>>> + struct dma_resv_iter resv_cursor;
>>>> + struct dma_fence *fence;
>>>> +
>>>> + dma_resv_for_each_fence(&resv_cursor, gobj[i]->resv,
>>>> + dma_resv_usage_rw(wait_info->bo_wait_flags &
>>>> + AMDGPU_USERQ_BO_WRITE), fence)
>>>> + num_fences++;
>>>
>>> We should probably adjust the UAPI here once more.
>>>
>>> The problem is that we only provide the AMDGPU_USERQ_BO_WRITE for
>>> the whole IOCTL instead of per BO.
>>>
>>> So the best approach would probably be to drop the
>>> AMDGPU_USERQ_BO_WRITE flag and split up the array of BOs into
>>> readers and writers.
>>>
>>> Can you work on that Arun? Shouldn't be more than a bit typing
>>> exercise.
>> Sure, I will modify and send the next version of this file.
>
> Thanks.
>
> In the meantime I'm going to review the rest of the series, so there
> could be more comments. But please update the UAPI first.
Can we have the bo_handles_read_array, num_read_bo_handles,
bo_handles_write_array, num_write_bo_handles in both
signal IOCTL and wait IOCTL?
Thanks,
Arun.
>
> Regards,
> Christian.
>
>>
>> Thanks,
>> Arun.
>>>
>>> Thanks,
>>> Christian.
>>>
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH v2 01/08] drm/amdgpu: Implement userqueue signal/wait IOCTL
2024-09-26 10:26 ` Paneer Selvam, Arunpravin
@ 2024-09-26 11:16 ` Christian König
0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2024-09-26 11:16 UTC (permalink / raw)
To: Paneer Selvam, Arunpravin, Christian König, amd-gfx
Cc: alexander.deucher
Am 26.09.24 um 12:26 schrieb Paneer Selvam, Arunpravin:
> Hi Christian,
>
> On 9/26/2024 3:04 PM, Christian König wrote:
>> Am 26.09.24 um 11:31 schrieb Paneer Selvam, Arunpravin:
>>> Hi Christian,
>>>
>>> On 9/26/2024 2:57 PM, Christian König wrote:
>>>> Am 25.09.24 um 21:59 schrieb Arunpravin Paneer Selvam:
>>>>> [SNIP]
>>>>> +int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
>>>>> + struct drm_file *filp)
>>>>> +{
>>>>> + struct drm_amdgpu_userq_fence_info *fence_info = NULL;
>>>>> + struct drm_amdgpu_userq_wait *wait_info = data;
>>>>> + u32 *syncobj_handles, *bo_handles;
>>>>> + struct dma_fence **fences = NULL;
>>>>> + u32 num_syncobj, num_bo_handles;
>>>>> + struct drm_gem_object **gobj;
>>>>> + struct drm_exec exec;
>>>>> + int r, i, entry, cnt;
>>>>> + u64 num_fences = 0;
>>>>> +
>>>>> + num_bo_handles = wait_info->num_bo_handles;
>>>>> + bo_handles =
>>>>> memdup_user(u64_to_user_ptr(wait_info->bo_handles_array),
>>>>> + sizeof(u32) * num_bo_handles);
>>>>> + if (IS_ERR(bo_handles))
>>>>> + return PTR_ERR(bo_handles);
>>>>> +
>>>>> + num_syncobj = wait_info->num_syncobj_handles;
>>>>> + syncobj_handles =
>>>>> memdup_user(u64_to_user_ptr(wait_info->syncobj_handles_array),
>>>>> + sizeof(u32) * num_syncobj);
>>>>> + if (IS_ERR(syncobj_handles)) {
>>>>> + r = PTR_ERR(syncobj_handles);
>>>>> + goto free_bo_handles;
>>>>> + }
>>>>> +
>>>>> + /* Array of GEM object handles */
>>>>> + gobj = kmalloc_array(num_bo_handles, sizeof(*gobj), GFP_KERNEL);
>>>>> + if (!gobj) {
>>>>> + r = -ENOMEM;
>>>>> + goto free_syncobj_handles;
>>>>> + }
>>>>> +
>>>>> + for (entry = 0; entry < num_bo_handles; entry++) {
>>>>> + gobj[entry] = drm_gem_object_lookup(filp,
>>>>> bo_handles[entry]);
>>>>> + if (!gobj[entry]) {
>>>>> + r = -ENOENT;
>>>>> + goto put_gobj;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
>>>>> + drm_exec_until_all_locked(&exec) {
>>>>> + r = drm_exec_prepare_array(&exec, gobj, num_bo_handles, 0);
>>>>> + drm_exec_retry_on_contention(&exec);
>>>>> + if (r) {
>>>>> + drm_exec_fini(&exec);
>>>>> + goto put_gobj;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + if (!wait_info->num_fences) {
>>>>> + /* Count syncobj's fence */
>>>>> + for (i = 0; i < num_syncobj; i++) {
>>>>> + struct dma_fence *fence;
>>>>> +
>>>>> + r = drm_syncobj_find_fence(filp, syncobj_handles[i],
>>>>> + 0, 0, &fence);
>>>>> + dma_fence_put(fence);
>>>>> +
>>>>> + if (r || !fence)
>>>>> + continue;
>>>>> +
>>>>> + num_fences++;
>>>>> + }
>>>>> +
>>>>> + /* Count GEM objects fence */
>>>>> + for (i = 0; i < num_bo_handles; i++) {
>>>>> + struct dma_resv_iter resv_cursor;
>>>>> + struct dma_fence *fence;
>>>>> +
>>>>> + dma_resv_for_each_fence(&resv_cursor, gobj[i]->resv,
>>>>> + dma_resv_usage_rw(wait_info->bo_wait_flags &
>>>>> + AMDGPU_USERQ_BO_WRITE), fence)
>>>>> + num_fences++;
>>>>
>>>> We should probably adjust the UAPI here once more.
>>>>
>>>> The problem is that we only provide the AMDGPU_USERQ_BO_WRITE for
>>>> the whole IOCTL instead of per BO.
>>>>
>>>> So the best approach would probably be to drop the
>>>> AMDGPU_USERQ_BO_WRITE flag and split up the array of BOs into
>>>> readers and writers.
>>>>
>>>> Can you work on that Arun? Shouldn't be more than a bit typing
>>>> exercise.
>>> Sure, I will modify and send the next version of this file.
>>
>> Thanks.
>>
>> In the meantime I'm going to review the rest of the series, so there
>> could be more comments. But please update the UAPI first.
>
> Can we have the bo_handles_read_array, num_read_bo_handles,
> bo_handles_write_array, num_write_bo_handles in both
> signal IOCTL and wait IOCTL?
I think so, yes.
Regards,
Christian.
>
> Thanks,
> Arun.
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks,
>>> Arun.
>>>>
>>>> Thanks,
>>>> Christian.
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread