* [PATCH v5 0/6] A few drm_syncobj optimisations
@ 2025-06-11 14:00 Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 1/6] drm/syncobj: Remove unhelpful helper Tvrtko Ursulin
` (5 more replies)
0 siblings, 6 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-11 14:00 UTC (permalink / raw)
To: dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer, Tvrtko Ursulin,
Maíra Canal
A small set of drm_syncobj optimisations which should make things a tiny bit
more efficient on the CPU side of things.
Improvement seems to be around 1.5%* more FPS if observed with "vkgears
-present-mailbox" on a Steam Deck Plasma desktop, but I am reluctant to make a
definitive claim on the numbers since there is some run to run variance. But, as
suggested by Michel Dänzer, I did do a five ~100 second runs on the each kernel
to be able to show the ministat analysis.
x before
+ after
+------------------------------------------------------------+
| x + |
| x x + |
| x xx ++++ |
| x x xx x ++++ |
| x xx x xx x+ ++++ |
| xxxxx xxxxxx+ ++++ + + |
| xxxxxxx xxxxxx+x ++++ +++ |
| x xxxxxxxxxxx*xx+* x++++++++ ++ |
| x x xxxxxxxxxxxx**x*+*+*++++++++ ++++ + |
| xx x xxxxxxxxxx*x****+***+**+++++ ++++++ |
|x xxx x xxxxx*x****x***********+*++**+++++++ + + +|
| |_______A______| |
| |______A_______| |
+------------------------------------------------------------+
N Min Max Median Avg Stddev
x 135 21697.58 22809.467 22321.396 22307.707 198.75011
+ 118 22200.746 23277.09 22661.4 22671.442 192.10609
Difference at 95.0% confidence
363.735 +/- 48.3345
1.63054% +/- 0.216672%
(Student's t, pooled s = 195.681)
Or when tested on Intel Alderlake, KDE Wayland:
x base
+ syncobj
+--------------------------------------------------------------+
| + |
| + + |
| + + |
| + ++ |
| ++ ++ |
| x ++ ++ |
| x x + ++ ++ |
| x xx xx x x +++++++ |
| x x xx xxx xxxx*xxx +++++++++ |
|x xx x x x xx xxxxxxxxxx*xxx****xxx +x+ ++++++++++|
| |__________A_M_______| |____A_M___| |
+--------------------------------------------------------------+
N Min Max Median Avg Stddev
x 55 7158.232 8058.753 7803.506 7754.5195 191.69526
+ 55 7801.23 8272.271 8172.435 8150.6303 105.84085
Difference at 95.0% confidence
396.111 +/- 57.8717
5.10813% +/- 0.746296%
(Student's t, pooled s = 154.838)
Scores may seem low but I had to fix to conservative CPU freq to avoid some
pretty strong thermal throttling causing wild swings within a run. Nevertheless
the improvement is clearly shown here as well.
v2:
* Implemented review feedback - see patch change logs.
v3:
* Moved #define DRM_SYNCOBJ_FAST_PATH_ENTRIES one patch earlier for less churn.
v3.1:
* Consolidated testing results.
v4:
* Kernel test robot reports 32-bit ARM does not implement 64-bit get/put_user.
Switch to copy_to/from_user in relevant places.
v5:
* Fixed copy_from_user argument order mixup.
Cc: Maíra Canal <mcanal@igalia.com>
Tvrtko Ursulin (6):
drm/syncobj: Remove unhelpful helper
drm/syncobj: Do not allocate an array to store zeros when waiting
drm/syncobj: Avoid one temporary allocation in drm_syncobj_array_find
drm/syncobj: Avoid temporary allocation in
drm_syncobj_timeline_signal_ioctl
drm/syncobj: Add a fast path to drm_syncobj_array_wait_timeout
drm/syncobj: Add a fast path to drm_syncobj_array_find
drivers/gpu/drm/drm_syncobj.c | 277 ++++++++++++++++++----------------
1 file changed, 147 insertions(+), 130 deletions(-)
--
2.48.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v5 1/6] drm/syncobj: Remove unhelpful helper
2025-06-11 14:00 [PATCH v5 0/6] A few drm_syncobj optimisations Tvrtko Ursulin
@ 2025-06-11 14:00 ` Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 2/6] drm/syncobj: Do not allocate an array to store zeros when waiting Tvrtko Ursulin
` (4 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-11 14:00 UTC (permalink / raw)
To: dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer, Tvrtko Ursulin,
Maíra Canal
Helper which fails to consolidate the code and instead just forks into two
copies of the code based on a boolean parameter is not very helpful or
readable. Lets just remove it and proof in the pudding is the net smaller
code.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
---
v2:
* Assign timeout to a local variable.
---
drivers/gpu/drm/drm_syncobj.c | 100 ++++++++++++++++------------------
1 file changed, 46 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 636cd83ca29e..e2d97dd6e45b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1248,42 +1248,6 @@ signed long drm_timeout_abs_to_jiffies(int64_t timeout_nsec)
}
EXPORT_SYMBOL(drm_timeout_abs_to_jiffies);
-static int drm_syncobj_array_wait(struct drm_device *dev,
- struct drm_file *file_private,
- struct drm_syncobj_wait *wait,
- struct drm_syncobj_timeline_wait *timeline_wait,
- struct drm_syncobj **syncobjs, bool timeline,
- ktime_t *deadline)
-{
- signed long timeout = 0;
- uint32_t first = ~0;
-
- if (!timeline) {
- timeout = drm_timeout_abs_to_jiffies(wait->timeout_nsec);
- timeout = drm_syncobj_array_wait_timeout(syncobjs,
- NULL,
- wait->count_handles,
- wait->flags,
- timeout, &first,
- deadline);
- if (timeout < 0)
- return timeout;
- wait->first_signaled = first;
- } else {
- timeout = drm_timeout_abs_to_jiffies(timeline_wait->timeout_nsec);
- timeout = drm_syncobj_array_wait_timeout(syncobjs,
- u64_to_user_ptr(timeline_wait->points),
- timeline_wait->count_handles,
- timeline_wait->flags,
- timeout, &first,
- deadline);
- if (timeout < 0)
- return timeout;
- timeline_wait->first_signaled = first;
- }
- return 0;
-}
-
static int drm_syncobj_array_find(struct drm_file *file_private,
void __user *user_handles,
uint32_t count_handles,
@@ -1346,9 +1310,12 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
struct drm_syncobj_wait *args = data;
+ ktime_t deadline, *pdeadline = NULL;
+ u32 count = args->count_handles;
struct drm_syncobj **syncobjs;
unsigned int possible_flags;
- ktime_t t, *tp = NULL;
+ u32 first = ~0;
+ long timeout;
int ret = 0;
if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
@@ -1361,27 +1328,38 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
if (args->flags & ~possible_flags)
return -EINVAL;
- if (args->count_handles == 0)
+ if (count == 0)
return 0;
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
- args->count_handles,
+ count,
&syncobjs);
if (ret < 0)
return ret;
if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE) {
- t = ns_to_ktime(args->deadline_nsec);
- tp = &t;
+ deadline = ns_to_ktime(args->deadline_nsec);
+ pdeadline = &deadline;
}
- ret = drm_syncobj_array_wait(dev, file_private,
- args, NULL, syncobjs, false, tp);
+ timeout = drm_timeout_abs_to_jiffies(args->timeout_nsec);
+ timeout = drm_syncobj_array_wait_timeout(syncobjs,
+ NULL,
+ count,
+ args->flags,
+ timeout,
+ &first,
+ pdeadline);
- drm_syncobj_array_free(syncobjs, args->count_handles);
+ drm_syncobj_array_free(syncobjs, count);
- return ret;
+ if (timeout < 0)
+ return timeout;
+
+ args->first_signaled = first;
+
+ return 0;
}
int
@@ -1389,9 +1367,12 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
struct drm_syncobj_timeline_wait *args = data;
+ ktime_t deadline, *pdeadline = NULL;
+ u32 count = args->count_handles;
struct drm_syncobj **syncobjs;
unsigned int possible_flags;
- ktime_t t, *tp = NULL;
+ u32 first = ~0;
+ long timeout;
int ret = 0;
if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
@@ -1405,27 +1386,38 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
if (args->flags & ~possible_flags)
return -EINVAL;
- if (args->count_handles == 0)
+ if (count == 0)
return 0;
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
- args->count_handles,
+ count,
&syncobjs);
if (ret < 0)
return ret;
if (args->flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE) {
- t = ns_to_ktime(args->deadline_nsec);
- tp = &t;
+ deadline = ns_to_ktime(args->deadline_nsec);
+ pdeadline = &deadline;
}
- ret = drm_syncobj_array_wait(dev, file_private,
- NULL, args, syncobjs, true, tp);
+ timeout = drm_timeout_abs_to_jiffies(args->timeout_nsec);
+ timeout = drm_syncobj_array_wait_timeout(syncobjs,
+ u64_to_user_ptr(args->points),
+ count,
+ args->flags,
+ timeout,
+ &first,
+ pdeadline);
- drm_syncobj_array_free(syncobjs, args->count_handles);
+ drm_syncobj_array_free(syncobjs, count);
- return ret;
+ if (timeout < 0)
+ return timeout;
+
+ args->first_signaled = first;
+
+ return 0;
}
static void syncobj_eventfd_entry_fence_func(struct dma_fence *fence,
--
2.48.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v5 2/6] drm/syncobj: Do not allocate an array to store zeros when waiting
2025-06-11 14:00 [PATCH v5 0/6] A few drm_syncobj optimisations Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 1/6] drm/syncobj: Remove unhelpful helper Tvrtko Ursulin
@ 2025-06-11 14:00 ` Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 3/6] drm/syncobj: Avoid one temporary allocation in drm_syncobj_array_find Tvrtko Ursulin
` (3 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-11 14:00 UTC (permalink / raw)
To: dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer, Tvrtko Ursulin,
Maíra Canal
When waiting on syncobjs the current code allocates a temporary array only
to fill it up with all zeros.
We can avoid that by relying on the allocated entry array already being
zero allocated.
For the timeline mode we can fetch the timeline point values as we
populate the entries array so also do not need this additional temporary
allocation.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com> # v1
---
v2:
* Change back to copy_from_user due 32-bit ARM not implementing 64-bit
get_user.
v3:
* Fix argument order mixup.
---
drivers/gpu/drm/drm_syncobj.c | 38 ++++++++++++-----------------------
1 file changed, 13 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e2d97dd6e45b..1ac27a695dc4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1055,7 +1055,7 @@ static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
}
static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
- void __user *user_points,
+ u64 __user *user_points,
uint32_t count,
uint32_t flags,
signed long timeout,
@@ -1063,9 +1063,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
ktime_t *deadline)
{
struct syncobj_wait_entry *entries;
- struct dma_fence *fence;
- uint64_t *points;
uint32_t signaled_count, i;
+ struct dma_fence *fence;
if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
@@ -1073,24 +1072,10 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
lockdep_assert_none_held_once();
}
- points = kmalloc_array(count, sizeof(*points), GFP_KERNEL);
- if (points == NULL)
- return -ENOMEM;
-
- if (!user_points) {
- memset(points, 0, count * sizeof(uint64_t));
-
- } else if (copy_from_user(points, user_points,
- sizeof(uint64_t) * count)) {
- timeout = -EFAULT;
- goto err_free_points;
- }
-
entries = kcalloc(count, sizeof(*entries), GFP_KERNEL);
- if (!entries) {
- timeout = -ENOMEM;
- goto err_free_points;
- }
+ if (!entries)
+ return -ENOMEM;
+
/* Walk the list of sync objects and initialize entries. We do
* this up-front so that we can properly return -EINVAL if there is
* a syncobj with a missing fence and then never have the chance of
@@ -1101,9 +1086,15 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
struct dma_fence *fence;
entries[i].task = current;
- entries[i].point = points[i];
+ if (user_points &&
+ copy_from_user(&entries[i].point, user_points++,
+ sizeof(*user_points))) {
+ timeout = -EFAULT;
+ goto cleanup_entries;
+ }
fence = drm_syncobj_fence_get(syncobjs[i]);
- if (!fence || dma_fence_chain_find_seqno(&fence, points[i])) {
+ if (!fence ||
+ dma_fence_chain_find_seqno(&fence, entries[i].point)) {
dma_fence_put(fence);
if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
@@ -1209,9 +1200,6 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
}
kfree(entries);
-err_free_points:
- kfree(points);
-
return timeout;
}
--
2.48.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v5 3/6] drm/syncobj: Avoid one temporary allocation in drm_syncobj_array_find
2025-06-11 14:00 [PATCH v5 0/6] A few drm_syncobj optimisations Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 1/6] drm/syncobj: Remove unhelpful helper Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 2/6] drm/syncobj: Do not allocate an array to store zeros when waiting Tvrtko Ursulin
@ 2025-06-11 14:00 ` Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 4/6] drm/syncobj: Avoid temporary allocation in drm_syncobj_timeline_signal_ioctl Tvrtko Ursulin
` (2 subsequent siblings)
5 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-11 14:00 UTC (permalink / raw)
To: dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer, Tvrtko Ursulin,
Maíra Canal
Drm_syncobj_array_find() helper is used from many userspace ioctl entry
points with the task of looking up userspace handles to internal objects.
We can easily avoid one temporary allocation by making it read the handles
as it is looking them up.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
---
v2:
* Fix handle type.
* Undo pointless unwind change.
---
drivers/gpu/drm/drm_syncobj.c | 36 +++++++++++++++--------------------
1 file changed, 15 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 1ac27a695dc4..497009fc66f8 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1237,39 +1237,35 @@ signed long drm_timeout_abs_to_jiffies(int64_t timeout_nsec)
EXPORT_SYMBOL(drm_timeout_abs_to_jiffies);
static int drm_syncobj_array_find(struct drm_file *file_private,
- void __user *user_handles,
- uint32_t count_handles,
+ u32 __user *handles,
+ uint32_t count,
struct drm_syncobj ***syncobjs_out)
{
- uint32_t i, *handles;
struct drm_syncobj **syncobjs;
+ uint32_t i;
int ret;
- handles = kmalloc_array(count_handles, sizeof(*handles), GFP_KERNEL);
- if (handles == NULL)
+ if (!access_ok(handles, count * sizeof(*handles)))
+ return -EFAULT;
+
+ syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
+ if (!syncobjs)
return -ENOMEM;
- if (copy_from_user(handles, user_handles,
- sizeof(uint32_t) * count_handles)) {
- ret = -EFAULT;
- goto err_free_handles;
- }
+ for (i = 0; i < count; i++) {
+ u32 handle;
- syncobjs = kmalloc_array(count_handles, sizeof(*syncobjs), GFP_KERNEL);
- if (syncobjs == NULL) {
- ret = -ENOMEM;
- goto err_free_handles;
- }
-
- for (i = 0; i < count_handles; i++) {
- syncobjs[i] = drm_syncobj_find(file_private, handles[i]);
+ if (__get_user(handle, handles++)) {
+ ret = -EFAULT;
+ goto err_put_syncobjs;
+ }
+ syncobjs[i] = drm_syncobj_find(file_private, handle);
if (!syncobjs[i]) {
ret = -ENOENT;
goto err_put_syncobjs;
}
}
- kfree(handles);
*syncobjs_out = syncobjs;
return 0;
@@ -1277,8 +1273,6 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
while (i-- > 0)
drm_syncobj_put(syncobjs[i]);
kfree(syncobjs);
-err_free_handles:
- kfree(handles);
return ret;
}
--
2.48.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v5 4/6] drm/syncobj: Avoid temporary allocation in drm_syncobj_timeline_signal_ioctl
2025-06-11 14:00 [PATCH v5 0/6] A few drm_syncobj optimisations Tvrtko Ursulin
` (2 preceding siblings ...)
2025-06-11 14:00 ` [PATCH v5 3/6] drm/syncobj: Avoid one temporary allocation in drm_syncobj_array_find Tvrtko Ursulin
@ 2025-06-11 14:00 ` Tvrtko Ursulin
2025-06-11 14:15 ` Christian König
2025-06-11 14:00 ` [PATCH v5 5/6] drm/syncobj: Add a fast path to drm_syncobj_array_wait_timeout Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find Tvrtko Ursulin
5 siblings, 1 reply; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-11 14:00 UTC (permalink / raw)
To: dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer, Tvrtko Ursulin,
Maíra Canal
We can avoid one of the two temporary allocations if we read the userspace
supplied timeline points as we go along.
The only new complication is to unwind unused fence chains on the error
path, but even that code was already present in the function.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com> #v1
---
v2:
* Change back to copy_from_user due 32-bit ARM not implementing 64-bit
get_user.
v3:
* Fix argument order mixup.
---
drivers/gpu/drm/drm_syncobj.c | 43 ++++++++++++++---------------------
1 file changed, 17 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 497009fc66f8..9968d9429d90 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1579,10 +1579,10 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
struct drm_syncobj_timeline_array *args = data;
+ uint64_t __user *points = u64_to_user_ptr(args->points);
+ uint32_t i, j, count = args->count_handles;
struct drm_syncobj **syncobjs;
struct dma_fence_chain **chains;
- uint64_t *points;
- uint32_t i, j;
int ret;
if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
@@ -1596,31 +1596,17 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
- args->count_handles,
+ count,
&syncobjs);
if (ret < 0)
return ret;
- points = kmalloc_array(args->count_handles, sizeof(*points),
- GFP_KERNEL);
- if (!points) {
- ret = -ENOMEM;
- goto out;
- }
- if (!u64_to_user_ptr(args->points)) {
- memset(points, 0, args->count_handles * sizeof(uint64_t));
- } else if (copy_from_user(points, u64_to_user_ptr(args->points),
- sizeof(uint64_t) * args->count_handles)) {
- ret = -EFAULT;
- goto err_points;
- }
-
- chains = kmalloc_array(args->count_handles, sizeof(void *), GFP_KERNEL);
+ chains = kmalloc_array(count, sizeof(void *), GFP_KERNEL);
if (!chains) {
ret = -ENOMEM;
- goto err_points;
+ goto out;
}
- for (i = 0; i < args->count_handles; i++) {
+ for (i = 0; i < count; i++) {
chains[i] = dma_fence_chain_alloc();
if (!chains[i]) {
for (j = 0; j < i; j++)
@@ -1630,19 +1616,24 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
}
}
- for (i = 0; i < args->count_handles; i++) {
+ for (i = 0; i < count; i++) {
struct dma_fence *fence = dma_fence_get_stub();
+ u64 point = 0;
- drm_syncobj_add_point(syncobjs[i], chains[i],
- fence, points[i]);
+ if (points && copy_from_user(&point, points++, sizeof(point))) {
+ ret = -EFAULT;
+ for (j = i; j < count; j++)
+ dma_fence_chain_free(chains[j]);
+ goto err_chains;
+ }
+
+ drm_syncobj_add_point(syncobjs[i], chains[i], fence, point);
dma_fence_put(fence);
}
err_chains:
kfree(chains);
-err_points:
- kfree(points);
out:
- drm_syncobj_array_free(syncobjs, args->count_handles);
+ drm_syncobj_array_free(syncobjs, count);
return ret;
}
--
2.48.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v5 5/6] drm/syncobj: Add a fast path to drm_syncobj_array_wait_timeout
2025-06-11 14:00 [PATCH v5 0/6] A few drm_syncobj optimisations Tvrtko Ursulin
` (3 preceding siblings ...)
2025-06-11 14:00 ` [PATCH v5 4/6] drm/syncobj: Avoid temporary allocation in drm_syncobj_timeline_signal_ioctl Tvrtko Ursulin
@ 2025-06-11 14:00 ` Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find Tvrtko Ursulin
5 siblings, 0 replies; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-11 14:00 UTC (permalink / raw)
To: dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer, Tvrtko Ursulin,
Maíra Canal
Running the Cyberpunk 2077 benchmark we can observe that waiting on DRM
sycobjs is relatively hot, but the 96% of the calls are for a single
object. (~4% for two points, and never more than three points. While
a more trivial workload like vkmark under Plasma is even more skewed
to single point waits.)
Therefore lets add a fast path to bypass the kcalloc/kfree and use a pre-
allocated stack array for those cases.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
---
v2:
* Document rationale for stack array in a comment.
v3:
* Added DRM_SYNCOBJ_FAST_PATH_ENTRIES to avoid hardcoding fast path array
size.
---
drivers/gpu/drm/drm_syncobj.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 9968d9429d90..be5905dca87f 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -236,6 +236,14 @@ static void
syncobj_eventfd_entry_func(struct drm_syncobj *syncobj,
struct syncobj_eventfd_entry *entry);
+/*
+ * Empirically vast majority of ioctls pass in a single syncobj (96%) and never
+ * more than three points. Therefore implement a fast path with a small stack
+ * array to avoid going into the allocator sometimes several times per
+ * userspace rendered frame.
+ */
+#define DRM_SYNCOBJ_FAST_PATH_ENTRIES 4
+
/**
* drm_syncobj_find - lookup and reference a sync object.
* @file_private: drm file private pointer
@@ -1062,6 +1070,7 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
uint32_t *idx,
ktime_t *deadline)
{
+ struct syncobj_wait_entry stack_entries[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
struct syncobj_wait_entry *entries;
uint32_t signaled_count, i;
struct dma_fence *fence;
@@ -1076,6 +1085,15 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
if (!entries)
return -ENOMEM;
+ if (count > ARRAY_SIZE(stack_entries)) {
+ entries = kcalloc(count, sizeof(*entries), GFP_KERNEL);
+ if (!entries)
+ return -ENOMEM;
+ } else {
+ memset(stack_entries, 0, sizeof(stack_entries));
+ entries = stack_entries;
+ }
+
/* Walk the list of sync objects and initialize entries. We do
* this up-front so that we can properly return -EINVAL if there is
* a syncobj with a missing fence and then never have the chance of
@@ -1198,7 +1216,9 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
&entries[i].fence_cb);
dma_fence_put(entries[i].fence);
}
- kfree(entries);
+
+ if (entries != stack_entries)
+ kfree(entries);
return timeout;
}
--
2.48.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find
2025-06-11 14:00 [PATCH v5 0/6] A few drm_syncobj optimisations Tvrtko Ursulin
` (4 preceding siblings ...)
2025-06-11 14:00 ` [PATCH v5 5/6] drm/syncobj: Add a fast path to drm_syncobj_array_wait_timeout Tvrtko Ursulin
@ 2025-06-11 14:00 ` Tvrtko Ursulin
2025-06-11 14:21 ` Christian König
5 siblings, 1 reply; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-11 14:00 UTC (permalink / raw)
To: dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer, Tvrtko Ursulin,
Maíra Canal
Running the Cyberpunk 2077 benchmark we can observe that the lookup helper
is relatively hot, but the 97% of the calls are for a single object. (~3%
for two points, and never more than three points. While a more trivial
workload like vkmark under Plasma is even more skewed to single point
lookups.)
Therefore lets add a fast path to bypass the kmalloc_array/kfree and use a
pre-allocated stack array for those cases.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Maíra Canal <mcanal@igalia.com>
---
v2:
* Added comments describing how the fast path arrays were sized.
* Make container freeing criteria clearer by using a boolean.
---
drivers/gpu/drm/drm_syncobj.c | 56 +++++++++++++++++++++++++++--------
1 file changed, 44 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index be5905dca87f..65c301852f0d 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1259,6 +1259,8 @@ EXPORT_SYMBOL(drm_timeout_abs_to_jiffies);
static int drm_syncobj_array_find(struct drm_file *file_private,
u32 __user *handles,
uint32_t count,
+ struct drm_syncobj **stack_syncobjs,
+ u32 stack_count,
struct drm_syncobj ***syncobjs_out)
{
struct drm_syncobj **syncobjs;
@@ -1268,9 +1270,13 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
if (!access_ok(handles, count * sizeof(*handles)))
return -EFAULT;
- syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
- if (!syncobjs)
- return -ENOMEM;
+ if (count > stack_count) {
+ syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
+ if (!syncobjs)
+ return -ENOMEM;
+ } else {
+ syncobjs = stack_syncobjs;
+ }
for (i = 0; i < count; i++) {
u32 handle;
@@ -1292,25 +1298,31 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
err_put_syncobjs:
while (i-- > 0)
drm_syncobj_put(syncobjs[i]);
- kfree(syncobjs);
+
+ if (syncobjs != stack_syncobjs)
+ kfree(syncobjs);
return ret;
}
static void drm_syncobj_array_free(struct drm_syncobj **syncobjs,
- uint32_t count)
+ uint32_t count,
+ bool free_container)
{
uint32_t i;
for (i = 0; i < count; i++)
drm_syncobj_put(syncobjs[i]);
- kfree(syncobjs);
+
+ if (free_container)
+ kfree(syncobjs);
}
int
drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
+ struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
struct drm_syncobj_wait *args = data;
ktime_t deadline, *pdeadline = NULL;
u32 count = args->count_handles;
@@ -1336,6 +1348,8 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
count,
+ stack_syncobjs,
+ ARRAY_SIZE(stack_syncobjs),
&syncobjs);
if (ret < 0)
return ret;
@@ -1354,7 +1368,7 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
&first,
pdeadline);
- drm_syncobj_array_free(syncobjs, count);
+ drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
if (timeout < 0)
return timeout;
@@ -1368,6 +1382,7 @@ int
drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
+ struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
struct drm_syncobj_timeline_wait *args = data;
ktime_t deadline, *pdeadline = NULL;
u32 count = args->count_handles;
@@ -1394,6 +1409,8 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
count,
+ stack_syncobjs,
+ ARRAY_SIZE(stack_syncobjs),
&syncobjs);
if (ret < 0)
return ret;
@@ -1412,7 +1429,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
&first,
pdeadline);
- drm_syncobj_array_free(syncobjs, count);
+ drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
if (timeout < 0)
return timeout;
@@ -1529,6 +1546,7 @@ int
drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
+ struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
struct drm_syncobj_array *args = data;
struct drm_syncobj **syncobjs;
uint32_t i;
@@ -1546,6 +1564,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
args->count_handles,
+ stack_syncobjs,
+ ARRAY_SIZE(stack_syncobjs),
&syncobjs);
if (ret < 0)
return ret;
@@ -1553,7 +1573,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
for (i = 0; i < args->count_handles; i++)
drm_syncobj_replace_fence(syncobjs[i], NULL);
- drm_syncobj_array_free(syncobjs, args->count_handles);
+ drm_syncobj_array_free(syncobjs, args->count_handles,
+ syncobjs != stack_syncobjs);
return 0;
}
@@ -1562,6 +1583,7 @@ int
drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
+ struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
struct drm_syncobj_array *args = data;
struct drm_syncobj **syncobjs;
uint32_t i;
@@ -1579,6 +1601,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
args->count_handles,
+ stack_syncobjs,
+ ARRAY_SIZE(stack_syncobjs),
&syncobjs);
if (ret < 0)
return ret;
@@ -1589,7 +1613,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
break;
}
- drm_syncobj_array_free(syncobjs, args->count_handles);
+ drm_syncobj_array_free(syncobjs, args->count_handles,
+ syncobjs != stack_syncobjs);
return ret;
}
@@ -1598,6 +1623,7 @@ int
drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
+ struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
struct drm_syncobj_timeline_array *args = data;
uint64_t __user *points = u64_to_user_ptr(args->points);
uint32_t i, j, count = args->count_handles;
@@ -1617,6 +1643,8 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
count,
+ stack_syncobjs,
+ ARRAY_SIZE(stack_syncobjs),
&syncobjs);
if (ret < 0)
return ret;
@@ -1653,7 +1681,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
err_chains:
kfree(chains);
out:
- drm_syncobj_array_free(syncobjs, count);
+ drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
return ret;
}
@@ -1661,6 +1689,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private)
{
+ struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
struct drm_syncobj_timeline_array *args = data;
struct drm_syncobj **syncobjs;
uint64_t __user *points = u64_to_user_ptr(args->points);
@@ -1679,6 +1708,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
ret = drm_syncobj_array_find(file_private,
u64_to_user_ptr(args->handles),
args->count_handles,
+ stack_syncobjs,
+ ARRAY_SIZE(stack_syncobjs),
&syncobjs);
if (ret < 0)
return ret;
@@ -1722,7 +1753,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
if (ret)
break;
}
- drm_syncobj_array_free(syncobjs, args->count_handles);
+ drm_syncobj_array_free(syncobjs, args->count_handles,
+ syncobjs != stack_syncobjs);
return ret;
}
--
2.48.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v5 4/6] drm/syncobj: Avoid temporary allocation in drm_syncobj_timeline_signal_ioctl
2025-06-11 14:00 ` [PATCH v5 4/6] drm/syncobj: Avoid temporary allocation in drm_syncobj_timeline_signal_ioctl Tvrtko Ursulin
@ 2025-06-11 14:15 ` Christian König
0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2025-06-11 14:15 UTC (permalink / raw)
To: Tvrtko Ursulin, dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer,
Maíra Canal
On 6/11/25 16:00, Tvrtko Ursulin wrote:
> We can avoid one of the two temporary allocations if we read the userspace
> supplied timeline points as we go along.
The problem with that is that calling copy_from_user multiple times is really inefficient.
So that improves performance with few entries and decreases performance with many entries.
Not sure if having many entries is really a valid use case here.
Regards,
Christian.
>
> The only new complication is to unwind unused fence chains on the error
> path, but even that code was already present in the function.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Reviewed-by: Maíra Canal <mcanal@igalia.com> #v1
> ---
> v2:
> * Change back to copy_from_user due 32-bit ARM not implementing 64-bit
> get_user.
>
> v3:
> * Fix argument order mixup.
> ---
> drivers/gpu/drm/drm_syncobj.c | 43 ++++++++++++++---------------------
> 1 file changed, 17 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 497009fc66f8..9968d9429d90 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1579,10 +1579,10 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_private)
> {
> struct drm_syncobj_timeline_array *args = data;
> + uint64_t __user *points = u64_to_user_ptr(args->points);
> + uint32_t i, j, count = args->count_handles;
> struct drm_syncobj **syncobjs;
> struct dma_fence_chain **chains;
> - uint64_t *points;
> - uint32_t i, j;
> int ret;
>
> if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
> @@ -1596,31 +1596,17 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> - args->count_handles,
> + count,
> &syncobjs);
> if (ret < 0)
> return ret;
>
> - points = kmalloc_array(args->count_handles, sizeof(*points),
> - GFP_KERNEL);
> - if (!points) {
> - ret = -ENOMEM;
> - goto out;
> - }
> - if (!u64_to_user_ptr(args->points)) {
> - memset(points, 0, args->count_handles * sizeof(uint64_t));
> - } else if (copy_from_user(points, u64_to_user_ptr(args->points),
> - sizeof(uint64_t) * args->count_handles)) {
> - ret = -EFAULT;
> - goto err_points;
> - }
> -
> - chains = kmalloc_array(args->count_handles, sizeof(void *), GFP_KERNEL);
> + chains = kmalloc_array(count, sizeof(void *), GFP_KERNEL);
> if (!chains) {
> ret = -ENOMEM;
> - goto err_points;
> + goto out;
> }
> - for (i = 0; i < args->count_handles; i++) {
> + for (i = 0; i < count; i++) {
> chains[i] = dma_fence_chain_alloc();
> if (!chains[i]) {
> for (j = 0; j < i; j++)
> @@ -1630,19 +1616,24 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - for (i = 0; i < args->count_handles; i++) {
> + for (i = 0; i < count; i++) {
> struct dma_fence *fence = dma_fence_get_stub();
> + u64 point = 0;
>
> - drm_syncobj_add_point(syncobjs[i], chains[i],
> - fence, points[i]);
> + if (points && copy_from_user(&point, points++, sizeof(point))) {
> + ret = -EFAULT;
> + for (j = i; j < count; j++)
> + dma_fence_chain_free(chains[j]);
> + goto err_chains;
> + }
> +
> + drm_syncobj_add_point(syncobjs[i], chains[i], fence, point);
> dma_fence_put(fence);
> }
> err_chains:
> kfree(chains);
> -err_points:
> - kfree(points);
> out:
> - drm_syncobj_array_free(syncobjs, args->count_handles);
> + drm_syncobj_array_free(syncobjs, count);
>
> return ret;
> }
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find
2025-06-11 14:00 ` [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find Tvrtko Ursulin
@ 2025-06-11 14:21 ` Christian König
2025-06-11 15:29 ` Tvrtko Ursulin
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2025-06-11 14:21 UTC (permalink / raw)
To: Tvrtko Ursulin, dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer,
Maíra Canal
On 6/11/25 16:00, Tvrtko Ursulin wrote:
> Running the Cyberpunk 2077 benchmark we can observe that the lookup helper
> is relatively hot, but the 97% of the calls are for a single object. (~3%
> for two points, and never more than three points. While a more trivial
> workload like vkmark under Plasma is even more skewed to single point
> lookups.)
>
> Therefore lets add a fast path to bypass the kmalloc_array/kfree and use a
> pre-allocated stack array for those cases.
Have you considered using memdup_user()? That's using a separate bucket IIRC and might give similar performance.
If that is still not sufficient I'm really wondering if we shouldn't have a macro for doing this. It's a really common use case as far as I can see.
Regards,
Christian.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Reviewed-by: Maíra Canal <mcanal@igalia.com>
> ---
> v2:
> * Added comments describing how the fast path arrays were sized.
> * Make container freeing criteria clearer by using a boolean.
> ---
> drivers/gpu/drm/drm_syncobj.c | 56 +++++++++++++++++++++++++++--------
> 1 file changed, 44 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index be5905dca87f..65c301852f0d 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -1259,6 +1259,8 @@ EXPORT_SYMBOL(drm_timeout_abs_to_jiffies);
> static int drm_syncobj_array_find(struct drm_file *file_private,
> u32 __user *handles,
> uint32_t count,
> + struct drm_syncobj **stack_syncobjs,
> + u32 stack_count,
> struct drm_syncobj ***syncobjs_out)
> {
> struct drm_syncobj **syncobjs;
> @@ -1268,9 +1270,13 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
> if (!access_ok(handles, count * sizeof(*handles)))
> return -EFAULT;
>
> - syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
> - if (!syncobjs)
> - return -ENOMEM;
> + if (count > stack_count) {
> + syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
> + if (!syncobjs)
> + return -ENOMEM;
> + } else {
> + syncobjs = stack_syncobjs;
> + }
>
> for (i = 0; i < count; i++) {
> u32 handle;
> @@ -1292,25 +1298,31 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
> err_put_syncobjs:
> while (i-- > 0)
> drm_syncobj_put(syncobjs[i]);
> - kfree(syncobjs);
> +
> + if (syncobjs != stack_syncobjs)
> + kfree(syncobjs);
>
> return ret;
> }
>
> static void drm_syncobj_array_free(struct drm_syncobj **syncobjs,
> - uint32_t count)
> + uint32_t count,
> + bool free_container)
> {
> uint32_t i;
>
> for (i = 0; i < count; i++)
> drm_syncobj_put(syncobjs[i]);
> - kfree(syncobjs);
> +
> + if (free_container)
> + kfree(syncobjs);
> }
>
> int
> drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_private)
> {
> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
> struct drm_syncobj_wait *args = data;
> ktime_t deadline, *pdeadline = NULL;
> u32 count = args->count_handles;
> @@ -1336,6 +1348,8 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> count,
> + stack_syncobjs,
> + ARRAY_SIZE(stack_syncobjs),
> &syncobjs);
> if (ret < 0)
> return ret;
> @@ -1354,7 +1368,7 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
> &first,
> pdeadline);
>
> - drm_syncobj_array_free(syncobjs, count);
> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>
> if (timeout < 0)
> return timeout;
> @@ -1368,6 +1382,7 @@ int
> drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_private)
> {
> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
> struct drm_syncobj_timeline_wait *args = data;
> ktime_t deadline, *pdeadline = NULL;
> u32 count = args->count_handles;
> @@ -1394,6 +1409,8 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> count,
> + stack_syncobjs,
> + ARRAY_SIZE(stack_syncobjs),
> &syncobjs);
> if (ret < 0)
> return ret;
> @@ -1412,7 +1429,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
> &first,
> pdeadline);
>
> - drm_syncobj_array_free(syncobjs, count);
> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>
> if (timeout < 0)
> return timeout;
> @@ -1529,6 +1546,7 @@ int
> drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_private)
> {
> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
> struct drm_syncobj_array *args = data;
> struct drm_syncobj **syncobjs;
> uint32_t i;
> @@ -1546,6 +1564,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> args->count_handles,
> + stack_syncobjs,
> + ARRAY_SIZE(stack_syncobjs),
> &syncobjs);
> if (ret < 0)
> return ret;
> @@ -1553,7 +1573,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
> for (i = 0; i < args->count_handles; i++)
> drm_syncobj_replace_fence(syncobjs[i], NULL);
>
> - drm_syncobj_array_free(syncobjs, args->count_handles);
> + drm_syncobj_array_free(syncobjs, args->count_handles,
> + syncobjs != stack_syncobjs);
>
> return 0;
> }
> @@ -1562,6 +1583,7 @@ int
> drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_private)
> {
> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
> struct drm_syncobj_array *args = data;
> struct drm_syncobj **syncobjs;
> uint32_t i;
> @@ -1579,6 +1601,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> args->count_handles,
> + stack_syncobjs,
> + ARRAY_SIZE(stack_syncobjs),
> &syncobjs);
> if (ret < 0)
> return ret;
> @@ -1589,7 +1613,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
> break;
> }
>
> - drm_syncobj_array_free(syncobjs, args->count_handles);
> + drm_syncobj_array_free(syncobjs, args->count_handles,
> + syncobjs != stack_syncobjs);
>
> return ret;
> }
> @@ -1598,6 +1623,7 @@ int
> drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_private)
> {
> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
> struct drm_syncobj_timeline_array *args = data;
> uint64_t __user *points = u64_to_user_ptr(args->points);
> uint32_t i, j, count = args->count_handles;
> @@ -1617,6 +1643,8 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> count,
> + stack_syncobjs,
> + ARRAY_SIZE(stack_syncobjs),
> &syncobjs);
> if (ret < 0)
> return ret;
> @@ -1653,7 +1681,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
> err_chains:
> kfree(chains);
> out:
> - drm_syncobj_array_free(syncobjs, count);
> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>
> return ret;
> }
> @@ -1661,6 +1689,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
> int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_private)
> {
> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
> struct drm_syncobj_timeline_array *args = data;
> struct drm_syncobj **syncobjs;
> uint64_t __user *points = u64_to_user_ptr(args->points);
> @@ -1679,6 +1708,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
> ret = drm_syncobj_array_find(file_private,
> u64_to_user_ptr(args->handles),
> args->count_handles,
> + stack_syncobjs,
> + ARRAY_SIZE(stack_syncobjs),
> &syncobjs);
> if (ret < 0)
> return ret;
> @@ -1722,7 +1753,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
> if (ret)
> break;
> }
> - drm_syncobj_array_free(syncobjs, args->count_handles);
> + drm_syncobj_array_free(syncobjs, args->count_handles,
> + syncobjs != stack_syncobjs);
>
> return ret;
> }
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find
2025-06-11 14:21 ` Christian König
@ 2025-06-11 15:29 ` Tvrtko Ursulin
2025-06-12 7:21 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-11 15:29 UTC (permalink / raw)
To: Christian König, dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer,
Maíra Canal
On 11/06/2025 15:21, Christian König wrote:
> On 6/11/25 16:00, Tvrtko Ursulin wrote:
>> Running the Cyberpunk 2077 benchmark we can observe that the lookup helper
>> is relatively hot, but the 97% of the calls are for a single object. (~3%
>> for two points, and never more than three points. While a more trivial
>> workload like vkmark under Plasma is even more skewed to single point
>> lookups.)
>>
>> Therefore lets add a fast path to bypass the kmalloc_array/kfree and use a
>> pre-allocated stack array for those cases.
>
> Have you considered using memdup_user()? That's using a separate bucket IIRC and might give similar performance.
I haven't but I can try it. I would be surprised if it made a (positive)
difference though.
And I realised I need to repeat the benchmarks anyway, since in v4 I had
to stop doing access_ok+__get_user, after kernel test robot let me know
64-bit get_user is a not a thing on all platforms. I thought the gains
are from avoiding allocations but, as you say, now I need to see if
copy_from_user doesn't nullify them..
> If that is still not sufficient I'm really wondering if we shouldn't have a macro for doing this. It's a really common use case as far as I can see.
Hmm macro for what exactly?
Regards,
Tvrtko
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Reviewed-by: Maíra Canal <mcanal@igalia.com>
>> ---
>> v2:
>> * Added comments describing how the fast path arrays were sized.
>> * Make container freeing criteria clearer by using a boolean.
>> ---
>> drivers/gpu/drm/drm_syncobj.c | 56 +++++++++++++++++++++++++++--------
>> 1 file changed, 44 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>> index be5905dca87f..65c301852f0d 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -1259,6 +1259,8 @@ EXPORT_SYMBOL(drm_timeout_abs_to_jiffies);
>> static int drm_syncobj_array_find(struct drm_file *file_private,
>> u32 __user *handles,
>> uint32_t count,
>> + struct drm_syncobj **stack_syncobjs,
>> + u32 stack_count,
>> struct drm_syncobj ***syncobjs_out)
>> {
>> struct drm_syncobj **syncobjs;
>> @@ -1268,9 +1270,13 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
>> if (!access_ok(handles, count * sizeof(*handles)))
>> return -EFAULT;
>>
>> - syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
>> - if (!syncobjs)
>> - return -ENOMEM;
>> + if (count > stack_count) {
>> + syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
>> + if (!syncobjs)
>> + return -ENOMEM;
>> + } else {
>> + syncobjs = stack_syncobjs;
>> + }
>>
>> for (i = 0; i < count; i++) {
>> u32 handle;
>> @@ -1292,25 +1298,31 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
>> err_put_syncobjs:
>> while (i-- > 0)
>> drm_syncobj_put(syncobjs[i]);
>> - kfree(syncobjs);
>> +
>> + if (syncobjs != stack_syncobjs)
>> + kfree(syncobjs);
>>
>> return ret;
>> }
>>
>> static void drm_syncobj_array_free(struct drm_syncobj **syncobjs,
>> - uint32_t count)
>> + uint32_t count,
>> + bool free_container)
>> {
>> uint32_t i;
>>
>> for (i = 0; i < count; i++)
>> drm_syncobj_put(syncobjs[i]);
>> - kfree(syncobjs);
>> +
>> + if (free_container)
>> + kfree(syncobjs);
>> }
>>
>> int
>> drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file_private)
>> {
>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>> struct drm_syncobj_wait *args = data;
>> ktime_t deadline, *pdeadline = NULL;
>> u32 count = args->count_handles;
>> @@ -1336,6 +1348,8 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>> ret = drm_syncobj_array_find(file_private,
>> u64_to_user_ptr(args->handles),
>> count,
>> + stack_syncobjs,
>> + ARRAY_SIZE(stack_syncobjs),
>> &syncobjs);
>> if (ret < 0)
>> return ret;
>> @@ -1354,7 +1368,7 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>> &first,
>> pdeadline);
>>
>> - drm_syncobj_array_free(syncobjs, count);
>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>
>> if (timeout < 0)
>> return timeout;
>> @@ -1368,6 +1382,7 @@ int
>> drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file_private)
>> {
>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>> struct drm_syncobj_timeline_wait *args = data;
>> ktime_t deadline, *pdeadline = NULL;
>> u32 count = args->count_handles;
>> @@ -1394,6 +1409,8 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>> ret = drm_syncobj_array_find(file_private,
>> u64_to_user_ptr(args->handles),
>> count,
>> + stack_syncobjs,
>> + ARRAY_SIZE(stack_syncobjs),
>> &syncobjs);
>> if (ret < 0)
>> return ret;
>> @@ -1412,7 +1429,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>> &first,
>> pdeadline);
>>
>> - drm_syncobj_array_free(syncobjs, count);
>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>
>> if (timeout < 0)
>> return timeout;
>> @@ -1529,6 +1546,7 @@ int
>> drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file_private)
>> {
>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>> struct drm_syncobj_array *args = data;
>> struct drm_syncobj **syncobjs;
>> uint32_t i;
>> @@ -1546,6 +1564,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>> ret = drm_syncobj_array_find(file_private,
>> u64_to_user_ptr(args->handles),
>> args->count_handles,
>> + stack_syncobjs,
>> + ARRAY_SIZE(stack_syncobjs),
>> &syncobjs);
>> if (ret < 0)
>> return ret;
>> @@ -1553,7 +1573,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>> for (i = 0; i < args->count_handles; i++)
>> drm_syncobj_replace_fence(syncobjs[i], NULL);
>>
>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>> + syncobjs != stack_syncobjs);
>>
>> return 0;
>> }
>> @@ -1562,6 +1583,7 @@ int
>> drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file_private)
>> {
>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>> struct drm_syncobj_array *args = data;
>> struct drm_syncobj **syncobjs;
>> uint32_t i;
>> @@ -1579,6 +1601,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>> ret = drm_syncobj_array_find(file_private,
>> u64_to_user_ptr(args->handles),
>> args->count_handles,
>> + stack_syncobjs,
>> + ARRAY_SIZE(stack_syncobjs),
>> &syncobjs);
>> if (ret < 0)
>> return ret;
>> @@ -1589,7 +1613,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>> break;
>> }
>>
>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>> + syncobjs != stack_syncobjs);
>>
>> return ret;
>> }
>> @@ -1598,6 +1623,7 @@ int
>> drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file_private)
>> {
>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>> struct drm_syncobj_timeline_array *args = data;
>> uint64_t __user *points = u64_to_user_ptr(args->points);
>> uint32_t i, j, count = args->count_handles;
>> @@ -1617,6 +1643,8 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>> ret = drm_syncobj_array_find(file_private,
>> u64_to_user_ptr(args->handles),
>> count,
>> + stack_syncobjs,
>> + ARRAY_SIZE(stack_syncobjs),
>> &syncobjs);
>> if (ret < 0)
>> return ret;
>> @@ -1653,7 +1681,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>> err_chains:
>> kfree(chains);
>> out:
>> - drm_syncobj_array_free(syncobjs, count);
>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>
>> return ret;
>> }
>> @@ -1661,6 +1689,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>> int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file_private)
>> {
>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>> struct drm_syncobj_timeline_array *args = data;
>> struct drm_syncobj **syncobjs;
>> uint64_t __user *points = u64_to_user_ptr(args->points);
>> @@ -1679,6 +1708,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>> ret = drm_syncobj_array_find(file_private,
>> u64_to_user_ptr(args->handles),
>> args->count_handles,
>> + stack_syncobjs,
>> + ARRAY_SIZE(stack_syncobjs),
>> &syncobjs);
>> if (ret < 0)
>> return ret;
>> @@ -1722,7 +1753,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>> if (ret)
>> break;
>> }
>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>> + syncobjs != stack_syncobjs);
>>
>> return ret;
>> }
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find
2025-06-11 15:29 ` Tvrtko Ursulin
@ 2025-06-12 7:21 ` Christian König
2025-06-12 10:58 ` Tvrtko Ursulin
0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2025-06-12 7:21 UTC (permalink / raw)
To: Tvrtko Ursulin, dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer,
Maíra Canal
On 6/11/25 17:29, Tvrtko Ursulin wrote:
>
> On 11/06/2025 15:21, Christian König wrote:
>> On 6/11/25 16:00, Tvrtko Ursulin wrote:
>>> Running the Cyberpunk 2077 benchmark we can observe that the lookup helper
>>> is relatively hot, but the 97% of the calls are for a single object. (~3%
>>> for two points, and never more than three points. While a more trivial
>>> workload like vkmark under Plasma is even more skewed to single point
>>> lookups.)
>>>
>>> Therefore lets add a fast path to bypass the kmalloc_array/kfree and use a
>>> pre-allocated stack array for those cases.
>>
>> Have you considered using memdup_user()? That's using a separate bucket IIRC and might give similar performance.
>
> I haven't but I can try it. I would be surprised if it made a (positive) difference though.
Yeah, it's mostly for extra security I think.
>
> And I realised I need to repeat the benchmarks anyway, since in v4 I had to stop doing access_ok+__get_user, after kernel test robot let me know 64-bit get_user is a not a thing on all platforms. I thought the gains are from avoiding allocations but, as you say, now I need to see if copy_from_user doesn't nullify them..
>
>> If that is still not sufficient I'm really wondering if we shouldn't have a macro for doing this. It's a really common use case as far as I can see.
>
> Hmm macro for what exactly?
Like a macro which uses an array on the stack for small (<4) number of values and k(v)malloc() for large ones.
IIRC there is also a relatively new functionality which allows releasing the memory automatically when we leave the function.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>> Reviewed-by: Maíra Canal <mcanal@igalia.com>
>>> ---
>>> v2:
>>> * Added comments describing how the fast path arrays were sized.
>>> * Make container freeing criteria clearer by using a boolean.
>>> ---
>>> drivers/gpu/drm/drm_syncobj.c | 56 +++++++++++++++++++++++++++--------
>>> 1 file changed, 44 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>> index be5905dca87f..65c301852f0d 100644
>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>> @@ -1259,6 +1259,8 @@ EXPORT_SYMBOL(drm_timeout_abs_to_jiffies);
>>> static int drm_syncobj_array_find(struct drm_file *file_private,
>>> u32 __user *handles,
>>> uint32_t count,
>>> + struct drm_syncobj **stack_syncobjs,
>>> + u32 stack_count,
>>> struct drm_syncobj ***syncobjs_out)
>>> {
>>> struct drm_syncobj **syncobjs;
>>> @@ -1268,9 +1270,13 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
>>> if (!access_ok(handles, count * sizeof(*handles)))
>>> return -EFAULT;
>>> - syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
>>> - if (!syncobjs)
>>> - return -ENOMEM;
>>> + if (count > stack_count) {
>>> + syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
>>> + if (!syncobjs)
>>> + return -ENOMEM;
>>> + } else {
>>> + syncobjs = stack_syncobjs;
>>> + }
>>> for (i = 0; i < count; i++) {
>>> u32 handle;
>>> @@ -1292,25 +1298,31 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
>>> err_put_syncobjs:
>>> while (i-- > 0)
>>> drm_syncobj_put(syncobjs[i]);
>>> - kfree(syncobjs);
>>> +
>>> + if (syncobjs != stack_syncobjs)
>>> + kfree(syncobjs);
>>> return ret;
>>> }
>>> static void drm_syncobj_array_free(struct drm_syncobj **syncobjs,
>>> - uint32_t count)
>>> + uint32_t count,
>>> + bool free_container)
>>> {
>>> uint32_t i;
>>> for (i = 0; i < count; i++)
>>> drm_syncobj_put(syncobjs[i]);
>>> - kfree(syncobjs);
>>> +
>>> + if (free_container)
>>> + kfree(syncobjs);
>>> }
>>> int
>>> drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>> struct drm_file *file_private)
>>> {
>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>> struct drm_syncobj_wait *args = data;
>>> ktime_t deadline, *pdeadline = NULL;
>>> u32 count = args->count_handles;
>>> @@ -1336,6 +1348,8 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>> ret = drm_syncobj_array_find(file_private,
>>> u64_to_user_ptr(args->handles),
>>> count,
>>> + stack_syncobjs,
>>> + ARRAY_SIZE(stack_syncobjs),
>>> &syncobjs);
>>> if (ret < 0)
>>> return ret;
>>> @@ -1354,7 +1368,7 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>> &first,
>>> pdeadline);
>>> - drm_syncobj_array_free(syncobjs, count);
>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>> if (timeout < 0)
>>> return timeout;
>>> @@ -1368,6 +1382,7 @@ int
>>> drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>> struct drm_file *file_private)
>>> {
>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>> struct drm_syncobj_timeline_wait *args = data;
>>> ktime_t deadline, *pdeadline = NULL;
>>> u32 count = args->count_handles;
>>> @@ -1394,6 +1409,8 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>> ret = drm_syncobj_array_find(file_private,
>>> u64_to_user_ptr(args->handles),
>>> count,
>>> + stack_syncobjs,
>>> + ARRAY_SIZE(stack_syncobjs),
>>> &syncobjs);
>>> if (ret < 0)
>>> return ret;
>>> @@ -1412,7 +1429,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>> &first,
>>> pdeadline);
>>> - drm_syncobj_array_free(syncobjs, count);
>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>> if (timeout < 0)
>>> return timeout;
>>> @@ -1529,6 +1546,7 @@ int
>>> drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>> struct drm_file *file_private)
>>> {
>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>> struct drm_syncobj_array *args = data;
>>> struct drm_syncobj **syncobjs;
>>> uint32_t i;
>>> @@ -1546,6 +1564,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>> ret = drm_syncobj_array_find(file_private,
>>> u64_to_user_ptr(args->handles),
>>> args->count_handles,
>>> + stack_syncobjs,
>>> + ARRAY_SIZE(stack_syncobjs),
>>> &syncobjs);
>>> if (ret < 0)
>>> return ret;
>>> @@ -1553,7 +1573,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>> for (i = 0; i < args->count_handles; i++)
>>> drm_syncobj_replace_fence(syncobjs[i], NULL);
>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>> + syncobjs != stack_syncobjs);
>>> return 0;
>>> }
>>> @@ -1562,6 +1583,7 @@ int
>>> drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>> struct drm_file *file_private)
>>> {
>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>> struct drm_syncobj_array *args = data;
>>> struct drm_syncobj **syncobjs;
>>> uint32_t i;
>>> @@ -1579,6 +1601,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>> ret = drm_syncobj_array_find(file_private,
>>> u64_to_user_ptr(args->handles),
>>> args->count_handles,
>>> + stack_syncobjs,
>>> + ARRAY_SIZE(stack_syncobjs),
>>> &syncobjs);
>>> if (ret < 0)
>>> return ret;
>>> @@ -1589,7 +1613,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>> break;
>>> }
>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>> + syncobjs != stack_syncobjs);
>>> return ret;
>>> }
>>> @@ -1598,6 +1623,7 @@ int
>>> drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>> struct drm_file *file_private)
>>> {
>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>> struct drm_syncobj_timeline_array *args = data;
>>> uint64_t __user *points = u64_to_user_ptr(args->points);
>>> uint32_t i, j, count = args->count_handles;
>>> @@ -1617,6 +1643,8 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>> ret = drm_syncobj_array_find(file_private,
>>> u64_to_user_ptr(args->handles),
>>> count,
>>> + stack_syncobjs,
>>> + ARRAY_SIZE(stack_syncobjs),
>>> &syncobjs);
>>> if (ret < 0)
>>> return ret;
>>> @@ -1653,7 +1681,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>> err_chains:
>>> kfree(chains);
>>> out:
>>> - drm_syncobj_array_free(syncobjs, count);
>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>> return ret;
>>> }
>>> @@ -1661,6 +1689,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>> int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>> struct drm_file *file_private)
>>> {
>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>> struct drm_syncobj_timeline_array *args = data;
>>> struct drm_syncobj **syncobjs;
>>> uint64_t __user *points = u64_to_user_ptr(args->points);
>>> @@ -1679,6 +1708,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>> ret = drm_syncobj_array_find(file_private,
>>> u64_to_user_ptr(args->handles),
>>> args->count_handles,
>>> + stack_syncobjs,
>>> + ARRAY_SIZE(stack_syncobjs),
>>> &syncobjs);
>>> if (ret < 0)
>>> return ret;
>>> @@ -1722,7 +1753,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>> if (ret)
>>> break;
>>> }
>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>> + syncobjs != stack_syncobjs);
>>> return ret;
>>> }
>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find
2025-06-12 7:21 ` Christian König
@ 2025-06-12 10:58 ` Tvrtko Ursulin
2025-06-12 12:02 ` Christian König
0 siblings, 1 reply; 13+ messages in thread
From: Tvrtko Ursulin @ 2025-06-12 10:58 UTC (permalink / raw)
To: Christian König, dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer,
Maíra Canal
On 12/06/2025 08:21, Christian König wrote:
> On 6/11/25 17:29, Tvrtko Ursulin wrote:
>>
>> On 11/06/2025 15:21, Christian König wrote:
>>> On 6/11/25 16:00, Tvrtko Ursulin wrote:
>>>> Running the Cyberpunk 2077 benchmark we can observe that the lookup helper
>>>> is relatively hot, but the 97% of the calls are for a single object. (~3%
>>>> for two points, and never more than three points. While a more trivial
>>>> workload like vkmark under Plasma is even more skewed to single point
>>>> lookups.)
>>>>
>>>> Therefore lets add a fast path to bypass the kmalloc_array/kfree and use a
>>>> pre-allocated stack array for those cases.
>>>
>>> Have you considered using memdup_user()? That's using a separate bucket IIRC and might give similar performance.
>>
>> I haven't but I can try it. I would be surprised if it made a (positive) difference though.
>
> Yeah, it's mostly for extra security I think.
On this topic, this discussion prompted me to quickly cook up some
trivial cleanups for amdgpu to use memdup_user & co where it was easy.
Series is on the mailing list but I did not copy you explicitly giving
chance for someone else to notice it and off load you a bit.
>> And I realised I need to repeat the benchmarks anyway, since in v4 I had to stop doing access_ok+__get_user, after kernel test robot let me know 64-bit get_user is a not a thing on all platforms. I thought the gains are from avoiding allocations but, as you say, now I need to see if copy_from_user doesn't nullify them..
>>
>>> If that is still not sufficient I'm really wondering if we shouldn't have a macro for doing this. It's a really common use case as far as I can see.
>>
>> Hmm macro for what exactly?
>
> Like a macro which uses an array on the stack for small (<4) number of values and k(v)malloc() for large ones.
>
> IIRC there is also a relatively new functionality which allows releasing the memory automatically when we leave the function.
Okay I will have a look at all those options. But it's going to the
bottom of my priority pile so it might be a while.
Regards,
Tvrtko
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>> Reviewed-by: Maíra Canal <mcanal@igalia.com>
>>>> ---
>>>> v2:
>>>> * Added comments describing how the fast path arrays were sized.
>>>> * Make container freeing criteria clearer by using a boolean.
>>>> ---
>>>> drivers/gpu/drm/drm_syncobj.c | 56 +++++++++++++++++++++++++++--------
>>>> 1 file changed, 44 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>>> index be5905dca87f..65c301852f0d 100644
>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>> @@ -1259,6 +1259,8 @@ EXPORT_SYMBOL(drm_timeout_abs_to_jiffies);
>>>> static int drm_syncobj_array_find(struct drm_file *file_private,
>>>> u32 __user *handles,
>>>> uint32_t count,
>>>> + struct drm_syncobj **stack_syncobjs,
>>>> + u32 stack_count,
>>>> struct drm_syncobj ***syncobjs_out)
>>>> {
>>>> struct drm_syncobj **syncobjs;
>>>> @@ -1268,9 +1270,13 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
>>>> if (!access_ok(handles, count * sizeof(*handles)))
>>>> return -EFAULT;
>>>> - syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
>>>> - if (!syncobjs)
>>>> - return -ENOMEM;
>>>> + if (count > stack_count) {
>>>> + syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
>>>> + if (!syncobjs)
>>>> + return -ENOMEM;
>>>> + } else {
>>>> + syncobjs = stack_syncobjs;
>>>> + }
>>>> for (i = 0; i < count; i++) {
>>>> u32 handle;
>>>> @@ -1292,25 +1298,31 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
>>>> err_put_syncobjs:
>>>> while (i-- > 0)
>>>> drm_syncobj_put(syncobjs[i]);
>>>> - kfree(syncobjs);
>>>> +
>>>> + if (syncobjs != stack_syncobjs)
>>>> + kfree(syncobjs);
>>>> return ret;
>>>> }
>>>> static void drm_syncobj_array_free(struct drm_syncobj **syncobjs,
>>>> - uint32_t count)
>>>> + uint32_t count,
>>>> + bool free_container)
>>>> {
>>>> uint32_t i;
>>>> for (i = 0; i < count; i++)
>>>> drm_syncobj_put(syncobjs[i]);
>>>> - kfree(syncobjs);
>>>> +
>>>> + if (free_container)
>>>> + kfree(syncobjs);
>>>> }
>>>> int
>>>> drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>>> struct drm_file *file_private)
>>>> {
>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>> struct drm_syncobj_wait *args = data;
>>>> ktime_t deadline, *pdeadline = NULL;
>>>> u32 count = args->count_handles;
>>>> @@ -1336,6 +1348,8 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>>> ret = drm_syncobj_array_find(file_private,
>>>> u64_to_user_ptr(args->handles),
>>>> count,
>>>> + stack_syncobjs,
>>>> + ARRAY_SIZE(stack_syncobjs),
>>>> &syncobjs);
>>>> if (ret < 0)
>>>> return ret;
>>>> @@ -1354,7 +1368,7 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>>> &first,
>>>> pdeadline);
>>>> - drm_syncobj_array_free(syncobjs, count);
>>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>>> if (timeout < 0)
>>>> return timeout;
>>>> @@ -1368,6 +1382,7 @@ int
>>>> drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>>> struct drm_file *file_private)
>>>> {
>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>> struct drm_syncobj_timeline_wait *args = data;
>>>> ktime_t deadline, *pdeadline = NULL;
>>>> u32 count = args->count_handles;
>>>> @@ -1394,6 +1409,8 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>>> ret = drm_syncobj_array_find(file_private,
>>>> u64_to_user_ptr(args->handles),
>>>> count,
>>>> + stack_syncobjs,
>>>> + ARRAY_SIZE(stack_syncobjs),
>>>> &syncobjs);
>>>> if (ret < 0)
>>>> return ret;
>>>> @@ -1412,7 +1429,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>>> &first,
>>>> pdeadline);
>>>> - drm_syncobj_array_free(syncobjs, count);
>>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>>> if (timeout < 0)
>>>> return timeout;
>>>> @@ -1529,6 +1546,7 @@ int
>>>> drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>>> struct drm_file *file_private)
>>>> {
>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>> struct drm_syncobj_array *args = data;
>>>> struct drm_syncobj **syncobjs;
>>>> uint32_t i;
>>>> @@ -1546,6 +1564,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>>> ret = drm_syncobj_array_find(file_private,
>>>> u64_to_user_ptr(args->handles),
>>>> args->count_handles,
>>>> + stack_syncobjs,
>>>> + ARRAY_SIZE(stack_syncobjs),
>>>> &syncobjs);
>>>> if (ret < 0)
>>>> return ret;
>>>> @@ -1553,7 +1573,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>>> for (i = 0; i < args->count_handles; i++)
>>>> drm_syncobj_replace_fence(syncobjs[i], NULL);
>>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>>> + syncobjs != stack_syncobjs);
>>>> return 0;
>>>> }
>>>> @@ -1562,6 +1583,7 @@ int
>>>> drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>>> struct drm_file *file_private)
>>>> {
>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>> struct drm_syncobj_array *args = data;
>>>> struct drm_syncobj **syncobjs;
>>>> uint32_t i;
>>>> @@ -1579,6 +1601,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>>> ret = drm_syncobj_array_find(file_private,
>>>> u64_to_user_ptr(args->handles),
>>>> args->count_handles,
>>>> + stack_syncobjs,
>>>> + ARRAY_SIZE(stack_syncobjs),
>>>> &syncobjs);
>>>> if (ret < 0)
>>>> return ret;
>>>> @@ -1589,7 +1613,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>>> break;
>>>> }
>>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>>> + syncobjs != stack_syncobjs);
>>>> return ret;
>>>> }
>>>> @@ -1598,6 +1623,7 @@ int
>>>> drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>>> struct drm_file *file_private)
>>>> {
>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>> struct drm_syncobj_timeline_array *args = data;
>>>> uint64_t __user *points = u64_to_user_ptr(args->points);
>>>> uint32_t i, j, count = args->count_handles;
>>>> @@ -1617,6 +1643,8 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>>> ret = drm_syncobj_array_find(file_private,
>>>> u64_to_user_ptr(args->handles),
>>>> count,
>>>> + stack_syncobjs,
>>>> + ARRAY_SIZE(stack_syncobjs),
>>>> &syncobjs);
>>>> if (ret < 0)
>>>> return ret;
>>>> @@ -1653,7 +1681,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>>> err_chains:
>>>> kfree(chains);
>>>> out:
>>>> - drm_syncobj_array_free(syncobjs, count);
>>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>>> return ret;
>>>> }
>>>> @@ -1661,6 +1689,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>>> int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>>> struct drm_file *file_private)
>>>> {
>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>> struct drm_syncobj_timeline_array *args = data;
>>>> struct drm_syncobj **syncobjs;
>>>> uint64_t __user *points = u64_to_user_ptr(args->points);
>>>> @@ -1679,6 +1708,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>>> ret = drm_syncobj_array_find(file_private,
>>>> u64_to_user_ptr(args->handles),
>>>> args->count_handles,
>>>> + stack_syncobjs,
>>>> + ARRAY_SIZE(stack_syncobjs),
>>>> &syncobjs);
>>>> if (ret < 0)
>>>> return ret;
>>>> @@ -1722,7 +1753,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>>> if (ret)
>>>> break;
>>>> }
>>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>>> + syncobjs != stack_syncobjs);
>>>> return ret;
>>>> }
>>>
>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find
2025-06-12 10:58 ` Tvrtko Ursulin
@ 2025-06-12 12:02 ` Christian König
0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2025-06-12 12:02 UTC (permalink / raw)
To: Tvrtko Ursulin, dri-devel
Cc: kernel-dev, amd-gfx, intel-xe, Michel Dänzer,
Maíra Canal
On 6/12/25 12:58, Tvrtko Ursulin wrote:
>
> On 12/06/2025 08:21, Christian König wrote:
>> On 6/11/25 17:29, Tvrtko Ursulin wrote:
>>>
>>> On 11/06/2025 15:21, Christian König wrote:
>>>> On 6/11/25 16:00, Tvrtko Ursulin wrote:
>>>>> Running the Cyberpunk 2077 benchmark we can observe that the lookup helper
>>>>> is relatively hot, but the 97% of the calls are for a single object. (~3%
>>>>> for two points, and never more than three points. While a more trivial
>>>>> workload like vkmark under Plasma is even more skewed to single point
>>>>> lookups.)
>>>>>
>>>>> Therefore lets add a fast path to bypass the kmalloc_array/kfree and use a
>>>>> pre-allocated stack array for those cases.
>>>>
>>>> Have you considered using memdup_user()? That's using a separate bucket IIRC and might give similar performance.
>>>
>>> I haven't but I can try it. I would be surprised if it made a (positive) difference though.
>>
>> Yeah, it's mostly for extra security I think.
>
> On this topic, this discussion prompted me to quickly cook up some trivial cleanups for amdgpu to use memdup_user & co where it was easy. Series is on the mailing list but I did not copy you explicitly giving chance for someone else to notice it and off load you a bit.
Yeah, I know I always wanted to give that task to a student or interim :)
Alex is the one usually picking up amdgpu patches from the mailing list, but I'm happy to add an rb if necessary.
>>> And I realised I need to repeat the benchmarks anyway, since in v4 I had to stop doing access_ok+__get_user, after kernel test robot let me know 64-bit get_user is a not a thing on all platforms. I thought the gains are from avoiding allocations but, as you say, now I need to see if copy_from_user doesn't nullify them..
>>>
>>>> If that is still not sufficient I'm really wondering if we shouldn't have a macro for doing this. It's a really common use case as far as I can see.
>>>
>>> Hmm macro for what exactly?
>>
>> Like a macro which uses an array on the stack for small (<4) number of values and k(v)malloc() for large ones.
>>
>> IIRC there is also a relatively new functionality which allows releasing the memory automatically when we leave the function.
>
> Okay I will have a look at all those options. But it's going to the bottom of my priority pile so it might be a while.
I'm also perfectly fine with the solution you came up in those patches here for now if that improves the performance at hand.
Just wanted to point out that it is possible that somebody has an use case where X sync_obj handles are asked for timeline fences and that now becomes slower because of that here.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>> Reviewed-by: Maíra Canal <mcanal@igalia.com>
>>>>> ---
>>>>> v2:
>>>>> * Added comments describing how the fast path arrays were sized.
>>>>> * Make container freeing criteria clearer by using a boolean.
>>>>> ---
>>>>> drivers/gpu/drm/drm_syncobj.c | 56 +++++++++++++++++++++++++++--------
>>>>> 1 file changed, 44 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>>>> index be5905dca87f..65c301852f0d 100644
>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>> @@ -1259,6 +1259,8 @@ EXPORT_SYMBOL(drm_timeout_abs_to_jiffies);
>>>>> static int drm_syncobj_array_find(struct drm_file *file_private,
>>>>> u32 __user *handles,
>>>>> uint32_t count,
>>>>> + struct drm_syncobj **stack_syncobjs,
>>>>> + u32 stack_count,
>>>>> struct drm_syncobj ***syncobjs_out)
>>>>> {
>>>>> struct drm_syncobj **syncobjs;
>>>>> @@ -1268,9 +1270,13 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
>>>>> if (!access_ok(handles, count * sizeof(*handles)))
>>>>> return -EFAULT;
>>>>> - syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
>>>>> - if (!syncobjs)
>>>>> - return -ENOMEM;
>>>>> + if (count > stack_count) {
>>>>> + syncobjs = kmalloc_array(count, sizeof(*syncobjs), GFP_KERNEL);
>>>>> + if (!syncobjs)
>>>>> + return -ENOMEM;
>>>>> + } else {
>>>>> + syncobjs = stack_syncobjs;
>>>>> + }
>>>>> for (i = 0; i < count; i++) {
>>>>> u32 handle;
>>>>> @@ -1292,25 +1298,31 @@ static int drm_syncobj_array_find(struct drm_file *file_private,
>>>>> err_put_syncobjs:
>>>>> while (i-- > 0)
>>>>> drm_syncobj_put(syncobjs[i]);
>>>>> - kfree(syncobjs);
>>>>> +
>>>>> + if (syncobjs != stack_syncobjs)
>>>>> + kfree(syncobjs);
>>>>> return ret;
>>>>> }
>>>>> static void drm_syncobj_array_free(struct drm_syncobj **syncobjs,
>>>>> - uint32_t count)
>>>>> + uint32_t count,
>>>>> + bool free_container)
>>>>> {
>>>>> uint32_t i;
>>>>> for (i = 0; i < count; i++)
>>>>> drm_syncobj_put(syncobjs[i]);
>>>>> - kfree(syncobjs);
>>>>> +
>>>>> + if (free_container)
>>>>> + kfree(syncobjs);
>>>>> }
>>>>> int
>>>>> drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>>>> struct drm_file *file_private)
>>>>> {
>>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>>> struct drm_syncobj_wait *args = data;
>>>>> ktime_t deadline, *pdeadline = NULL;
>>>>> u32 count = args->count_handles;
>>>>> @@ -1336,6 +1348,8 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>>>> ret = drm_syncobj_array_find(file_private,
>>>>> u64_to_user_ptr(args->handles),
>>>>> count,
>>>>> + stack_syncobjs,
>>>>> + ARRAY_SIZE(stack_syncobjs),
>>>>> &syncobjs);
>>>>> if (ret < 0)
>>>>> return ret;
>>>>> @@ -1354,7 +1368,7 @@ drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
>>>>> &first,
>>>>> pdeadline);
>>>>> - drm_syncobj_array_free(syncobjs, count);
>>>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>>>> if (timeout < 0)
>>>>> return timeout;
>>>>> @@ -1368,6 +1382,7 @@ int
>>>>> drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>>>> struct drm_file *file_private)
>>>>> {
>>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>>> struct drm_syncobj_timeline_wait *args = data;
>>>>> ktime_t deadline, *pdeadline = NULL;
>>>>> u32 count = args->count_handles;
>>>>> @@ -1394,6 +1409,8 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>>>> ret = drm_syncobj_array_find(file_private,
>>>>> u64_to_user_ptr(args->handles),
>>>>> count,
>>>>> + stack_syncobjs,
>>>>> + ARRAY_SIZE(stack_syncobjs),
>>>>> &syncobjs);
>>>>> if (ret < 0)
>>>>> return ret;
>>>>> @@ -1412,7 +1429,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
>>>>> &first,
>>>>> pdeadline);
>>>>> - drm_syncobj_array_free(syncobjs, count);
>>>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>>>> if (timeout < 0)
>>>>> return timeout;
>>>>> @@ -1529,6 +1546,7 @@ int
>>>>> drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>>>> struct drm_file *file_private)
>>>>> {
>>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>>> struct drm_syncobj_array *args = data;
>>>>> struct drm_syncobj **syncobjs;
>>>>> uint32_t i;
>>>>> @@ -1546,6 +1564,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>>>> ret = drm_syncobj_array_find(file_private,
>>>>> u64_to_user_ptr(args->handles),
>>>>> args->count_handles,
>>>>> + stack_syncobjs,
>>>>> + ARRAY_SIZE(stack_syncobjs),
>>>>> &syncobjs);
>>>>> if (ret < 0)
>>>>> return ret;
>>>>> @@ -1553,7 +1573,8 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
>>>>> for (i = 0; i < args->count_handles; i++)
>>>>> drm_syncobj_replace_fence(syncobjs[i], NULL);
>>>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>>>> + syncobjs != stack_syncobjs);
>>>>> return 0;
>>>>> }
>>>>> @@ -1562,6 +1583,7 @@ int
>>>>> drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>>>> struct drm_file *file_private)
>>>>> {
>>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>>> struct drm_syncobj_array *args = data;
>>>>> struct drm_syncobj **syncobjs;
>>>>> uint32_t i;
>>>>> @@ -1579,6 +1601,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>>>> ret = drm_syncobj_array_find(file_private,
>>>>> u64_to_user_ptr(args->handles),
>>>>> args->count_handles,
>>>>> + stack_syncobjs,
>>>>> + ARRAY_SIZE(stack_syncobjs),
>>>>> &syncobjs);
>>>>> if (ret < 0)
>>>>> return ret;
>>>>> @@ -1589,7 +1613,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>>>> break;
>>>>> }
>>>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>>>> + syncobjs != stack_syncobjs);
>>>>> return ret;
>>>>> }
>>>>> @@ -1598,6 +1623,7 @@ int
>>>>> drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>>>> struct drm_file *file_private)
>>>>> {
>>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>>> struct drm_syncobj_timeline_array *args = data;
>>>>> uint64_t __user *points = u64_to_user_ptr(args->points);
>>>>> uint32_t i, j, count = args->count_handles;
>>>>> @@ -1617,6 +1643,8 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>>>> ret = drm_syncobj_array_find(file_private,
>>>>> u64_to_user_ptr(args->handles),
>>>>> count,
>>>>> + stack_syncobjs,
>>>>> + ARRAY_SIZE(stack_syncobjs),
>>>>> &syncobjs);
>>>>> if (ret < 0)
>>>>> return ret;
>>>>> @@ -1653,7 +1681,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>>>> err_chains:
>>>>> kfree(chains);
>>>>> out:
>>>>> - drm_syncobj_array_free(syncobjs, count);
>>>>> + drm_syncobj_array_free(syncobjs, count, syncobjs != stack_syncobjs);
>>>>> return ret;
>>>>> }
>>>>> @@ -1661,6 +1689,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>>>>> int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>>>> struct drm_file *file_private)
>>>>> {
>>>>> + struct drm_syncobj *stack_syncobjs[DRM_SYNCOBJ_FAST_PATH_ENTRIES];
>>>>> struct drm_syncobj_timeline_array *args = data;
>>>>> struct drm_syncobj **syncobjs;
>>>>> uint64_t __user *points = u64_to_user_ptr(args->points);
>>>>> @@ -1679,6 +1708,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>>>> ret = drm_syncobj_array_find(file_private,
>>>>> u64_to_user_ptr(args->handles),
>>>>> args->count_handles,
>>>>> + stack_syncobjs,
>>>>> + ARRAY_SIZE(stack_syncobjs),
>>>>> &syncobjs);
>>>>> if (ret < 0)
>>>>> return ret;
>>>>> @@ -1722,7 +1753,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>>>> if (ret)
>>>>> break;
>>>>> }
>>>>> - drm_syncobj_array_free(syncobjs, args->count_handles);
>>>>> + drm_syncobj_array_free(syncobjs, args->count_handles,
>>>>> + syncobjs != stack_syncobjs);
>>>>> return ret;
>>>>> }
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-06-12 12:02 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-11 14:00 [PATCH v5 0/6] A few drm_syncobj optimisations Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 1/6] drm/syncobj: Remove unhelpful helper Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 2/6] drm/syncobj: Do not allocate an array to store zeros when waiting Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 3/6] drm/syncobj: Avoid one temporary allocation in drm_syncobj_array_find Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 4/6] drm/syncobj: Avoid temporary allocation in drm_syncobj_timeline_signal_ioctl Tvrtko Ursulin
2025-06-11 14:15 ` Christian König
2025-06-11 14:00 ` [PATCH v5 5/6] drm/syncobj: Add a fast path to drm_syncobj_array_wait_timeout Tvrtko Ursulin
2025-06-11 14:00 ` [PATCH v5 6/6] drm/syncobj: Add a fast path to drm_syncobj_array_find Tvrtko Ursulin
2025-06-11 14:21 ` Christian König
2025-06-11 15:29 ` Tvrtko Ursulin
2025-06-12 7:21 ` Christian König
2025-06-12 10:58 ` Tvrtko Ursulin
2025-06-12 12:02 ` Christian König
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).