* [PATCH 0/2] drm/xe: adding wait helper for gt async reset
@ 2024-12-09 14:12 Maciej Patelczyk
2024-12-09 14:12 ` [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset Maciej Patelczyk
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Maciej Patelczyk @ 2024-12-09 14:12 UTC (permalink / raw)
To: intel-xe; +Cc: niranjana.vishwanathapura, Maciej Patelczyk
GT reset is asynchronous. However, in few places there is an explicit
wait for reset to finish via flush_work. Providing a helper which
clearly express the intention of waiting for reset to finish.
The usage of asynchronous reset in sysfs gt0/tile0/ccs_mode creates
a situation in which a user after changing ccs_mode with immediate
query can receive only partial configuration due to ongoing reset.
For instance a single compute engine where 4 were requested.
This sometimes happens in the EU Debugger tests leading to skips or
even failures (sysfs not accessible).
This forces usage of sleep() in tests to wait for reset to complete.
The first patch provides a helper xe_gt_reset_wait and makes usage
of it.
The second patch makes ccs_mode sysfs write wait for reset to
finish.
Maciej Patelczyk (2):
drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
drm/xe: make change ccs_mode a synchronous action
drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
drivers/gpu/drm/xe/tests/xe_mocs.c | 2 +-
drivers/gpu/drm/xe/xe_gt.h | 12 ++++++++++++
drivers/gpu/drm/xe/xe_gt_ccs_mode.c | 6 +++++-
drivers/gpu/drm/xe/xe_gt_debugfs.c | 2 +-
5 files changed, 20 insertions(+), 4 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
2024-12-09 14:12 [PATCH 0/2] drm/xe: adding wait helper for gt async reset Maciej Patelczyk
@ 2024-12-09 14:12 ` Maciej Patelczyk
2024-12-09 14:42 ` Raag Jadav
2024-12-09 14:12 ` [PATCH 2/2] drm/xe: make change ccs_mode a synchronous action Maciej Patelczyk
` (3 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Maciej Patelczyk @ 2024-12-09 14:12 UTC (permalink / raw)
To: intel-xe; +Cc: niranjana.vishwanathapura, Maciej Patelczyk
The reset of the GT is asynchronous.
Adding a wait helper to wait until gt reset is done.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
---
drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
drivers/gpu/drm/xe/tests/xe_mocs.c | 2 +-
drivers/gpu/drm/xe/xe_gt.h | 12 ++++++++++++
drivers/gpu/drm/xe/xe_gt_debugfs.c | 2 +-
4 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
index c9ec7a313c6b..b51434fc5bd1 100644
--- a/drivers/gpu/drm/xe/tests/xe_bo.c
+++ b/drivers/gpu/drm/xe/tests/xe_bo.c
@@ -266,7 +266,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
*/
for_each_gt(__gt, xe, id) {
xe_gt_reset_async(__gt);
- flush_work(&__gt->reset.worker);
+ xe_gt_reset_wait(__gt);
}
if (err) {
KUNIT_FAIL(test, "restore kernel err=%pe\n",
diff --git a/drivers/gpu/drm/xe/tests/xe_mocs.c b/drivers/gpu/drm/xe/tests/xe_mocs.c
index 6f9b7a266b41..7bfba172ce2a 100644
--- a/drivers/gpu/drm/xe/tests/xe_mocs.c
+++ b/drivers/gpu/drm/xe/tests/xe_mocs.c
@@ -163,7 +163,7 @@ static int mocs_reset_test_run_device(struct xe_device *xe)
read_l3cc_table(gt, &mocs.table);
xe_gt_reset_async(gt);
- flush_work(>->reset.worker);
+ xe_gt_reset_wait(gt);
kunit_info(test, "mocs_reset_test after reset\n");
if (flags & HAS_GLOBAL_MOCS)
diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index 82b9b7f82fca..3e75f623c346 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -56,6 +56,18 @@ void xe_gt_sanitize(struct xe_gt *gt);
int xe_gt_sanitize_freq(struct xe_gt *gt);
void xe_gt_remove(struct xe_gt *gt);
+/**
+ * xe_gt_reset_wait - wait for gt's async reset to finalize.
+ * @gt: GT structure
+ * Return:
+ * %true if xe_gt_reset_wait() waited for the work to finish execution,
+ * %false if there was no scheduled reset or it was done.
+ */
+static inline bool xe_gt_reset_wait(struct xe_gt *gt)
+{
+ return flush_work(>->reset.worker);
+}
+
/**
* xe_gt_any_hw_engine_by_reset_domain - scan the list of engines and return the
* first that matches the same reset domain as @class
diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c
index 3e8c351a0eab..79f61532fd00 100644
--- a/drivers/gpu/drm/xe/xe_gt_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c
@@ -135,7 +135,7 @@ static int force_reset_sync(struct xe_gt *gt, struct drm_printer *p)
xe_gt_reset_async(gt);
xe_pm_runtime_put(gt_to_xe(gt));
- flush_work(>->reset.worker);
+ xe_gt_reset_wait(gt);
return 0;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] drm/xe: make change ccs_mode a synchronous action
2024-12-09 14:12 [PATCH 0/2] drm/xe: adding wait helper for gt async reset Maciej Patelczyk
2024-12-09 14:12 ` [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset Maciej Patelczyk
@ 2024-12-09 14:12 ` Maciej Patelczyk
2024-12-09 16:26 ` Lucas De Marchi
2024-12-09 14:50 ` ✓ CI.Patch_applied: success for drm/xe: adding wait helper for gt async reset Patchwork
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Maciej Patelczyk @ 2024-12-09 14:12 UTC (permalink / raw)
To: intel-xe; +Cc: niranjana.vishwanathapura, Maciej Patelczyk
If ccs_mode is being modified via
/sys/class/drm/cardX/device/tileY/gtY/ccs_mode
the asynchronous reset is triggered and the write returns immediately.
With that some test receive false information about number of CCS engines
or even fail if they proceed without delay after changing the ccs_mode.
Changing the ccs_mode change from async to sync to prevent failures in
tests.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
---
drivers/gpu/drm/xe/xe_gt_ccs_mode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
index b6adfb9f2030..8d02ec3f7d12 100644
--- a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
+++ b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
@@ -115,7 +115,7 @@ ccs_mode_store(struct device *kdev, struct device_attribute *attr,
struct xe_gt *gt = kobj_to_gt(&kdev->kobj);
struct xe_device *xe = gt_to_xe(gt);
u32 num_engines, num_slices;
- int ret;
+ int mode_changed = 0, ret;
if (IS_SRIOV(xe)) {
xe_gt_dbg(gt, "Can't change compute mode when running as %s\n",
@@ -151,10 +151,14 @@ ccs_mode_store(struct device *kdev, struct device_attribute *attr,
gt->ccs_mode = num_engines;
xe_gt_record_user_engines(gt);
xe_gt_reset_async(gt);
+ mode_changed = 1;
}
mutex_unlock(&xe->drm.filelist_mutex);
+ if (mode_changed)
+ xe_gt_reset_wait(gt);
+
return count;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
2024-12-09 14:12 ` [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset Maciej Patelczyk
@ 2024-12-09 14:42 ` Raag Jadav
2024-12-09 14:54 ` Raag Jadav
0 siblings, 1 reply; 13+ messages in thread
From: Raag Jadav @ 2024-12-09 14:42 UTC (permalink / raw)
To: Maciej Patelczyk; +Cc: intel-xe, niranjana.vishwanathapura
On Mon, Dec 09, 2024 at 03:12:09PM +0100, Maciej Patelczyk wrote:
> The reset of the GT is asynchronous.
> Adding a wait helper to wait until gt reset is done.
>
> Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> ---
> drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
> drivers/gpu/drm/xe/tests/xe_mocs.c | 2 +-
> drivers/gpu/drm/xe/xe_gt.h | 12 ++++++++++++
> drivers/gpu/drm/xe/xe_gt_debugfs.c | 2 +-
> 4 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> index c9ec7a313c6b..b51434fc5bd1 100644
> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> @@ -266,7 +266,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> */
> for_each_gt(__gt, xe, id) {
> xe_gt_reset_async(__gt);
> - flush_work(&__gt->reset.worker);
> + xe_gt_reset_wait(__gt);
Why not just create a
void xe_gt_reset_sync(gt)
{
xe_gt_reset_async()
flush_work()
}
and use it instead?
Raag
^ permalink raw reply [flat|nested] 13+ messages in thread
* ✓ CI.Patch_applied: success for drm/xe: adding wait helper for gt async reset
2024-12-09 14:12 [PATCH 0/2] drm/xe: adding wait helper for gt async reset Maciej Patelczyk
2024-12-09 14:12 ` [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset Maciej Patelczyk
2024-12-09 14:12 ` [PATCH 2/2] drm/xe: make change ccs_mode a synchronous action Maciej Patelczyk
@ 2024-12-09 14:50 ` Patchwork
2024-12-09 14:50 ` ✓ CI.checkpatch: " Patchwork
2024-12-09 14:51 ` ✗ CI.KUnit: failure " Patchwork
4 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2024-12-09 14:50 UTC (permalink / raw)
To: Maciej Patelczyk; +Cc: intel-xe
== Series Details ==
Series: drm/xe: adding wait helper for gt async reset
URL : https://patchwork.freedesktop.org/series/142300/
State : success
== Summary ==
=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: c266f2fbb9c0 drm-tip: 2024y-12m-09d-14h-13m-53s UTC integration manifest
=== git am output follows ===
Applying: drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
Applying: drm/xe: make change ccs_mode a synchronous action
^ permalink raw reply [flat|nested] 13+ messages in thread
* ✓ CI.checkpatch: success for drm/xe: adding wait helper for gt async reset
2024-12-09 14:12 [PATCH 0/2] drm/xe: adding wait helper for gt async reset Maciej Patelczyk
` (2 preceding siblings ...)
2024-12-09 14:50 ` ✓ CI.Patch_applied: success for drm/xe: adding wait helper for gt async reset Patchwork
@ 2024-12-09 14:50 ` Patchwork
2024-12-09 14:51 ` ✗ CI.KUnit: failure " Patchwork
4 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2024-12-09 14:50 UTC (permalink / raw)
To: Maciej Patelczyk; +Cc: intel-xe
== Series Details ==
Series: drm/xe: adding wait helper for gt async reset
URL : https://patchwork.freedesktop.org/series/142300/
State : success
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
30ab6715fc09baee6cc14cb3c89ad8858688d474
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 46980c204d2ac5b1a0c94ae7c7c97bcfaf87d58a
Author: Maciej Patelczyk <maciej.patelczyk@intel.com>
Date: Mon Dec 9 15:12:10 2024 +0100
drm/xe: make change ccs_mode a synchronous action
If ccs_mode is being modified via
/sys/class/drm/cardX/device/tileY/gtY/ccs_mode
the asynchronous reset is triggered and the write returns immediately.
With that some test receive false information about number of CCS engines
or even fail if they proceed without delay after changing the ccs_mode.
Changing the ccs_mode change from async to sync to prevent failures in
tests.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
+ /mt/dim checkpatch c266f2fbb9c0250af17b14788364766adf802a4a drm-intel
9b93f46b4015 drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
46980c204d2a drm/xe: make change ccs_mode a synchronous action
^ permalink raw reply [flat|nested] 13+ messages in thread
* ✗ CI.KUnit: failure for drm/xe: adding wait helper for gt async reset
2024-12-09 14:12 [PATCH 0/2] drm/xe: adding wait helper for gt async reset Maciej Patelczyk
` (3 preceding siblings ...)
2024-12-09 14:50 ` ✓ CI.checkpatch: " Patchwork
@ 2024-12-09 14:51 ` Patchwork
4 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2024-12-09 14:51 UTC (permalink / raw)
To: Maciej Patelczyk; +Cc: intel-xe
== Series Details ==
Series: drm/xe: adding wait helper for gt async reset
URL : https://patchwork.freedesktop.org/series/142300/
State : failure
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
156 | u64 ioread64_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
163 | u64 ioread64_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
170 | u64 ioread64be_lo_hi(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
178 | u64 ioread64be_hi_lo(const void __iomem *addr)
| ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
| ^~~~~~~~~~~~~~~~~
In file included from ../include/linux/module.h:22,
from ../include/linux/device/driver.h:21,
from ../include/linux/device.h:32,
from ../include/linux/auxiliary_bus.h:11,
from ../include/linux/intel_vsec.h:5,
from ../drivers/gpu/drm/xe/xe_vsec.c:7:
../drivers/gpu/drm/xe/xe_vsec.c:233:18: error: expected ‘,’ or ‘;’ before ‘INTEL_VSEC’
233 | MODULE_IMPORT_NS(INTEL_VSEC);
| ^~~~~~~~~~
../include/linux/moduleparam.h:26:61: note: in definition of macro ‘__MODULE_INFO’
26 | = __MODULE_INFO_PREFIX __stringify(tag) "=" info
| ^~~~
../include/linux/module.h:299:33: note: in expansion of macro ‘MODULE_INFO’
299 | #define MODULE_IMPORT_NS(ns) MODULE_INFO(import_ns, ns)
| ^~~~~~~~~~~
../drivers/gpu/drm/xe/xe_vsec.c:233:1: note: in expansion of macro ‘MODULE_IMPORT_NS’
233 | MODULE_IMPORT_NS(INTEL_VSEC);
| ^~~~~~~~~~~~~~~~
make[7]: *** [../scripts/Makefile.build:194: drivers/gpu/drm/xe/xe_vsec.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:440: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:440: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:440: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:440: drivers] Error 2
make[2]: *** [/kernel/Makefile:1989: .] Error 2
make[1]: *** [/kernel/Makefile:251: __sub-make] Error 2
make: *** [Makefile:251: __sub-make] Error 2
[14:50:30] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[14:50:34] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
2024-12-09 14:42 ` Raag Jadav
@ 2024-12-09 14:54 ` Raag Jadav
2024-12-09 16:22 ` Lucas De Marchi
0 siblings, 1 reply; 13+ messages in thread
From: Raag Jadav @ 2024-12-09 14:54 UTC (permalink / raw)
To: Maciej Patelczyk; +Cc: intel-xe, niranjana.vishwanathapura
On Mon, Dec 09, 2024 at 04:42:49PM +0200, Raag Jadav wrote:
> On Mon, Dec 09, 2024 at 03:12:09PM +0100, Maciej Patelczyk wrote:
> > The reset of the GT is asynchronous.
> > Adding a wait helper to wait until gt reset is done.
> >
> > Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
> > ---
> > drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
> > drivers/gpu/drm/xe/tests/xe_mocs.c | 2 +-
> > drivers/gpu/drm/xe/xe_gt.h | 12 ++++++++++++
> > drivers/gpu/drm/xe/xe_gt_debugfs.c | 2 +-
> > 4 files changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
> > index c9ec7a313c6b..b51434fc5bd1 100644
> > --- a/drivers/gpu/drm/xe/tests/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
> > @@ -266,7 +266,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
> > */
> > for_each_gt(__gt, xe, id) {
> > xe_gt_reset_async(__gt);
> > - flush_work(&__gt->reset.worker);
> > + xe_gt_reset_wait(__gt);
>
> Why not just create a
>
> void xe_gt_reset_sync(gt)
> {
> xe_gt_reset_async()
> flush_work()
> }
>
> and use it instead?
Or perhaps reuse force_reset_sync()?
Raag
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
2024-12-09 14:54 ` Raag Jadav
@ 2024-12-09 16:22 ` Lucas De Marchi
2024-12-09 19:39 ` Maciej Patelczyk
0 siblings, 1 reply; 13+ messages in thread
From: Lucas De Marchi @ 2024-12-09 16:22 UTC (permalink / raw)
To: Raag Jadav
Cc: Maciej Patelczyk, intel-xe, niranjana.vishwanathapura,
Thomas Hellstrom
On Mon, Dec 09, 2024 at 04:54:45PM +0200, Raag Jadav wrote:
>On Mon, Dec 09, 2024 at 04:42:49PM +0200, Raag Jadav wrote:
>> On Mon, Dec 09, 2024 at 03:12:09PM +0100, Maciej Patelczyk wrote:
>> > The reset of the GT is asynchronous.
>> > Adding a wait helper to wait until gt reset is done.
>> >
>> > Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>> > ---
>> > drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
>> > drivers/gpu/drm/xe/tests/xe_mocs.c | 2 +-
>> > drivers/gpu/drm/xe/xe_gt.h | 12 ++++++++++++
>> > drivers/gpu/drm/xe/xe_gt_debugfs.c | 2 +-
>> > 4 files changed, 15 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c
>> > index c9ec7a313c6b..b51434fc5bd1 100644
>> > --- a/drivers/gpu/drm/xe/tests/xe_bo.c
>> > +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
>> > @@ -266,7 +266,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
>> > */
>> > for_each_gt(__gt, xe, id) {
>> > xe_gt_reset_async(__gt);
>> > - flush_work(&__gt->reset.worker);
>> > + xe_gt_reset_wait(__gt);
>>
>> Why not just create a
>>
>> void xe_gt_reset_sync(gt)
>> {
>> xe_gt_reset_async()
>> flush_work()
>> }
maybe, the problem I think is the 1 letter difference between
xe_gt_reset_async
xe_gt_reset_sync
that may be confusing/error-prone.
I like the helper, just think we need the naming to be polished.
+Thomas
maybe having this as xe_gt_reset() / xe_gt_reset_async()?
Lucas De Marchi
>>
>> and use it instead?
>
>Or perhaps reuse force_reset_sync()?
>
>Raag
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/xe: make change ccs_mode a synchronous action
2024-12-09 14:12 ` [PATCH 2/2] drm/xe: make change ccs_mode a synchronous action Maciej Patelczyk
@ 2024-12-09 16:26 ` Lucas De Marchi
2024-12-09 19:31 ` Maciej Patelczyk
0 siblings, 1 reply; 13+ messages in thread
From: Lucas De Marchi @ 2024-12-09 16:26 UTC (permalink / raw)
To: Maciej Patelczyk; +Cc: intel-xe, niranjana.vishwanathapura
On Mon, Dec 09, 2024 at 03:12:10PM +0100, Maciej Patelczyk wrote:
>If ccs_mode is being modified via
> /sys/class/drm/cardX/device/tileY/gtY/ccs_mode
>the asynchronous reset is triggered and the write returns immediately.
>
>With that some test receive false information about number of CCS engines
>or even fail if they proceed without delay after changing the ccs_mode.
>
>Changing the ccs_mode change from async to sync to prevent failures in
>tests.
>
>Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>---
> drivers/gpu/drm/xe/xe_gt_ccs_mode.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>index b6adfb9f2030..8d02ec3f7d12 100644
>--- a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>+++ b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>@@ -115,7 +115,7 @@ ccs_mode_store(struct device *kdev, struct device_attribute *attr,
> struct xe_gt *gt = kobj_to_gt(&kdev->kobj);
> struct xe_device *xe = gt_to_xe(gt);
> u32 num_engines, num_slices;
>- int ret;
>+ int mode_changed = 0, ret;
>
> if (IS_SRIOV(xe)) {
> xe_gt_dbg(gt, "Can't change compute mode when running as %s\n",
>@@ -151,10 +151,14 @@ ccs_mode_store(struct device *kdev, struct device_attribute *attr,
> gt->ccs_mode = num_engines;
> xe_gt_record_user_engines(gt);
> xe_gt_reset_async(gt);
>+ mode_changed = 1;
I think this can simply be the sync version. We don't want to allow new
clients being created while this reset is taking place, which is what
the mutex below is protecting against.
Lucas De Marchi
> }
>
> mutex_unlock(&xe->drm.filelist_mutex);
>
>+ if (mode_changed)
>+ xe_gt_reset_wait(gt);
>+
> return count;
> }
>
>--
>2.43.0
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/xe: make change ccs_mode a synchronous action
2024-12-09 16:26 ` Lucas De Marchi
@ 2024-12-09 19:31 ` Maciej Patelczyk
0 siblings, 0 replies; 13+ messages in thread
From: Maciej Patelczyk @ 2024-12-09 19:31 UTC (permalink / raw)
To: Lucas De Marchi; +Cc: intel-xe, niranjana.vishwanathapura
On 9.12.2024 17:26, Lucas De Marchi wrote:
> On Mon, Dec 09, 2024 at 03:12:10PM +0100, Maciej Patelczyk wrote:
>> If ccs_mode is being modified via
>> /sys/class/drm/cardX/device/tileY/gtY/ccs_mode
>> the asynchronous reset is triggered and the write returns immediately.
>>
>> With that some test receive false information about number of CCS
>> engines
>> or even fail if they proceed without delay after changing the ccs_mode.
>>
>> Changing the ccs_mode change from async to sync to prevent failures in
>> tests.
>>
>> Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_gt_ccs_mode.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>> b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>> index b6adfb9f2030..8d02ec3f7d12 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>> @@ -115,7 +115,7 @@ ccs_mode_store(struct device *kdev, struct
>> device_attribute *attr,
>> struct xe_gt *gt = kobj_to_gt(&kdev->kobj);
>> struct xe_device *xe = gt_to_xe(gt);
>> u32 num_engines, num_slices;
>> - int ret;
>> + int mode_changed = 0, ret;
>>
>> if (IS_SRIOV(xe)) {
>> xe_gt_dbg(gt, "Can't change compute mode when running as %s\n",
>> @@ -151,10 +151,14 @@ ccs_mode_store(struct device *kdev, struct
>> device_attribute *attr,
>> gt->ccs_mode = num_engines;
>> xe_gt_record_user_engines(gt);
>> xe_gt_reset_async(gt);
>> + mode_changed = 1;
>
> I think this can simply be the sync version. We don't want to allow new
> clients being created while this reset is taking place, which is what
> the mutex below is protecting against.
> Lucas De Marchi
Hi!
Does it mean that the xe_gt_reset_wait() shall be in the critical section?
The wait can be unconditional as flush_work won't block if there is no work,
but I haven't tested that.
Maciej
>> }
>>
>> mutex_unlock(&xe->drm.filelist_mutex);
>>
>> + if (mode_changed)
>> + xe_gt_reset_wait(gt);
>> +
>> return count;
>> }
>>
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
2024-12-09 16:22 ` Lucas De Marchi
@ 2024-12-09 19:39 ` Maciej Patelczyk
2024-12-09 20:05 ` Lucas De Marchi
0 siblings, 1 reply; 13+ messages in thread
From: Maciej Patelczyk @ 2024-12-09 19:39 UTC (permalink / raw)
To: Lucas De Marchi, Raag Jadav
Cc: intel-xe, niranjana.vishwanathapura, Thomas Hellstrom
On 9.12.2024 17:22, Lucas De Marchi wrote:
> On Mon, Dec 09, 2024 at 04:54:45PM +0200, Raag Jadav wrote:
>> On Mon, Dec 09, 2024 at 04:42:49PM +0200, Raag Jadav wrote:
>>> On Mon, Dec 09, 2024 at 03:12:09PM +0100, Maciej Patelczyk wrote:
>>> > The reset of the GT is asynchronous.
>>> > Adding a wait helper to wait until gt reset is done.
>>> >
>>> > Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>>> > ---
>>> > drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
>>> > drivers/gpu/drm/xe/tests/xe_mocs.c | 2 +-
>>> > drivers/gpu/drm/xe/xe_gt.h | 12 ++++++++++++
>>> > drivers/gpu/drm/xe/xe_gt_debugfs.c | 2 +-
>>> > 4 files changed, 15 insertions(+), 3 deletions(-)
>>> >
>>> > diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c
>>> b/drivers/gpu/drm/xe/tests/xe_bo.c
>>> > index c9ec7a313c6b..b51434fc5bd1 100644
>>> > --- a/drivers/gpu/drm/xe/tests/xe_bo.c
>>> > +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
>>> > @@ -266,7 +266,7 @@ static int evict_test_run_tile(struct
>>> xe_device *xe, struct xe_tile *tile, struc
>>> > */
>>> > for_each_gt(__gt, xe, id) {
>>> > xe_gt_reset_async(__gt);
>>> > - flush_work(&__gt->reset.worker);
>>> > + xe_gt_reset_wait(__gt);
>>>
>>> Why not just create a
>>>
>>> void xe_gt_reset_sync(gt)
>>> {
>>> xe_gt_reset_async()
>>> flush_work()
>>> }
>
>
> maybe, the problem I think is the 1 letter difference between
>
> xe_gt_reset_async
> xe_gt_reset_sync
>
> that may be confusing/error-prone.
>
> I like the helper, just think we need the naming to be polished.
>
> +Thomas
>
> maybe having this as xe_gt_reset() / xe_gt_reset_async()?
>
>
> Lucas De Marchi
>
xe_gt_reset(), xe_gt_reset_async() would be my vote.
I proposed only the wait helper as I did not want to block under mutex
for the whole period of time when the reset worker will be scheduler and
executed.
That could be seconds.
Maciej
>
>>>
>>> and use it instead?
>>
>> Or perhaps reuse force_reset_sync()?
>>
>> Raag
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset
2024-12-09 19:39 ` Maciej Patelczyk
@ 2024-12-09 20:05 ` Lucas De Marchi
0 siblings, 0 replies; 13+ messages in thread
From: Lucas De Marchi @ 2024-12-09 20:05 UTC (permalink / raw)
To: Maciej Patelczyk
Cc: Raag Jadav, intel-xe, niranjana.vishwanathapura, Thomas Hellstrom
On Mon, Dec 09, 2024 at 08:39:32PM +0100, Maciej Patelczyk wrote:
>On 9.12.2024 17:22, Lucas De Marchi wrote:
>
>>On Mon, Dec 09, 2024 at 04:54:45PM +0200, Raag Jadav wrote:
>>>On Mon, Dec 09, 2024 at 04:42:49PM +0200, Raag Jadav wrote:
>>>>On Mon, Dec 09, 2024 at 03:12:09PM +0100, Maciej Patelczyk wrote:
>>>>> The reset of the GT is asynchronous.
>>>>> Adding a wait helper to wait until gt reset is done.
>>>>>
>>>>> Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/tests/xe_bo.c | 2 +-
>>>>> drivers/gpu/drm/xe/tests/xe_mocs.c | 2 +-
>>>>> drivers/gpu/drm/xe/xe_gt.h | 12 ++++++++++++
>>>>> drivers/gpu/drm/xe/xe_gt_debugfs.c | 2 +-
>>>>> 4 files changed, 15 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c
>>>>b/drivers/gpu/drm/xe/tests/xe_bo.c
>>>>> index c9ec7a313c6b..b51434fc5bd1 100644
>>>>> --- a/drivers/gpu/drm/xe/tests/xe_bo.c
>>>>> +++ b/drivers/gpu/drm/xe/tests/xe_bo.c
>>>>> @@ -266,7 +266,7 @@ static int evict_test_run_tile(struct
>>>>xe_device *xe, struct xe_tile *tile, struc
>>>>> */
>>>>> for_each_gt(__gt, xe, id) {
>>>>> xe_gt_reset_async(__gt);
>>>>> - flush_work(&__gt->reset.worker);
>>>>> + xe_gt_reset_wait(__gt);
>>>>
>>>>Why not just create a
>>>>
>>>>void xe_gt_reset_sync(gt)
>>>>{
>>>> xe_gt_reset_async()
>>>> flush_work()
>>>>}
>>
>>
>>maybe, the problem I think is the 1 letter difference between
>>
>>xe_gt_reset_async
>>xe_gt_reset_sync
>>
>>that may be confusing/error-prone.
>>
>>I like the helper, just think we need the naming to be polished.
>>
>>+Thomas
>>
>>maybe having this as xe_gt_reset() / xe_gt_reset_async()?
>>
>>
>>Lucas De Marchi
>>
>xe_gt_reset(), xe_gt_reset_async() would be my vote.
>
>
>I proposed only the wait helper as I did not want to block under mutex
>for the whole period of time when the reset worker will be scheduler
>and executed.
>
>That could be seconds.
but as I mentioned in the other patch we actually want to hold the
mutex since we don't want a new client until this is completed.
Why would a gt reset take seconds?
Lucas De Marchi
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-12-09 20:06 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-09 14:12 [PATCH 0/2] drm/xe: adding wait helper for gt async reset Maciej Patelczyk
2024-12-09 14:12 ` [PATCH 1/2] drm/xe: introduce xe_gt_reset_wait to wait for async gt reset Maciej Patelczyk
2024-12-09 14:42 ` Raag Jadav
2024-12-09 14:54 ` Raag Jadav
2024-12-09 16:22 ` Lucas De Marchi
2024-12-09 19:39 ` Maciej Patelczyk
2024-12-09 20:05 ` Lucas De Marchi
2024-12-09 14:12 ` [PATCH 2/2] drm/xe: make change ccs_mode a synchronous action Maciej Patelczyk
2024-12-09 16:26 ` Lucas De Marchi
2024-12-09 19:31 ` Maciej Patelczyk
2024-12-09 14:50 ` ✓ CI.Patch_applied: success for drm/xe: adding wait helper for gt async reset Patchwork
2024-12-09 14:50 ` ✓ CI.checkpatch: " Patchwork
2024-12-09 14:51 ` ✗ CI.KUnit: failure " Patchwork
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox