* [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry
@ 2025-03-05 19:11 Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered Janusz Krzysztofik
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-03-05 19:11 UTC (permalink / raw)
To: intel-gfx, Jani Nikula, Andi Shyti
Cc: dri-devel, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
Chris Wilson, Lucas De Marchi, Alan Previn, Ashutosh Dixit,
Daniele Ceraolo Spurio, Janusz Krzysztofik
Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
drm_dev_register() fails"), we may return from i915_driver_register()
immediately, skipping remaining registration steps. However, the
_unregister() counterpart called at device remove knows nothing about that
skip and executes reverts of all those steps. As a consequence, a number
of kernel warnings that taint the kernel are triggered.
Introduce a flag that indicates device registration status and raise it on
device registration success. As a minimum (first patch), when that flag
is found not set while unregistering the driver, jump over those reverts
of registration steps omitted after device registration failure that are
not ready for being called unconditionally (and trigger the kernel
warnings).
With the second patch, also jump over reverts of other driver registration
steps that were not called due to device registration failure. Some
unregister function calls, found implementing additional steps beyond the
register reverts, are still executed.
To simplify i915_driver_unregister() code, the third patch makes sure
reverts of driver registration steps executed before potentially
unsuccessful device registration are symmetrically called after
the device unplug.
Finally, the last patch further simplifies the i915_driver_unregister()
code by moving two required steps down, right after device unplug.
The first patch may be squashed with one or more of its follow-ups if so
decided.
Janusz Krzysztofik (4):
drm/i915: Skip harmful unregister steps if not registered
drm/i915: Omit unnecessary driver unregister steps
drm/i915: Fix asymmetry in PMU register/unregister step order
drm/i915: Group not skipped unregister steps
drivers/gpu/drm/i915/gt/intel_gt.c | 6 ++++++
drivers/gpu/drm/i915/i915_driver.c | 18 ++++++++++++------
drivers/gpu/drm/i915/i915_drv.h | 2 ++
3 files changed, 20 insertions(+), 6 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered
2025-03-05 19:11 [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Janusz Krzysztofik
@ 2025-03-05 19:11 ` Janusz Krzysztofik
2025-03-06 10:25 ` Jani Nikula
2025-03-05 19:11 ` [PATCH v3 2/4] drm/i915: Omit unnecessary driver unregister steps Janusz Krzysztofik
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-03-05 19:11 UTC (permalink / raw)
To: intel-gfx, Jani Nikula, Andi Shyti
Cc: dri-devel, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
Chris Wilson, Lucas De Marchi, Alan Previn, Ashutosh Dixit,
Daniele Ceraolo Spurio, Janusz Krzysztofik
Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
drm_dev_register() fails"), we return from i915_driver_register()
immediately if drm_dev_register() fails, skipping remaining registration
steps. However, the _unregister() counterpart called at device remove
knows nothing about that skip and executes reverts of all those steps,
some of those reverts possibly added or modified later. As a consequence,
a number of kernel warnings that taint the kernel are triggered:
<3> [525.823143] i915 0000:00:02.0: [drm] *ERROR* Failed to register driver for
userspace access!
...
<4> [525.831069] ------------[ cut here ]------------
<4> [525.831071] i915 0000:00:02.0: [drm] drm_WARN_ON(power_domains->init_wakere
f)
<4> [525.831095] WARNING: CPU: 6 PID: 3440 at drivers/gpu/drm/i915/display/intel
_display_power.c:2074 intel_power_domains_disable+0xc2/0xd0 [i915]
...
<4> [525.831328] CPU: 6 UID: 0 PID: 3440 Comm: i915_module_loa Tainted: G U
6.14.0-rc1-CI_DRM_16076-g7a632b6798b6+ #1
...
<4> [525.831334] RIP: 0010:intel_power_domains_disable+0xc2/0xd0 [i915]
...
<4> [525.831483] Call Trace:
<4> [525.831484] <TASK>
...
<4> [525.831943] i915_driver_remove+0x4b/0x140 [i915]
<4> [525.832028] i915_pci_remove+0x1e/0x40 [i915]
<4> [525.832099] pci_device_remove+0x3e/0xb0
<4> [525.832103] device_remove+0x40/0x80
<4> [525.832107] device_release_driver_internal+0x215/0x280
...
<4> [525.947666] ------------[ cut here ]------------
<4> [525.947669] kobject: '(null)' (ffff88814f62a218): is not initialized, yet kobject_put() is being called.
<4> [525.947707] WARNING: CPU: 6 PID: 3440 at lib/kobject.c:734 kobject_put+0xe4/0x200
...
<4> [525.947875] RIP: 0010:kobject_put+0xe4/0x200
...
<4> [525.947909] Call Trace:
<4> [525.947911] <TASK>
...
<4> [525.947963] intel_gt_sysfs_unregister+0x25/0x40 [i915]
<4> [525.948133] intel_gt_driver_unregister+0x14/0x80 [i915]
<4> [525.948291] i915_driver_remove+0x6c/0x140 [i915]
<4> [525.948411] i915_pci_remove+0x1e/0x40 [i915]
...
<4> [526.441186] ------------[ cut here ]------------
<4> [526.441191] kernfs: can not remove 'error', no directory
<4> [526.441211] WARNING: CPU: 1 PID: 3440 at fs/kernfs/dir.c:1684 kernfs_remove_by_name_ns+0xbc/0xc0
...
<4> [526.441536] RIP: 0010:kernfs_remove_by_name_ns+0xbc/0xc0
...
<4> [526.441578] Call Trace:
<4> [526.441581] <TASK>
...
<4> [526.441686] sysfs_remove_bin_file+0x17/0x30
<4> [526.441691] i915_gpu_error_sysfs_teardown+0x1d/0x30 [i915]
<4> [526.442226] i915_teardown_sysfs+0x1c/0x60 [i915]
<4> [526.442369] i915_driver_remove+0x9d/0x140 [i915]
<4> [526.442473] i915_pci_remove+0x1e/0x40 [i915]
...
<4> [526.685700] ------------[ cut here ]------------
<4> [526.685706] i915 0000:00:02.0: [drm] i915 raw-wakerefs=1 wakelocks=1 on cle
anup
<4> [526.685734] WARNING: CPU: 1 PID: 3440 at drivers/gpu/drm/i915/intel_runtime
_pm.c:443 intel_runtime_pm_driver_release+0x75/0x90 [i915]
...
<4> [526.686090] RIP: 0010:intel_runtime_pm_driver_release+0x75/0x90 [i915]
...
<4> [526.686294] Call Trace:
<4> [526.686296] <TASK>
...
<4> [526.687025] i915_driver_release+0x7e/0xb0 [i915]
<4> [526.687243] drm_dev_put.part.0+0x47/0x90
<4> [526.687250] devm_drm_dev_init_release+0x13/0x30
<4> [526.687255] devm_action_release+0x12/0x30
<4> [526.687261] release_nodes+0x3a/0x120
<4> [526.687268] devres_release_all+0x97/0xe0
<4> [526.687277] device_unbind_cleanup+0x12/0x80
<4> [526.687282] device_release_driver_internal+0x23a/0x280
...
A call to intel_power_domains_disable() that triggeres the drm_WARN_ON()
and takes another wakeref even if the one taken during driver register was
not released after device register error, was already there. The WARN()
triggered by kernfs_remove_by_name_ns() from i915_teardown_sysfs()->
i915_gpu_error_sysfs_teardown(), formerly i915_teardown_error_capture(),
was also there. A call to intel_gt_sysfs_unregister() that triggers the
WARN() from kobject_put() was added to intel_gt_driver_unregister() with
commit 69d6bf5c3754ff ("drm/i915/gt: Fix memory leaks in per-gt sysfs").
Introduce a flag that indicates device registration status and raise it on
device registration success. As a minimum (first step), when that flag is
found not set while unregistering the driver, jump over those reverts of
registration steps omitted after device registration failure that are not
ready for being called unconditionally (and trigger kernel warnings).
In case of i915_gt_driver_unregister() called for each GT, omitting it
proved to introduce new issues. Skip only its problematic
intel_gt_sysfs_unregister() sub-step.
Other unregister steps seem to be safe but may still occur redundant in
that scenario -- that will be addressed in follow-up patches.
v3: Based on Andi's commitment on introducing a flag, try to address
Jani's "must find another way" by finding a better place and name for
the flag (in hope that's what Jani had on mind),
- split into a series of patches and limit the scope of the first (this)
one to a minimum of omitting conditionally only those unregister
(sub)steps that trigger kernel warnings when not registered.
v2: Check in _unregister whether the drm_dev_register has succeeded and
skip some of the _unregister() steps. (Andi)
Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10047
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/9820
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10131
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10887
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12817
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
drivers/gpu/drm/i915/gt/intel_gt.c | 3 ++-
drivers/gpu/drm/i915/i915_driver.c | 8 ++++++--
drivers/gpu/drm/i915/i915_drv.h | 2 ++
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 3d3b1ba76e2be..77eb9305a1ff8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -783,7 +783,8 @@ void intel_gt_driver_unregister(struct intel_gt *gt)
{
intel_wakeref_t wakeref;
- intel_gt_sysfs_unregister(gt);
+ if (gt->i915->do_unregister)
+ intel_gt_sysfs_unregister(gt);
intel_rps_driver_unregister(>->rps);
intel_gsc_fini(>->gsc);
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 613084fd00979..2caaeeab1f0f6 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -638,6 +638,8 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
return;
}
+ dev_priv->do_unregister = true;
+
i915_debugfs_register(dev_priv);
i915_setup_sysfs(dev_priv);
@@ -673,7 +675,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
i915_switcheroo_unregister(dev_priv);
intel_runtime_pm_disable(&dev_priv->runtime_pm);
- intel_power_domains_disable(display);
+ if (dev_priv->do_unregister)
+ intel_power_domains_disable(display);
intel_display_driver_unregister(display);
@@ -687,7 +690,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
i915_perf_unregister(dev_priv);
i915_pmu_unregister(dev_priv);
- i915_teardown_sysfs(dev_priv);
+ if (dev_priv->do_unregister)
+ i915_teardown_sysfs(dev_priv);
drm_dev_unplug(&dev_priv->drm);
i915_gem_driver_unregister(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ffc346379cc2c..0c6cbedaa71fe 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -184,6 +184,8 @@ struct drm_i915_private {
/* FIXME: Device release actions should all be moved to drmm_ */
bool do_release;
+ bool do_unregister;
+
/* i915 device parameters */
struct i915_params params;
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 2/4] drm/i915: Omit unnecessary driver unregister steps
2025-03-05 19:11 [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered Janusz Krzysztofik
@ 2025-03-05 19:11 ` Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 3/4] drm/i915: Fix asymmetry in PMU register/unregister step order Janusz Krzysztofik
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-03-05 19:11 UTC (permalink / raw)
To: intel-gfx, Jani Nikula, Andi Shyti
Cc: dri-devel, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
Chris Wilson, Lucas De Marchi, Alan Previn, Ashutosh Dixit,
Daniele Ceraolo Spurio, Janusz Krzysztofik
Now that we have a flag that indicates device registration status, when
unregistering the driver, jump over reverts of driver registration steps
that were not called due to device registration failure.
Unfortunately, not all steps of i915_driver_unregister() are limited only
to reverting changes introduced by their corresponding steps of
i915_driver_register(), and some are called out of order (not in reverse
order of what was called / skipped during driver registration.
One example is intel_gt_driver_unregister() called for each GT. For clean
driver removal, a step of that function -- intel_gsc_uc_flush_work() --
occurred required even if intel_gt_driver_register() was not called due to
device registration failure. That call was added to
intel_gt_driver_unregister() with commit b09f9670b13038 ("drm/i915/gsc:
flush the GSC worker before wedging on unload").
Another exception is intel_pxp_fini(). It seems to be a counterpart of
intel_pxp_init(), called directly from i915_driver_probe(), rather than of
intel_pxp_debugfs_register(). However, it still needs to be called before
intel_gt_driver_unregister()->intel_gsc_uc_flush_work() for clean driver
removal. The call to intel_pxp_fini() was added to
i915_driver_unregister() with commit f67986b0119c04 ("drm/i915/pxp:
Promote pxp subsystem to top-level of i915").
Also, drm_dev_unplug() must be called even if drm_dev_register() failed.
It not only does more than drm_dev_unregister() but also, some of its
initial steps might have been called and not unwound even if an error then
occurred, so they must be reverted on driver unregister.
Last, i915_pmu_unregister(), a counterpart of i915_pmu_register() always
called from i915_driver_register(), must be called unconditionally from
i915_driver_register(). However, since it is called a few steps before
drm_dev_unplug(), it also requires special handling. BTW, its placement
within i915_driver_unregister() hasn't changes since the immediate return
from i915_driver_register() on device registration failure was introduced.
To avoid unexpected side effects, execution order of unregister steps is
strictly preserved, which potentially makes the code changes sub-optimal.
That will be addressed in follow-up patches.
v3: Don't limit the scope to fixing a subset of what needs to be handled
properly (hwmon? gt? debugfs?). (Andi)
v2: Check in _unregister whether the drm_dev_register has succeeded and
skip some of the _unregister() steps. (Andi)
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
drivers/gpu/drm/i915/gt/intel_gt.c | 9 +++++++--
drivers/gpu/drm/i915/i915_driver.c | 17 +++++++++++++----
2 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 77eb9305a1ff8..17e2ccd8b1a92 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -783,8 +783,10 @@ void intel_gt_driver_unregister(struct intel_gt *gt)
{
intel_wakeref_t wakeref;
- if (gt->i915->do_unregister)
- intel_gt_sysfs_unregister(gt);
+ if (!gt->i915->do_unregister)
+ goto not_registered;
+
+ intel_gt_sysfs_unregister(gt);
intel_rps_driver_unregister(>->rps);
intel_gsc_fini(>->gsc);
@@ -809,7 +811,10 @@ void intel_gt_driver_unregister(struct intel_gt *gt)
* will likely already be idle in the great majority of non-selftest
* scenarios.
*/
+not_registered:
intel_gsc_uc_flush_work(>->uc.gsc);
+ if (!gt->i915->do_unregister)
+ return;
/*
* Upon unregistering the device to prevent any new users, cancel
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 2caaeeab1f0f6..d865e90f54704 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -672,26 +672,35 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
struct intel_gt *gt;
unsigned int i;
+ if (!dev_priv->do_unregister)
+ goto do_pxp_gt;
+
i915_switcheroo_unregister(dev_priv);
intel_runtime_pm_disable(&dev_priv->runtime_pm);
- if (dev_priv->do_unregister)
- intel_power_domains_disable(display);
+ intel_power_domains_disable(display);
intel_display_driver_unregister(display);
+do_pxp_gt:
intel_pxp_fini(dev_priv);
for_each_gt(gt, dev_priv, i)
intel_gt_driver_unregister(gt);
+ if (!dev_priv->do_unregister)
+ goto do_pmu;
+
i915_hwmon_unregister(dev_priv);
i915_perf_unregister(dev_priv);
+do_pmu:
i915_pmu_unregister(dev_priv);
+ if (!dev_priv->do_unregister)
+ goto do_unplug;
- if (dev_priv->do_unregister)
- i915_teardown_sysfs(dev_priv);
+ i915_teardown_sysfs(dev_priv);
+do_unplug:
drm_dev_unplug(&dev_priv->drm);
i915_gem_driver_unregister(dev_priv);
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 3/4] drm/i915: Fix asymmetry in PMU register/unregister step order
2025-03-05 19:11 [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 2/4] drm/i915: Omit unnecessary driver unregister steps Janusz Krzysztofik
@ 2025-03-05 19:11 ` Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 4/4] drm/i915: Group not skipped unregister steps Janusz Krzysztofik
2025-03-06 11:00 ` [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Krzysztof Karas
4 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-03-05 19:11 UTC (permalink / raw)
To: intel-gfx, Jani Nikula, Andi Shyti
Cc: dri-devel, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
Chris Wilson, Lucas De Marchi, Alan Previn, Ashutosh Dixit,
Daniele Ceraolo Spurio, Janusz Krzysztofik
To simplify i915_driver_unregister() code, make sure reverts of driver
registration steps executed before potentially unsuccessful device
registration are symmetrically called after drm_dev_unplug(). There is
one case that doesn't follow that rule, introduced by commit b46a33e271ed
("drm/i915/pmu: Expose a PMU interface for perf queries"), with no
justification for asymmetry provided in commit description, then assumed
not intentional.
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
drivers/gpu/drm/i915/i915_driver.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index d865e90f54704..7980bb880c7dc 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -689,20 +689,17 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
intel_gt_driver_unregister(gt);
if (!dev_priv->do_unregister)
- goto do_pmu;
+ goto do_unplug;
i915_hwmon_unregister(dev_priv);
i915_perf_unregister(dev_priv);
-do_pmu:
- i915_pmu_unregister(dev_priv);
- if (!dev_priv->do_unregister)
- goto do_unplug;
i915_teardown_sysfs(dev_priv);
do_unplug:
drm_dev_unplug(&dev_priv->drm);
+ i915_pmu_unregister(dev_priv);
i915_gem_driver_unregister(dev_priv);
}
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v3 4/4] drm/i915: Group not skipped unregister steps
2025-03-05 19:11 [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Janusz Krzysztofik
` (2 preceding siblings ...)
2025-03-05 19:11 ` [PATCH v3 3/4] drm/i915: Fix asymmetry in PMU register/unregister step order Janusz Krzysztofik
@ 2025-03-05 19:11 ` Janusz Krzysztofik
2025-03-06 11:00 ` [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Krzysztof Karas
4 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-03-05 19:11 UTC (permalink / raw)
To: intel-gfx, Jani Nikula, Andi Shyti
Cc: dri-devel, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
Chris Wilson, Lucas De Marchi, Alan Previn, Ashutosh Dixit,
Daniele Ceraolo Spurio, Janusz Krzysztofik
Further simplification of i915_driver_unregister() requires moving of two
steps, intel_pxp_fini() and intel_gt_driver_unregister(), down, e.g.,
right behind drm_dev_unplug(). Local testing hasn't revealed any issues
with that move, so go for it.
Former placement of intel_gt_driver_unregister() within
i915_driver_unregister() originated from commit 42014f69bb235f ("drm/i915:
Hook up GT power management") while its content has grown significantly
since its introduction. Placement of intel_pxp_fini() originated from
commit f67986b0119c04 ("drm/i915/pxp: Promote pxp subsystem to top-level
of i915").
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
drivers/gpu/drm/i915/i915_driver.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
index 7980bb880c7dc..0a07de2a7734b 100644
--- a/drivers/gpu/drm/i915/i915_driver.c
+++ b/drivers/gpu/drm/i915/i915_driver.c
@@ -673,7 +673,7 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
unsigned int i;
if (!dev_priv->do_unregister)
- goto do_pxp_gt;
+ goto do_unplug;
i915_switcheroo_unregister(dev_priv);
@@ -682,15 +682,6 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
intel_display_driver_unregister(display);
-do_pxp_gt:
- intel_pxp_fini(dev_priv);
-
- for_each_gt(gt, dev_priv, i)
- intel_gt_driver_unregister(gt);
-
- if (!dev_priv->do_unregister)
- goto do_unplug;
-
i915_hwmon_unregister(dev_priv);
i915_perf_unregister(dev_priv);
@@ -699,6 +690,11 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
do_unplug:
drm_dev_unplug(&dev_priv->drm);
+ intel_pxp_fini(dev_priv);
+
+ for_each_gt(gt, dev_priv, i)
+ intel_gt_driver_unregister(gt);
+
i915_pmu_unregister(dev_priv);
i915_gem_driver_unregister(dev_priv);
}
--
2.48.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered
2025-03-05 19:11 ` [PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered Janusz Krzysztofik
@ 2025-03-06 10:25 ` Jani Nikula
2025-03-06 10:50 ` Janusz Krzysztofik
0 siblings, 1 reply; 10+ messages in thread
From: Jani Nikula @ 2025-03-06 10:25 UTC (permalink / raw)
To: Janusz Krzysztofik, intel-gfx, Andi Shyti
Cc: dri-devel, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
Chris Wilson, Lucas De Marchi, Alan Previn, Ashutosh Dixit,
Daniele Ceraolo Spurio, Janusz Krzysztofik
On Wed, 05 Mar 2025, Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> wrote:
> Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
> drm_dev_register() fails"), we return from i915_driver_register()
> immediately if drm_dev_register() fails, skipping remaining registration
> steps. However, the _unregister() counterpart called at device remove
> knows nothing about that skip and executes reverts of all those steps,
> some of those reverts possibly added or modified later. As a consequence,
> a number of kernel warnings that taint the kernel are triggered:
>
> <3> [525.823143] i915 0000:00:02.0: [drm] *ERROR* Failed to register driver for
> userspace access!
> ...
> <4> [525.831069] ------------[ cut here ]------------
> <4> [525.831071] i915 0000:00:02.0: [drm] drm_WARN_ON(power_domains->init_wakere
> f)
> <4> [525.831095] WARNING: CPU: 6 PID: 3440 at drivers/gpu/drm/i915/display/intel
> _display_power.c:2074 intel_power_domains_disable+0xc2/0xd0 [i915]
> ...
> <4> [525.831328] CPU: 6 UID: 0 PID: 3440 Comm: i915_module_loa Tainted: G U
> 6.14.0-rc1-CI_DRM_16076-g7a632b6798b6+ #1
> ...
> <4> [525.831334] RIP: 0010:intel_power_domains_disable+0xc2/0xd0 [i915]
> ...
> <4> [525.831483] Call Trace:
> <4> [525.831484] <TASK>
> ...
> <4> [525.831943] i915_driver_remove+0x4b/0x140 [i915]
> <4> [525.832028] i915_pci_remove+0x1e/0x40 [i915]
> <4> [525.832099] pci_device_remove+0x3e/0xb0
> <4> [525.832103] device_remove+0x40/0x80
> <4> [525.832107] device_release_driver_internal+0x215/0x280
> ...
> <4> [525.947666] ------------[ cut here ]------------
> <4> [525.947669] kobject: '(null)' (ffff88814f62a218): is not initialized, yet kobject_put() is being called.
> <4> [525.947707] WARNING: CPU: 6 PID: 3440 at lib/kobject.c:734 kobject_put+0xe4/0x200
> ...
> <4> [525.947875] RIP: 0010:kobject_put+0xe4/0x200
> ...
> <4> [525.947909] Call Trace:
> <4> [525.947911] <TASK>
> ...
> <4> [525.947963] intel_gt_sysfs_unregister+0x25/0x40 [i915]
> <4> [525.948133] intel_gt_driver_unregister+0x14/0x80 [i915]
> <4> [525.948291] i915_driver_remove+0x6c/0x140 [i915]
> <4> [525.948411] i915_pci_remove+0x1e/0x40 [i915]
> ...
> <4> [526.441186] ------------[ cut here ]------------
> <4> [526.441191] kernfs: can not remove 'error', no directory
> <4> [526.441211] WARNING: CPU: 1 PID: 3440 at fs/kernfs/dir.c:1684 kernfs_remove_by_name_ns+0xbc/0xc0
> ...
> <4> [526.441536] RIP: 0010:kernfs_remove_by_name_ns+0xbc/0xc0
> ...
> <4> [526.441578] Call Trace:
> <4> [526.441581] <TASK>
> ...
> <4> [526.441686] sysfs_remove_bin_file+0x17/0x30
> <4> [526.441691] i915_gpu_error_sysfs_teardown+0x1d/0x30 [i915]
> <4> [526.442226] i915_teardown_sysfs+0x1c/0x60 [i915]
> <4> [526.442369] i915_driver_remove+0x9d/0x140 [i915]
> <4> [526.442473] i915_pci_remove+0x1e/0x40 [i915]
> ...
> <4> [526.685700] ------------[ cut here ]------------
> <4> [526.685706] i915 0000:00:02.0: [drm] i915 raw-wakerefs=1 wakelocks=1 on cle
> anup
> <4> [526.685734] WARNING: CPU: 1 PID: 3440 at drivers/gpu/drm/i915/intel_runtime
> _pm.c:443 intel_runtime_pm_driver_release+0x75/0x90 [i915]
> ...
> <4> [526.686090] RIP: 0010:intel_runtime_pm_driver_release+0x75/0x90 [i915]
> ...
> <4> [526.686294] Call Trace:
> <4> [526.686296] <TASK>
> ...
> <4> [526.687025] i915_driver_release+0x7e/0xb0 [i915]
> <4> [526.687243] drm_dev_put.part.0+0x47/0x90
> <4> [526.687250] devm_drm_dev_init_release+0x13/0x30
> <4> [526.687255] devm_action_release+0x12/0x30
> <4> [526.687261] release_nodes+0x3a/0x120
> <4> [526.687268] devres_release_all+0x97/0xe0
> <4> [526.687277] device_unbind_cleanup+0x12/0x80
> <4> [526.687282] device_release_driver_internal+0x23a/0x280
> ...
>
> A call to intel_power_domains_disable() that triggeres the drm_WARN_ON()
> and takes another wakeref even if the one taken during driver register was
> not released after device register error, was already there. The WARN()
> triggered by kernfs_remove_by_name_ns() from i915_teardown_sysfs()->
> i915_gpu_error_sysfs_teardown(), formerly i915_teardown_error_capture(),
> was also there. A call to intel_gt_sysfs_unregister() that triggers the
> WARN() from kobject_put() was added to intel_gt_driver_unregister() with
> commit 69d6bf5c3754ff ("drm/i915/gt: Fix memory leaks in per-gt sysfs").
>
> Introduce a flag that indicates device registration status and raise it on
> device registration success. As a minimum (first step), when that flag is
> found not set while unregistering the driver, jump over those reverts of
> registration steps omitted after device registration failure that are not
> ready for being called unconditionally (and trigger kernel warnings).
>
> In case of i915_gt_driver_unregister() called for each GT, omitting it
> proved to introduce new issues. Skip only its problematic
> intel_gt_sysfs_unregister() sub-step.
>
> Other unregister steps seem to be safe but may still occur redundant in
> that scenario -- that will be addressed in follow-up patches.
>
> v3: Based on Andi's commitment on introducing a flag, try to address
> Jani's "must find another way" by finding a better place and name for
> the flag (in hope that's what Jani had on mind),
> - split into a series of patches and limit the scope of the first (this)
> one to a minimum of omitting conditionally only those unregister
> (sub)steps that trigger kernel warnings when not registered.
> v2: Check in _unregister whether the drm_dev_register has succeeded and
> skip some of the _unregister() steps. (Andi)
>
> Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10047
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/9820
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10131
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10887
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12817
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_gt.c | 3 ++-
> drivers/gpu/drm/i915/i915_driver.c | 8 ++++++--
> drivers/gpu/drm/i915/i915_drv.h | 2 ++
> 3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 3d3b1ba76e2be..77eb9305a1ff8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -783,7 +783,8 @@ void intel_gt_driver_unregister(struct intel_gt *gt)
> {
> intel_wakeref_t wakeref;
>
> - intel_gt_sysfs_unregister(gt);
> + if (gt->i915->do_unregister)
> + intel_gt_sysfs_unregister(gt);
> intel_rps_driver_unregister(>->rps);
> intel_gsc_fini(>->gsc);
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 613084fd00979..2caaeeab1f0f6 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -638,6 +638,8 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
> return;
> }
>
> + dev_priv->do_unregister = true;
> +
> i915_debugfs_register(dev_priv);
> i915_setup_sysfs(dev_priv);
>
> @@ -673,7 +675,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
> i915_switcheroo_unregister(dev_priv);
>
> intel_runtime_pm_disable(&dev_priv->runtime_pm);
> - intel_power_domains_disable(display);
> + if (dev_priv->do_unregister)
> + intel_power_domains_disable(display);
I still think we must find another way than adding a global
->do_unregister flag.
The i915 core to display interface needs to be unified with xe. This
patch is counter-productive to that goal.
BR,
Jani.
>
> intel_display_driver_unregister(display);
>
> @@ -687,7 +690,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
> i915_perf_unregister(dev_priv);
> i915_pmu_unregister(dev_priv);
>
> - i915_teardown_sysfs(dev_priv);
> + if (dev_priv->do_unregister)
> + i915_teardown_sysfs(dev_priv);
> drm_dev_unplug(&dev_priv->drm);
>
> i915_gem_driver_unregister(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ffc346379cc2c..0c6cbedaa71fe 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -184,6 +184,8 @@ struct drm_i915_private {
> /* FIXME: Device release actions should all be moved to drmm_ */
> bool do_release;
>
> + bool do_unregister;
> +
> /* i915 device parameters */
> struct i915_params params;
--
Jani Nikula, Intel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered
2025-03-06 10:25 ` Jani Nikula
@ 2025-03-06 10:50 ` Janusz Krzysztofik
0 siblings, 0 replies; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-03-06 10:50 UTC (permalink / raw)
To: intel-gfx, Andi Shyti, Jani Nikula
Cc: dri-devel, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
Chris Wilson, Lucas De Marchi, Alan Previn, Ashutosh Dixit,
Daniele Ceraolo Spurio
Hi Jani,
On Thursday, 6 March 2025 11:25:52 CET Jani Nikula wrote:
> On Wed, 05 Mar 2025, Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> wrote:
> > Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
> > drm_dev_register() fails"), we return from i915_driver_register()
> > immediately if drm_dev_register() fails, skipping remaining registration
> > steps. However, the _unregister() counterpart called at device remove
> > knows nothing about that skip and executes reverts of all those steps,
> > some of those reverts possibly added or modified later. As a consequence,
> > a number of kernel warnings that taint the kernel are triggered:
> >
> > <3> [525.823143] i915 0000:00:02.0: [drm] *ERROR* Failed to register driver for
> > userspace access!
> > ...
> > <4> [525.831069] ------------[ cut here ]------------
> > <4> [525.831071] i915 0000:00:02.0: [drm] drm_WARN_ON(power_domains->init_wakere
> > f)
> > <4> [525.831095] WARNING: CPU: 6 PID: 3440 at drivers/gpu/drm/i915/display/intel
> > _display_power.c:2074 intel_power_domains_disable+0xc2/0xd0 [i915]
> > ...
> > <4> [525.831328] CPU: 6 UID: 0 PID: 3440 Comm: i915_module_loa Tainted: G U
> > 6.14.0-rc1-CI_DRM_16076-g7a632b6798b6+ #1
> > ...
> > <4> [525.831334] RIP: 0010:intel_power_domains_disable+0xc2/0xd0 [i915]
> > ...
> > <4> [525.831483] Call Trace:
> > <4> [525.831484] <TASK>
> > ...
> > <4> [525.831943] i915_driver_remove+0x4b/0x140 [i915]
> > <4> [525.832028] i915_pci_remove+0x1e/0x40 [i915]
> > <4> [525.832099] pci_device_remove+0x3e/0xb0
> > <4> [525.832103] device_remove+0x40/0x80
> > <4> [525.832107] device_release_driver_internal+0x215/0x280
> > ...
> > <4> [525.947666] ------------[ cut here ]------------
> > <4> [525.947669] kobject: '(null)' (ffff88814f62a218): is not initialized, yet kobject_put() is being called.
> > <4> [525.947707] WARNING: CPU: 6 PID: 3440 at lib/kobject.c:734 kobject_put+0xe4/0x200
> > ...
> > <4> [525.947875] RIP: 0010:kobject_put+0xe4/0x200
> > ...
> > <4> [525.947909] Call Trace:
> > <4> [525.947911] <TASK>
> > ...
> > <4> [525.947963] intel_gt_sysfs_unregister+0x25/0x40 [i915]
> > <4> [525.948133] intel_gt_driver_unregister+0x14/0x80 [i915]
> > <4> [525.948291] i915_driver_remove+0x6c/0x140 [i915]
> > <4> [525.948411] i915_pci_remove+0x1e/0x40 [i915]
> > ...
> > <4> [526.441186] ------------[ cut here ]------------
> > <4> [526.441191] kernfs: can not remove 'error', no directory
> > <4> [526.441211] WARNING: CPU: 1 PID: 3440 at fs/kernfs/dir.c:1684 kernfs_remove_by_name_ns+0xbc/0xc0
> > ...
> > <4> [526.441536] RIP: 0010:kernfs_remove_by_name_ns+0xbc/0xc0
> > ...
> > <4> [526.441578] Call Trace:
> > <4> [526.441581] <TASK>
> > ...
> > <4> [526.441686] sysfs_remove_bin_file+0x17/0x30
> > <4> [526.441691] i915_gpu_error_sysfs_teardown+0x1d/0x30 [i915]
> > <4> [526.442226] i915_teardown_sysfs+0x1c/0x60 [i915]
> > <4> [526.442369] i915_driver_remove+0x9d/0x140 [i915]
> > <4> [526.442473] i915_pci_remove+0x1e/0x40 [i915]
> > ...
> > <4> [526.685700] ------------[ cut here ]------------
> > <4> [526.685706] i915 0000:00:02.0: [drm] i915 raw-wakerefs=1 wakelocks=1 on cle
> > anup
> > <4> [526.685734] WARNING: CPU: 1 PID: 3440 at drivers/gpu/drm/i915/intel_runtime
> > _pm.c:443 intel_runtime_pm_driver_release+0x75/0x90 [i915]
> > ...
> > <4> [526.686090] RIP: 0010:intel_runtime_pm_driver_release+0x75/0x90 [i915]
> > ...
> > <4> [526.686294] Call Trace:
> > <4> [526.686296] <TASK>
> > ...
> > <4> [526.687025] i915_driver_release+0x7e/0xb0 [i915]
> > <4> [526.687243] drm_dev_put.part.0+0x47/0x90
> > <4> [526.687250] devm_drm_dev_init_release+0x13/0x30
> > <4> [526.687255] devm_action_release+0x12/0x30
> > <4> [526.687261] release_nodes+0x3a/0x120
> > <4> [526.687268] devres_release_all+0x97/0xe0
> > <4> [526.687277] device_unbind_cleanup+0x12/0x80
> > <4> [526.687282] device_release_driver_internal+0x23a/0x280
> > ...
> >
> > A call to intel_power_domains_disable() that triggeres the drm_WARN_ON()
> > and takes another wakeref even if the one taken during driver register was
> > not released after device register error, was already there. The WARN()
> > triggered by kernfs_remove_by_name_ns() from i915_teardown_sysfs()->
> > i915_gpu_error_sysfs_teardown(), formerly i915_teardown_error_capture(),
> > was also there. A call to intel_gt_sysfs_unregister() that triggers the
> > WARN() from kobject_put() was added to intel_gt_driver_unregister() with
> > commit 69d6bf5c3754ff ("drm/i915/gt: Fix memory leaks in per-gt sysfs").
> >
> > Introduce a flag that indicates device registration status and raise it on
> > device registration success. As a minimum (first step), when that flag is
> > found not set while unregistering the driver, jump over those reverts of
> > registration steps omitted after device registration failure that are not
> > ready for being called unconditionally (and trigger kernel warnings).
> >
> > In case of i915_gt_driver_unregister() called for each GT, omitting it
> > proved to introduce new issues. Skip only its problematic
> > intel_gt_sysfs_unregister() sub-step.
> >
> > Other unregister steps seem to be safe but may still occur redundant in
> > that scenario -- that will be addressed in follow-up patches.
> >
> > v3: Based on Andi's commitment on introducing a flag, try to address
> > Jani's "must find another way" by finding a better place and name for
> > the flag (in hope that's what Jani had on mind),
> > - split into a series of patches and limit the scope of the first (this)
> > one to a minimum of omitting conditionally only those unregister
> > (sub)steps that trigger kernel warnings when not registered.
> > v2: Check in _unregister whether the drm_dev_register has succeeded and
> > skip some of the _unregister() steps. (Andi)
> >
> > Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10047
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/9820
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10131
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10887
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12817
> > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> > Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > Cc: Andi Shyti <andi.shyti@linux.intel.com>
> > Cc: Jani Nikula <jani.nikula@linux.intel.com>
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_gt.c | 3 ++-
> > drivers/gpu/drm/i915/i915_driver.c | 8 ++++++--
> > drivers/gpu/drm/i915/i915_drv.h | 2 ++
> > 3 files changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > index 3d3b1ba76e2be..77eb9305a1ff8 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -783,7 +783,8 @@ void intel_gt_driver_unregister(struct intel_gt *gt)
> > {
> > intel_wakeref_t wakeref;
> >
> > - intel_gt_sysfs_unregister(gt);
> > + if (gt->i915->do_unregister)
> > + intel_gt_sysfs_unregister(gt);
> > intel_rps_driver_unregister(>->rps);
> > intel_gsc_fini(>->gsc);
> >
> > diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> > index 613084fd00979..2caaeeab1f0f6 100644
> > --- a/drivers/gpu/drm/i915/i915_driver.c
> > +++ b/drivers/gpu/drm/i915/i915_driver.c
> > @@ -638,6 +638,8 @@ static void i915_driver_register(struct drm_i915_private *dev_priv)
> > return;
> > }
> >
> > + dev_priv->do_unregister = true;
> > +
> > i915_debugfs_register(dev_priv);
> > i915_setup_sysfs(dev_priv);
> >
> > @@ -673,7 +675,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
> > i915_switcheroo_unregister(dev_priv);
> >
> > intel_runtime_pm_disable(&dev_priv->runtime_pm);
> > - intel_power_domains_disable(display);
> > + if (dev_priv->do_unregister)
> > + intel_power_domains_disable(display);
>
> I still think we must find another way than adding a global
> ->do_unregister flag.
>
> The i915 core to display interface needs to be unified with xe. This
> patch is counter-productive to that goal.
Thank you for clarification.
Have you seen my v1 of the changeset? That was an attempt to resolve each
issue individually, without adding the global flag. In case of
intel_power_domains_enable/disable(), I proposed to move it out of
i915_driver_register/unregister() up one level, to i915_drv_probe/remove(), so
it is called unconditionally, enabling power domains (releasing the wakeref)
on probe regardless of device register result, then safely re-acquiring the
wakeref on remove[1]. Would that work for you?
Thanks,
Janusz
[1] https://lore.kernel.org/dri-devel/20250206180927.2237256-6-janusz.krzysztofik@linux.intel.com/
>
>
> BR,
> Jani.
>
> >
> > intel_display_driver_unregister(display);
> >
> > @@ -687,7 +690,8 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
> > i915_perf_unregister(dev_priv);
> > i915_pmu_unregister(dev_priv);
> >
> > - i915_teardown_sysfs(dev_priv);
> > + if (dev_priv->do_unregister)
> > + i915_teardown_sysfs(dev_priv);
> > drm_dev_unplug(&dev_priv->drm);
> >
> > i915_gem_driver_unregister(dev_priv);
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index ffc346379cc2c..0c6cbedaa71fe 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -184,6 +184,8 @@ struct drm_i915_private {
> > /* FIXME: Device release actions should all be moved to drmm_ */
> > bool do_release;
> >
> > + bool do_unregister;
> > +
> > /* i915 device parameters */
> > struct i915_params params;
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry
2025-03-05 19:11 [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Janusz Krzysztofik
` (3 preceding siblings ...)
2025-03-05 19:11 ` [PATCH v3 4/4] drm/i915: Group not skipped unregister steps Janusz Krzysztofik
@ 2025-03-06 11:00 ` Krzysztof Karas
2025-03-06 11:40 ` Janusz Krzysztofik
4 siblings, 1 reply; 10+ messages in thread
From: Krzysztof Karas @ 2025-03-06 11:00 UTC (permalink / raw)
To: Janusz Krzysztofik
Cc: intel-gfx, Jani Nikula, Andi Shyti, dri-devel, Joonas Lahtinen,
Rodrigo Vivi, Tvrtko Ursulin, Chris Wilson, Lucas De Marchi,
Alan Previn, Ashutosh Dixit, Daniele Ceraolo Spurio
Hi Janusz,
throughout the series you modify the code right after
introducing it. How about changing the order of things a bit:
1) order the functions in a symmetrical way between
register/unregister steps and group them as you see necessary,
(At that point you would not be fixing the issue yet, but
prepare the code for further changes)
2) then introduce the new flag along with all the labels needed
for clean unregistration.
I think that way you could reduce number of patches (and changes
in code needing review) while also fixing the original issue.
Overall, I believe this is a good effort and much needed change
in registration and unregistering process.
Best Regards,
Krzysztof
> Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
> drm_dev_register() fails"), we may return from i915_driver_register()
> immediately, skipping remaining registration steps. However, the
> _unregister() counterpart called at device remove knows nothing about that
> skip and executes reverts of all those steps. As a consequence, a number
> of kernel warnings that taint the kernel are triggered.
>
> Introduce a flag that indicates device registration status and raise it on
> device registration success. As a minimum (first patch), when that flag
> is found not set while unregistering the driver, jump over those reverts
> of registration steps omitted after device registration failure that are
> not ready for being called unconditionally (and trigger the kernel
> warnings).
>
> With the second patch, also jump over reverts of other driver registration
> steps that were not called due to device registration failure. Some
> unregister function calls, found implementing additional steps beyond the
> register reverts, are still executed.
>
> To simplify i915_driver_unregister() code, the third patch makes sure
> reverts of driver registration steps executed before potentially
> unsuccessful device registration are symmetrically called after
> the device unplug.
>
> Finally, the last patch further simplifies the i915_driver_unregister()
> code by moving two required steps down, right after device unplug.
>
> The first patch may be squashed with one or more of its follow-ups if so
> decided.
>
> Janusz Krzysztofik (4):
> drm/i915: Skip harmful unregister steps if not registered
> drm/i915: Omit unnecessary driver unregister steps
> drm/i915: Fix asymmetry in PMU register/unregister step order
> drm/i915: Group not skipped unregister steps
>
> drivers/gpu/drm/i915/gt/intel_gt.c | 6 ++++++
> drivers/gpu/drm/i915/i915_driver.c | 18 ++++++++++++------
> drivers/gpu/drm/i915/i915_drv.h | 2 ++
> 3 files changed, 20 insertions(+), 6 deletions(-)
>
> --
> 2.48.1
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry
2025-03-06 11:00 ` [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Krzysztof Karas
@ 2025-03-06 11:40 ` Janusz Krzysztofik
2025-03-06 13:49 ` Krzysztof Karas
0 siblings, 1 reply; 10+ messages in thread
From: Janusz Krzysztofik @ 2025-03-06 11:40 UTC (permalink / raw)
To: Krzysztof Karas
Cc: intel-gfx, Jani Nikula, Andi Shyti, dri-devel, Joonas Lahtinen,
Rodrigo Vivi, Tvrtko Ursulin, Chris Wilson, Lucas De Marchi,
Alan Previn, Ashutosh Dixit, Daniele Ceraolo Spurio
Hi Krzysztof,
Thank you for looking at it.
On Thursday, 6 March 2025 12:00:40 CET Krzysztof Karas wrote:
> Hi Janusz,
>
> throughout the series you modify the code right after
> introducing it.
Yes, that split among patches reflects my way of getting to a solution that
not only resolves the issue but also tries to address comments I got and take
care of resulting code clarity. That's why I mentioned the possibility of
squashing one or more follow-ups with the initial patch. Patch 1/4 alone is a
minimal fix that actually resolves the issues. The rest is only about
satisfying Andi's comments (patch 2/4) and simplifying the code (patches
3-4/4) that we may or may not want to apply or squash.
> How about changing the order of things a bit:
> 1) order the functions in a symmetrical way between
> register/unregister steps and group them as you see necessary,
> (At that point you would not be fixing the issue yet, but
> prepare the code for further changes)
Please note that I still haven't achieved full symmetry. If I only had clues
from authors of patches that introduced asymmetry on why they did it that way
then I would think of reordering the steps to achieve full symmetry, each in a
separate patch, together with meaningful justification and possibly
alternative solutions to issues that asymmetry was trying to address. Without
those clues, more work on analysis and more testing is needed, I believe, and
that would be still more beyond the scope of a quick fix I initially intended.
>
> 2) then introduce the new flag along with all the labels needed
> for clean unregistration.
The flag, or a single global point of indication if device registration
succeeded or not, was an idea suggested by Andi, and now objected by Jani from
the display code PoV, so not a final solution.
BTW, have you seen v1 of the series[1]? How do you find it, compared to v2/3?
Thanks,
Janusz
[1] https://lore.kernel.org/dri-devel/20250206180927.2237256-5-janusz.krzysztofik@linux.intel.com/
>
> I think that way you could reduce number of patches (and changes
> in code needing review) while also fixing the original issue.
>
> Overall, I believe this is a good effort and much needed change
> in registration and unregistering process.
>
> Best Regards,
> Krzysztof
>
> > Starting with commit ec3e00b4ee27 ("drm/i915: stop registering if
> > drm_dev_register() fails"), we may return from i915_driver_register()
> > immediately, skipping remaining registration steps. However, the
> > _unregister() counterpart called at device remove knows nothing about that
> > skip and executes reverts of all those steps. As a consequence, a number
> > of kernel warnings that taint the kernel are triggered.
> >
> > Introduce a flag that indicates device registration status and raise it on
> > device registration success. As a minimum (first patch), when that flag
> > is found not set while unregistering the driver, jump over those reverts
> > of registration steps omitted after device registration failure that are
> > not ready for being called unconditionally (and trigger the kernel
> > warnings).
> >
> > With the second patch, also jump over reverts of other driver registration
> > steps that were not called due to device registration failure. Some
> > unregister function calls, found implementing additional steps beyond the
> > register reverts, are still executed.
> >
> > To simplify i915_driver_unregister() code, the third patch makes sure
> > reverts of driver registration steps executed before potentially
> > unsuccessful device registration are symmetrically called after
> > the device unplug.
> >
> > Finally, the last patch further simplifies the i915_driver_unregister()
> > code by moving two required steps down, right after device unplug.
> >
> > The first patch may be squashed with one or more of its follow-ups if so
> > decided.
> >
> > Janusz Krzysztofik (4):
> > drm/i915: Skip harmful unregister steps if not registered
> > drm/i915: Omit unnecessary driver unregister steps
> > drm/i915: Fix asymmetry in PMU register/unregister step order
> > drm/i915: Group not skipped unregister steps
> >
> > drivers/gpu/drm/i915/gt/intel_gt.c | 6 ++++++
> > drivers/gpu/drm/i915/i915_driver.c | 18 ++++++++++++------
> > drivers/gpu/drm/i915/i915_drv.h | 2 ++
> > 3 files changed, 20 insertions(+), 6 deletions(-)
> >
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry
2025-03-06 11:40 ` Janusz Krzysztofik
@ 2025-03-06 13:49 ` Krzysztof Karas
0 siblings, 0 replies; 10+ messages in thread
From: Krzysztof Karas @ 2025-03-06 13:49 UTC (permalink / raw)
To: Janusz Krzysztofik
Cc: intel-gfx, Jani Nikula, Andi Shyti, dri-devel, Joonas Lahtinen,
Rodrigo Vivi, Tvrtko Ursulin, Chris Wilson, Lucas De Marchi,
Alan Previn, Ashutosh Dixit, Daniele Ceraolo Spurio
Hi Janusz,
thanks for a quick response.
> > throughout the series you modify the code right after
> > introducing it.
>
> Yes, that split among patches reflects my way of getting to a solution that
> not only resolves the issue but also tries to address comments I got and take
> care of resulting code clarity. That's why I mentioned the possibility of
> squashing one or more follow-ups with the initial patch. Patch 1/4 alone is a
> minimal fix that actually resolves the issues. The rest is only about
> satisfying Andi's comments (patch 2/4) and simplifying the code (patches
> 3-4/4) that we may or may not want to apply or squash.
So this is more about an optional cleanup then. If the goal is
to get all of these patches merged together, I think it would
still be a bit more readable from git log perspective, if you
introduced the code in more straightforward way (previous review
comments on v1 and v2 are sound, I just think that doing the
cleanup first and then applying required fix for the issue
would result in less shuffling and smaller delta).
> > How about changing the order of things a bit:
> > 1) order the functions in a symmetrical way between
> > register/unregister steps and group them as you see necessary,
> > (At that point you would not be fixing the issue yet, but
> > prepare the code for further changes)
>
> Please note that I still haven't achieved full symmetry. If I only had clues
> from authors of patches that introduced asymmetry on why they did it that way
> then I would think of reordering the steps to achieve full symmetry, each in a
> separate patch, together with meaningful justification and possibly
> alternative solutions to issues that asymmetry was trying to address. Without
> those clues, more work on analysis and more testing is needed, I believe, and
> that would be still more beyond the scope of a quick fix I initially intended.
Fair enough. I didn't mean to suggest a full symmetry here: just
as much of it as you are sure will work.
> >
> > 2) then introduce the new flag along with all the labels needed
> > for clean unregistration.
>
> The flag, or a single global point of indication if device registration
> succeeded or not, was an idea suggested by Andi, and now objected by Jani from
> the display code PoV, so not a final solution.
Let's wait for Jani's opinion on that matter.
> BTW, have you seen v1 of the series[1]? How do you find it, compared to v2/3?
I think the flag is more explicit compared to v1 and the addition
of cleanup "by the way" of fixing the actual issue is a nice
addition compared to v2.
Best Regards,
Krzysztof
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-03-06 13:50 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-05 19:11 [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 1/4] drm/i915: Skip harmful unregister steps if not registered Janusz Krzysztofik
2025-03-06 10:25 ` Jani Nikula
2025-03-06 10:50 ` Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 2/4] drm/i915: Omit unnecessary driver unregister steps Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 3/4] drm/i915: Fix asymmetry in PMU register/unregister step order Janusz Krzysztofik
2025-03-05 19:11 ` [PATCH v3 4/4] drm/i915: Group not skipped unregister steps Janusz Krzysztofik
2025-03-06 11:00 ` [PATCH v3 0/4] drm/i915: Fix harmfull driver register/unregister assymetry Krzysztof Karas
2025-03-06 11:40 ` Janusz Krzysztofik
2025-03-06 13:49 ` Krzysztof Karas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).