* [PATCH] media:v4l2-async:debugfs for registered subdevices
@ 2026-03-13 7:58 luo.liu
2026-03-13 10:28 ` Sakari Ailus
0 siblings, 1 reply; 11+ messages in thread
From: luo.liu @ 2026-03-13 7:58 UTC (permalink / raw)
To: sakari.ailus, mchehab; +Cc: linux-media, linux-kernel, luo.liu.linux
Add a new debugfs file "registered_subdevices" under the "v4l2-async"
directory to display all registered subdevices in the subdev_list. This
helps with debugging by providing a clear view of all currently registered
V4L2 subdevices.
The new file displays each subdevice's name and device path (if available).
Signed-off-by: luo.liu <luo.liu.linux@163.com>
---
drivers/media/v4l2-core/v4l2-async.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c
index 888a2e213b08..0f90ee268d99 100644
--- a/drivers/media/v4l2-core/v4l2-async.c
+++ b/drivers/media/v4l2-core/v4l2-async.c
@@ -966,6 +966,25 @@ static int pending_subdevs_show(struct seq_file *s, void *data)
}
DEFINE_SHOW_ATTRIBUTE(pending_subdevs);
+static int registered_subdevs_show(struct seq_file *s, void *data)
+{
+ struct v4l2_subdev *sd;
+
+ mutex_lock(&list_lock);
+
+ list_for_each_entry(sd, &subdev_list, async_list) {
+ seq_printf(s, "%s", sd->name);
+ if (sd->dev)
+ seq_printf(s, " (dev: %s)", dev_name(sd->dev));
+ seq_putc(s, '\n');
+ }
+
+ mutex_unlock(&list_lock);
+
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(registered_subdevs);
+
static struct dentry *v4l2_async_debugfs_dir;
static int __init v4l2_async_init(void)
@@ -974,6 +993,9 @@ static int __init v4l2_async_init(void)
debugfs_create_file("pending_async_subdevices", 0444,
v4l2_async_debugfs_dir, NULL,
&pending_subdevs_fops);
+ debugfs_create_file("registered_subdevices", 0444,
+ v4l2_async_debugfs_dir, NULL,
+ &registered_subdevs_fops);
return 0;
}
base-commit: 5c9e55fecf9365890c64f14761a80f9413a3b1d1
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-13 7:58 [PATCH] media:v4l2-async:debugfs for registered subdevices luo.liu
@ 2026-03-13 10:28 ` Sakari Ailus
2026-03-13 11:21 ` luo.liu.linux
0 siblings, 1 reply; 11+ messages in thread
From: Sakari Ailus @ 2026-03-13 10:28 UTC (permalink / raw)
To: luo.liu; +Cc: mchehab, linux-media, linux-kernel
Hi Luo,
On Fri, Mar 13, 2026 at 03:58:24PM +0800, luo.liu wrote:
> Add a new debugfs file "registered_subdevices" under the "v4l2-async"
> directory to display all registered subdevices in the subdev_list. This
> helps with debugging by providing a clear view of all currently registered
> V4L2 subdevices.
Could you elaborate a little on how providing this information over
debugfs has helped you?
--
Regards,
Sakari Ailus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re:Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-13 10:28 ` Sakari Ailus
@ 2026-03-13 11:21 ` luo.liu.linux
2026-03-13 11:42 ` Sakari Ailus
0 siblings, 1 reply; 11+ messages in thread
From: luo.liu.linux @ 2026-03-13 11:21 UTC (permalink / raw)
To: Sakari Ailus; +Cc: mchehab, linux-media, linux-kernel
Hi Sakari,
Thank you very much for your reply.
When building a pipeline in the camera subsystem via Media Entities
(e.g., Sensor -> DPHY -> MIPI-CSI2 -> ISP), it is crucial to verify that the registration
and unregistration processes for each sub-device (subdev) driver within the pipeline
are functioning correctly.
Specifically, taking a sensor as an example of a subdev:
Registration Verification: Upon successfully loading the driver using insmod xxx_sensor.ko,
the corresponding sensor subdev should appear in the subdev_list.
Unregistration Verification: Upon successfully unloading the driver using rmmod xxx_sensor.ko,
the corresponding subdev entry should be removed from the subdev_list.
The following is a test log from my board:
root@cix-localhost:~/upload# cat /sys/kernel/debug/v4l2-async/registered_subdevices
root@cix-localhost:~/upload#
root@cix-localhost:~/upload# insmod lt7911uxc.ko
root@cix-localhost:~/upload#
root@cix-localhost:~/upload# cat /sys/kernel/debug/v4l2-async/registered_subdevices
LT7911UXC 0-0043 (dev: 0-0043)
root@cix-localhost:~/upload#
root@cix-localhost:~/upload# rmmod lt7911uxc.ko
root@cix-localhost:~/upload#
root@cix-localhost:~/upload# cat /sys/kernel/debug/v4l2-async/registered_subdevices
root@cix-localhost:~/upload#
root@cix-localhost:~/upload#
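The manual check in the log above could also be scripted. What follows is only a minimal userspace sketch, assuming the file keeps the "name (dev: device)" line format that registered_subdevs_show() prints in this patch. The helpers line_names_subdev(), listing_has_subdev() and file_has_subdev() are hypothetical names chosen for illustration; they are not part of any kernel or library API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if the first line of `line` names the subdevice `name`. */
bool line_names_subdev(const char *line, const char *name)
{
	size_t len = strcspn(line, "\n");
	const char *suffix = strstr(line, " (dev: ");

	assert(name != NULL);

	/* Drop the optional " (dev: ...)" suffix if it is on this line. */
	if (suffix && (size_t)(suffix - line) < len)
		len = (size_t)(suffix - line);

	return strlen(name) == len && strncmp(line, name, len) == 0;
}

/* Scan a whole listing held in memory, one entry per line. */
bool listing_has_subdev(const char *listing, const char *name)
{
	while (*listing) {
		if (line_names_subdev(listing, name))
			return true;
		listing += strcspn(listing, "\n");
		if (*listing == '\n')
			listing++;
	}
	return false;
}

/*
 * Read the file at `path` and look for `name`.
 * Returns 1 if found, 0 if not, -1 if the file cannot be opened.
 * Only the first 4095 bytes are examined, which is an arbitrary
 * limit picked for this sketch.
 */
int file_has_subdev(const char *path, const char *name)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	return listing_has_subdev(buf, name) ? 1 : 0;
}
```

A test harness could call file_has_subdev("/sys/kernel/debug/v4l2-async/registered_subdevices", "LT7911UXC 0-0043") after insmod and expect 1, then call it again after rmmod and expect 0.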
Regards,
Luo Liu
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-13 11:21 ` luo.liu.linux
@ 2026-03-13 11:42 ` Sakari Ailus
2026-03-13 13:50 ` luo.liu.linux
0 siblings, 1 reply; 11+ messages in thread
From: Sakari Ailus @ 2026-03-13 11:42 UTC (permalink / raw)
To: luo.liu.linux; +Cc: mchehab, linux-media, linux-kernel
Hi Luo,
On Fri, Mar 13, 2026 at 07:21:42PM +0800, luo.liu.linux wrote:
>
> Hi Sakari,
>
> Thank you very much for your reply.
>
> When building a pipeline in the camera subsystem via Media Entities
>
> (e.g., Sensor -> DPHY -> MIPI-CSI2 -> ISP), it is crucial to verify that the registration
>
> and unregistration processes for each sub-device (subdev) driver within the pipeline
>
> are functioning correctly.
You don't need a debugfs interface for that, do you? We have a large number
of things that can go wrong that are much more complicated than this (and
there's no debugfs interface to verify those either, no, largely because it
wouldn't be meaningful).
--
Regards,
Sakari Ailus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re:Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-13 11:42 ` Sakari Ailus
@ 2026-03-13 13:50 ` luo.liu.linux
2026-03-16 16:48 ` Sakari Ailus
0 siblings, 1 reply; 11+ messages in thread
From: luo.liu.linux @ 2026-03-13 13:50 UTC (permalink / raw)
To: Sakari Ailus; +Cc: mchehab, linux-media, linux-kernel
Hi Sakari,
Apologies if my previous explanation wasn't clear enough.
To clarify, the primary goal of this interface is not merely to verify if insmod/rmmod succeeds,
but to validate the correctness of the asynchronous subdevice registration and unregistration paths,
specifically ensuring that resource allocation and reclamation are handled properly.
I would like to share a real-world scenario that motivated this patch:
We had a camera subsystem pipeline (sensor -> dphy -> mipi-csi2 -> isp) with a
subdevice driver that appeared to function perfectly for six months. Both insmod and rmmod completed without any errors,
and the system seemed stable during normal operation. However, just before a major release, a QA engineer performed
stress testing involving rapid, repeated cycles of insmod and rmmod, which eventually triggered a kernel crash.
During the debugging process, I inspected the internal global lists:
static LIST_HEAD(subdev_list);
static LIST_HEAD(notifier_list);
By dumping the subdev_list via this debugfs interface, I discovered that a D-PHY subdevice entry remained in the list even
after its driver was unloaded. Crucially, the output explicitly showed the device name, allowing me to immediately pinpoint
the D-PHY driver as the culprit, rather than blindly troubleshooting other components in the pipeline (such as the sensor or ISP).
This was the critical clue that led me to the root cause:
The D-PHY subdriver's remove function was missing a call to v4l2_async_cleanup(sd). Consequently, the subdevice was never properly
unregistered from the async framework, leading to a use-after-free or stale pointer issue during the stress test.
Without this debugfs interface, detecting such "silent" registration leaks is extremely difficult.
The driver loads and unloads without reporting errors, and standard logs (dmesg) often provide
no indication that an entry was left behind in the core framework's list until a crash occurs under specific timing conditions.
Given this experience, I believe this interface provides a vital visibility point for engineers to:
1. Verify that subdevices are correctly removed from the global list upon driver unload.
2. Catch missing cleanup calls (like v4l2_async_cleanup) early in the development cycle, rather than discovering them through random crashes in stress testing.
I hope this context clarifies why I consider this debugfs interface meaningful and necessary for robust driver development.
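The failure mode described above can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code: struct fake_subdev, fake_subdev_list, fake_register(), fake_unregister() and fake_is_registered() are hypothetical stand-ins for the real subdevice structure, the global subdev_list, and the registration paths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal circular doubly linked list, loosely modelled on the kernel's. */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical stand-in for a subdevice; not the kernel type. */
struct fake_subdev {
	const char *name;
	struct list_node link;
};

/* Stand-in for the global list of registered subdevices. */
struct list_node fake_subdev_list = { &fake_subdev_list, &fake_subdev_list };

/* Append the subdevice to the tail of the global list. */
void fake_register(struct fake_subdev *sd)
{
	struct list_node *tail = fake_subdev_list.prev;

	assert(sd && sd->name);
	sd->link.prev = tail;
	sd->link.next = &fake_subdev_list;
	tail->next = &sd->link;
	fake_subdev_list.prev = &sd->link;
}

/* Unlink the subdevice; a buggy remove path would skip this call. */
void fake_unregister(struct fake_subdev *sd)
{
	sd->link.prev->next = sd->link.next;
	sd->link.next->prev = sd->link.prev;
	sd->link.prev = sd->link.next = NULL;
}

/* Return 1 if a subdevice called `name` is still on the list, else 0. */
int fake_is_registered(const char *name)
{
	struct list_node *n;

	for (n = fake_subdev_list.next; n != &fake_subdev_list; n = n->next) {
		struct fake_subdev *sd = (struct fake_subdev *)
			((char *)n - offsetof(struct fake_subdev, link));

		if (strcmp(sd->name, name) == 0)
			return 1;
	}
	return 0;
}
```

If a "dphy" entry is registered and its remove path never calls fake_unregister(), fake_is_registered("dphy") keeps returning 1 after the simulated unload. That is the same signal the proposed debugfs file would give by listing the name.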
Best regards,
Luo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-13 13:50 ` luo.liu.linux
@ 2026-03-16 16:48 ` Sakari Ailus
2026-03-17 3:37 ` luo.liu.linux
0 siblings, 1 reply; 11+ messages in thread
From: Sakari Ailus @ 2026-03-16 16:48 UTC (permalink / raw)
To: luo.liu.linux; +Cc: mchehab, linux-media, linux-kernel
Hi Luo,
On Fri, Mar 13, 2026 at 09:50:56PM +0800, luo.liu.linux wrote:
>
> Hi Sakari,
>
> Apologies if my previous explanation wasn't clear enough.
>
> To clarify, the primary goal of this interface is not merely to verify if insmod/rmmod succeeds,
> but to validate the correctness of the asynchronous subdevice registration and unregistration paths,
> specifically ensuring that resource allocation and reclamation are handled properly.
>
> I would like to share a real-world scenario that motivated this patch:
>
> We had a camera subsystem pipeline like sensor -> dphy -> mipi-csi2 -> isp
> subdevice driver that appeared to function perfectly for six months. insmod and rmmod completed without any errors,
> and the system seemed stable during normal operation. However, just before a major release, a QA engineer performed
> stress testing involving rapid, repeated cycles of insmod and rmmod, which eventually triggered a kernel crash.
>
> During the debugging process, I inspected the internal global lists:
>
> static LIST_HEAD(subdev_list);
> static LIST_HEAD(notifier_list);
>
> By dumping the subdev_list via this debugfs interface, I discovered that a D-PHY subdevice entry remained in the list even
> after its driver was unloaded. Crucially, the output explicitly showed the device name, allowing me to immediately pinpoint
> the D-PHY driver as the culprit, rather than blindly troubleshooting other components in the pipeline (such as the sensor or ISP).
>
> This was the critical clue that led me to the root cause:
>
> The D-PHY subdriver's remove function was missing a call to v4l2_async_cleanup(sd). Consequently, the subdevice was never properly
> unregistered from the async framework, leading to a use-after-free or stale pointer issue during the stress test.
>
> Without this debugfs interface, detecting such "silent" registration leaks is extremely difficult.
> The driver loads and unloads without reporting errors, and standard logs (dmesg) often provide
> no indication that an entry was left behind in the core framework's list until a crash occurs under specific timing conditions.
>
>
> Given this experience, I believe this interface provides a vital visibility point for engineers to:
>
> 1,Verify that subdevices are correctly removed from the global list upon driver unload.
> 2,Catch missing cleanup calls (like v4l2_async_cleanup) early in the development cycle, rather than discovering them through random crashes in stress testing.
I guess you'd have found this with either KASAN or linked list debugging?
--
Sakari Ailus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re:Re: Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-16 16:48 ` Sakari Ailus
@ 2026-03-17 3:37 ` luo.liu.linux
2026-03-17 8:21 ` Sakari Ailus
0 siblings, 1 reply; 11+ messages in thread
From: luo.liu.linux @ 2026-03-17 3:37 UTC (permalink / raw)
To: Sakari Ailus; +Cc: mchehab, linux-media, linux-kernel
Hi Sakari,
You are absolutely right. For an experienced kernel developer like yourself, tools like KASAN and CONFIG_DEBUG_LIST are second nature and incredibly effective for pinpointing such issues.
I truly admire your expertise in leveraging these advanced debugging mechanisms.
However, I think it is important to consider the reality for many junior driver developers (myself included). We often lack the deep intuition and extensive experience required to wield these
powerful tools effectively in every scenario. More often than not, we still rely on primitive methods: struggling to reproduce intermittent crashes, scattering printk logs everywhere, and manually
tracing execution paths. This process is extremely time-consuming and often yields no clear conclusions for "silent" resource leaks.
While I am actively working to improve my skills and learn to use these advanced tools more proficiently, I remain convinced that providing such a simple, intuitive interface offers a necessary
supplement by serving as a low-barrier entry point for developers.
I hope this perspective clarifies why I believe this small change can bring a bit of convenience to a broader range of driver developers.
Best regards,
Luo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Re: Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-17 3:37 ` luo.liu.linux
@ 2026-03-17 8:21 ` Sakari Ailus
2026-03-17 11:14 ` luo.liu.linux
0 siblings, 1 reply; 11+ messages in thread
From: Sakari Ailus @ 2026-03-17 8:21 UTC (permalink / raw)
To: luo.liu.linux; +Cc: mchehab, linux-media, linux-kernel
Hi Luo,
On Tue, Mar 17, 2026 at 11:37:52AM +0800, luo.liu.linux wrote:
>
>
> Hi Sakari,
>
> You are absolutely right. For an experienced kernel developer like
> yourself, tools like KASAN and CONFIG_DEBUG_LIST are second nature and
> incredibly effective for pinpointing such issues. I truly admire your
> expertise in leveraging these advanced debugging mechanisms.
>
> However, I think it is important to consider the reality for many
> junior driver developers (myself included). We often lack the deep
> intuition and extensive experience required to wield these powerful tools
> effectively in every scenario. More often than not, we still rely on
> primitive methods: struggling to reproduce intermittent crashes,
> scattering printk logs everywhere, and manually tracing execution paths.
> This process is extremely time-consuming and often yields no clear
> conclusions for "silent" resource leaks.
>
> While I am actively working to improve my skills and learn to use
> these advanced tools more proficiently. I remain convinced that providing
> such a simple, intuitive interface offers a necessary supplement by
> serving as a low-barrier entry point for developers.
>
> I hope this perspective clarifies why I believe this small change
> can bring a bit of convenience to a broader range of driver developers.
Just enable KASAN and list debugging in the future. New interfaces like
this won't improve things at large.
--
Kind regards,
Sakari Ailus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re:Re: Re: Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-17 8:21 ` Sakari Ailus
@ 2026-03-17 11:14 ` luo.liu.linux
2026-03-19 20:30 ` Laurent Pinchart
0 siblings, 1 reply; 11+ messages in thread
From: luo.liu.linux @ 2026-03-17 11:14 UTC (permalink / raw)
To: Sakari Ailus; +Cc: mchehab, linux-media, linux-kernel
Hi Sakari,
The existing pending_async_subdevices interface provides excellent visibility into the notifier_list (the 'waiter' side).
To achieve full symmetry and complete debuggability, we should also expose the subdev_list (the 'provider' side). These two views solve different problems:
1. Notifier List: Diagnoses why binding is stalled (missing sub-devices).
2. Subdev List: Diagnoses state inconsistencies (e.g., sub-devices present but unmatched) and verifies resource cleanup upon unbind.
From practical experience, lacking visibility into subdev_list makes it difficult to distinguish between a sub-device probe failure and an async matching failure.
Adding this interface would provide a holistic view of the async engine's state, which has proven essential for rapid issue localization in complex driver stacks.
Kind regards,
Luo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Re: Re: Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-17 11:14 ` luo.liu.linux
@ 2026-03-19 20:30 ` Laurent Pinchart
2026-03-20 6:52 ` luo.liu.linux
0 siblings, 1 reply; 11+ messages in thread
From: Laurent Pinchart @ 2026-03-19 20:30 UTC (permalink / raw)
To: luo.liu.linux; +Cc: Sakari Ailus, mchehab, linux-media, linux-kernel
On Tue, Mar 17, 2026 at 07:14:43PM +0800, luo.liu.linux wrote:
>
> Hi Sakari,
>
> The existing pending_async_subdevices interface provides excellent
> visibility into the notifier_list (the 'waiter' side).
>
> To achieve full symmetry and complete debuggability, we should also
> expose the subdev_list (the 'provider' side).These two views solve
> different problems:
>
> 1 Notifier List: Diagnoses why binding is stalled (missing sub-devices).
>
> 2 Subdev List: Diagnoses state inconsistencies (e.g., sub-devices
> present but unmatched) and verifies resource cleanup upon unbind.
>
> From practical experience, lacking visibility into subdev_list makes
> it difficult to distinguish between a sub-device probe failure and an
> async matching failure.
>
> Adding this interface would provide a holistic view of the async
> engine's state, which has proven essential for rapid issue
> localization in complex driver stacks.
I agree with Sakari here. There are plenty of other debugging tools in
the kernel that can be used to diagnose the kind of issues you've
described. I think this patch adds more noise than value.
> At 2026-03-17 16:21:31, Sakari Ailus wrote:
> > On Tue, Mar 17, 2026 at 11:37:52AM +0800, luo.liu.linux wrote:
> >>
> >> Hi Sakari,
> >>
> >> You are absolutely right. For an experienced kernel developer like
> >> yourself, tools like KASAN and CONFIG_DEBUG_LIST are second nature and
> >> incredibly effective for pinpointing such issues. I truly admire your
> >> expertise in leveraging these advanced debugging mechanisms.
> >>
> >> However, I think it is important to consider the reality for many
> >> junior driver developers (myself included). We often lack the deep
> >> intuition and extensive experience required to wield these powerful tools
> >> effectively in every scenario. More often than not, we still rely on
> >> primitive methods: struggling to reproduce intermittent crashes,
> >> scattering printk logs everywhere, and manually tracing execution paths.
> >> This process is extremely time-consuming and often yields no clear
> >> conclusions for "silent" resource leaks.
> >>
> >> While I am actively working to improve my skills and learn to use
> >> these advanced tools more proficiently. I remain convinced that providing
> >> such a simple, intuitive interface offers a necessary supplement by
> >> serving as a low-barrier entry point for developers.
> >>
> >> I hope this perspective clarifies why I believe this small change
> >> can bring a bit of convenience to a broader range of driver developers.
> >
> > Just enable KASAN and list debugging in the future. New interfaces like
> > this won't improve things at large.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re:Re: Re: Re: Re: Re: [PATCH] media:v4l2-async:debugfs for registered subdevices
2026-03-19 20:30 ` Laurent Pinchart
@ 2026-03-20 6:52 ` luo.liu.linux
0 siblings, 0 replies; 11+ messages in thread
From: luo.liu.linux @ 2026-03-20 6:52 UTC (permalink / raw)
To: Laurent Pinchart, Sakari Ailus; +Cc: mchehab, linux-media, linux-kernel
Hi Laurent, Sakari,
1. Current debugging tools (KASAN, CONFIG_DEBUG_LIST) operate on a passive defense model:
they only alert when explicit errors occur, such as corrupted pointers or illegal memory access. They are fundamentally blind to "logical omissions", that is, states where data should have been removed but wasn't.
Consider the scenario where a driver forgets to call v4l2_async_unregister_subdev():
List debugging remains silent: since no list_del() is executed, the sd->async_list node remains structurally intact within subdev_list, with valid prev and next links.
KASAN remains silent: it detects accesses to freed memory, not the existence of dangling pointers. If the orphaned node is never traversed after the driver frees its memory, no use-after-free is triggered.
In the scenario I described in my previous email, the bug remains dormant and invisible during normal operation. The driver could run flawlessly for years, only revealing the issue under specific stress conditions.
2. I noticed that the implementation of v4l2_async_unregister_subdev() reveals a critical state dependency:
void v4l2_async_unregister_subdev(struct v4l2_subdev *sd)
{
// ...
if (!sd->async_list.next)
return; // Guard check implies the node must be linked to proceed
// ...
}
This guard check (if (!sd->async_list.next)) highlights the fragility of the state machine. If a driver mismanages its lifecycle or simply omits this call, the subdev remains permanently stranded in subdev_list. This is a logical consistency error (the data structure is valid but semantically incorrect) rather than a memory corruption issue.
3. Together with pending_subdevs_show, this interface provides a holistic view of the subsystem's health. It enables teams to proactively identify logical flaws during the development cycle, eliminating the reliance on luck or stress tests to uncover these deep-seated state management bugs.
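To illustrate point 1, here is a small userspace sketch of a purely structural list check. It is an assumption-laden model, only similar in spirit to the neighbour-pointer validation that list debugging performs when entries are added or deleted. node_add_tail() and list_links_consistent() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node. */
struct list_node {
	struct list_node *prev, *next;
};

/* Insert `n` at the tail of the circular list headed by `head`. */
void node_add_tail(struct list_node *n, struct list_node *head)
{
	assert(n && head);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Walk the list once and check that every node's neighbours point
 * back at it. Returns false as soon as one link is inconsistent.
 */
bool list_links_consistent(const struct list_node *head)
{
	const struct list_node *n = head;

	do {
		if (n->next->prev != n || n->prev->next != n)
			return false;
		n = n->next;
	} while (n != head);

	return true;
}
```

A node that was added and never removed leaves every link valid, so list_links_consistent() still returns true for the list. A check of this kind only fails once a pointer is actually corrupted, which is why a leaked entry alone gives it nothing to report.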
Regards,
Luo
^ permalink raw reply [flat|nested] 11+ messages in thread