* [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure
@ 2017-09-29 21:33 Russell King
2017-09-30 9:20 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 3+ messages in thread
From: Russell King @ 2017-09-29 21:33 UTC (permalink / raw)
To: Mauro Carvalho Chehab; +Cc: linux-media
When a subdev is being registered via v4l2_async_register_subdev(), we
test to see if we have all the components, and if so, we call the
->complete() handler.
However, the error handling is broken - if notifier->complete() returns
an error, we propagate the error out of v4l2_async_register_subdev(),
but leaving the subdev bound and on the notifier's "done" list.
Drivers calling v4l2_async_register_subdev() assume that this means
that v4l2_async_register_subdev() failed, which causes probe failure
and the memory backing the subdev to be freed.
At this point, we now have a subdev on the notifier's done list which
has been freed. If we then try to remove the notifier via
v4l2_async_notifier_unregister(), we walk the notifier's done list,
cleaning up the subdevs - and hit the freed subdev, causing a kernel
oops such as:
Unable to handle kernel paging request at virtual address 6e6f6994
pgd = d039c000
[6e6f6994] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: mux_mmio caam_jr snd_soc_imx_sgtl5000 snd_soc_fsl_asoc_card
snd_soc_imx_spdif imx_media_csi(C) ci_hdrc_imx snd_soc_imx_audmux
imx6_mipi_csi2(C) imx219 snd_soc_sgtl5000 ci_hdrc usbmisc_imx udc_core
caam video_mux mux_core imx_media_ic(C) imx_sdma imx_media_vdic(C)
imx_media_capture(C) imx2_wdt snd_soc_fsl_ssi snd_soc_fsl_spdif
imx_pcm_dma coda v4l2_mem2mem videobuf2_v4l2 imx_vdoa videobuf2_dma_contig
videobuf2_core imx_thermal videobuf2_vmalloc videobuf2_memops imx_media(C-)
imx_media_common(C) v4l2_fwnode nfsd rc_pinnacle_pctv_hd dw_hdmi_ahb_audio
dw_hdmi_cec etnaviv
CPU: 1 PID: 8039 Comm: rmmod Tainted: G C 4.14.0-rc1+ #2194
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
task: ee522700 task.stack: e5f68000
PC is at __lock_acquire+0x98/0x13f4
LR is at lock_acquire+0xd8/0x250
pc : [<c008cb00>] lr : [<c008e828>] psr: 200e0093
sp : e5f69d28 ip : e5f68000 fp : e5f69da4
r10: 00000000 r9 : 00000000 r8 : 6e6f6994
r7 : 00000000 r6 : c0aa4f64 r5 : ee522700 r4 : c13f4abc
r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : 6e6f6994
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 2039c04a DAC: 00000051
Process rmmod (pid: 8039, stack limit = 0xe5f68210)
Backtrace:
[<c008ca68>] (__lock_acquire) from [<c008e828>] (lock_acquire+0xd8/0x250)
[<c008e750>] (lock_acquire) from [<c0749734>] (_raw_spin_lock+0x34/0x44)
[<c0749700>] (_raw_spin_lock) from [<c05155b4>] (v4l2_device_unregister_subdev+0x2c/0xb0)
[<c0515588>] (v4l2_device_unregister_subdev) from [<c051f4d0>] (v4l2_async_cleanup+0x14/0x40)
[<c051f4bc>] (v4l2_async_cleanup) from [<c051f6dc>] (v4l2_async_notifier_unregister+0xa8/0x1ec)
[<c051f634>] (v4l2_async_notifier_unregister) from [<bf05802c>] (imx_media_remove+0x2c/0x54 [imx_media])
[<bf058000>] (imx_media_remove [imx_media]) from [<c043d3a8>] (platform_drv_remove+0x2c/0x44)
[<c043d37c>] (platform_drv_remove) from [<c043b8d8>] (device_release_driver_internal+0x158/0x1f8)
[<c043b780>] (device_release_driver_internal) from [<c043b9d4>] (driver_detach+0x40/0x74)
[<c043b994>] (driver_detach) from [<c043aadc>] (bus_remove_driver+0x54/0x98)
[<c043aa88>] (bus_remove_driver) from [<c043c5c8>] (driver_unregister+0x30/0x50)
[<c043c598>] (driver_unregister) from [<c043d274>] (platform_driver_unregister+0x14/0x18)
[<c043d260>] (platform_driver_unregister) from [<bf059508>] (imx_media_pdrv_exit+0x14/0x1c [imx_media])
[<bf0594f4>] (imx_media_pdrv_exit [imx_media]) from [<c00d9218>] (SyS_delete_module+0x170/0x1c0)
[<c00d90a8>] (SyS_delete_module) from [<c00103c0>] (ret_fast_syscall+0x0/0x1c)
Code: 1a00000e e3a00000 e24bd028 e89daff0 (e5980000)
---[ end trace 97732329ac63e5ae ]---
Avoid this by reporting an error to the kernel message log about the
failure, rather than silently propagating the error from ->complete()
and causing this latent use-after-free oops.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
drivers/media/v4l2-core/v4l2-async.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c
index d741a8e0fdac..8d221f4a8a8d 100644
--- a/drivers/media/v4l2-core/v4l2-async.c
+++ b/drivers/media/v4l2-core/v4l2-async.c
@@ -122,8 +122,12 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier,
/* Move from the global subdevice list to notifier's done */
list_move(&sd->async_list, ¬ifier->done);
- if (list_empty(¬ifier->waiting) && notifier->complete)
- return notifier->complete(notifier);
+ if (list_empty(¬ifier->waiting) && notifier->complete) {
+ int ret = notifier->complete(notifier);
+ if (ret)
+ dev_err(notifier->v4l2_dev->dev,
+ "complete notifier failed: %d\n", ret);
+ }
return 0;
}
--
2.7.4
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure
2017-09-29 21:33 [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure Russell King
@ 2017-09-30 9:20 ` Mauro Carvalho Chehab
2017-10-02 10:27 ` Sakari Ailus
0 siblings, 1 reply; 3+ messages in thread
From: Mauro Carvalho Chehab @ 2017-09-30 9:20 UTC (permalink / raw)
To: Russell King; +Cc: Mauro Carvalho Chehab, linux-media, Hans Verkuil
Em Fri, 29 Sep 2017 22:33:36 +0100
Russell King <rmk+kernel@armlinux.org.uk> escreveu:
> When a subdev is being registered via v4l2_async_register_subdev(), we
> test to see if we have all the components, and if so, we call the
> ->complete() handler.
>
> However, the error handling is broken - if notifier->complete() returns
> an error, we propagate the error out of v4l2_async_register_subdev(),
> but leaving the subdev bound and on the notifier's "done" list.
>
> Drivers calling v4l2_async_register_subdev() assume that this means
> that v4l2_async_register_subdev() failed, which causes probe failure
> and the memory backing the subdev to be freed.
>
> At this point, we now have a subdev on the notifier's done list which
> has been freed. If we then try to remove the notifier via
> v4l2_async_notifier_unregister(), we walk the notifier's done list,
> cleaning up the subdevs - and hit the freed subdev, causing a kernel
> oops such as:
>
> Unable to handle kernel paging request at virtual address 6e6f6994
> pgd = d039c000
> [6e6f6994] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in: mux_mmio caam_jr snd_soc_imx_sgtl5000 snd_soc_fsl_asoc_card
> snd_soc_imx_spdif imx_media_csi(C) ci_hdrc_imx snd_soc_imx_audmux
> imx6_mipi_csi2(C) imx219 snd_soc_sgtl5000 ci_hdrc usbmisc_imx udc_core
> caam video_mux mux_core imx_media_ic(C) imx_sdma imx_media_vdic(C)
> imx_media_capture(C) imx2_wdt snd_soc_fsl_ssi snd_soc_fsl_spdif
> imx_pcm_dma coda v4l2_mem2mem videobuf2_v4l2 imx_vdoa videobuf2_dma_contig
> videobuf2_core imx_thermal videobuf2_vmalloc videobuf2_memops imx_media(C-)
> imx_media_common(C) v4l2_fwnode nfsd rc_pinnacle_pctv_hd dw_hdmi_ahb_audio
> dw_hdmi_cec etnaviv
> CPU: 1 PID: 8039 Comm: rmmod Tainted: G C 4.14.0-rc1+ #2194
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> task: ee522700 task.stack: e5f68000
> PC is at __lock_acquire+0x98/0x13f4
> LR is at lock_acquire+0xd8/0x250
> pc : [<c008cb00>] lr : [<c008e828>] psr: 200e0093
> sp : e5f69d28 ip : e5f68000 fp : e5f69da4
> r10: 00000000 r9 : 00000000 r8 : 6e6f6994
> r7 : 00000000 r6 : c0aa4f64 r5 : ee522700 r4 : c13f4abc
> r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : 6e6f6994
> Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> Control: 10c5387d Table: 2039c04a DAC: 00000051
> Process rmmod (pid: 8039, stack limit = 0xe5f68210)
> Backtrace:
> [<c008ca68>] (__lock_acquire) from [<c008e828>] (lock_acquire+0xd8/0x250)
> [<c008e750>] (lock_acquire) from [<c0749734>] (_raw_spin_lock+0x34/0x44)
> [<c0749700>] (_raw_spin_lock) from [<c05155b4>] (v4l2_device_unregister_subdev+0x2c/0xb0)
> [<c0515588>] (v4l2_device_unregister_subdev) from [<c051f4d0>] (v4l2_async_cleanup+0x14/0x40)
> [<c051f4bc>] (v4l2_async_cleanup) from [<c051f6dc>] (v4l2_async_notifier_unregister+0xa8/0x1ec)
> [<c051f634>] (v4l2_async_notifier_unregister) from [<bf05802c>] (imx_media_remove+0x2c/0x54 [imx_media])
> [<bf058000>] (imx_media_remove [imx_media]) from [<c043d3a8>] (platform_drv_remove+0x2c/0x44)
> [<c043d37c>] (platform_drv_remove) from [<c043b8d8>] (device_release_driver_internal+0x158/0x1f8)
> [<c043b780>] (device_release_driver_internal) from [<c043b9d4>] (driver_detach+0x40/0x74)
> [<c043b994>] (driver_detach) from [<c043aadc>] (bus_remove_driver+0x54/0x98)
> [<c043aa88>] (bus_remove_driver) from [<c043c5c8>] (driver_unregister+0x30/0x50)
> [<c043c598>] (driver_unregister) from [<c043d274>] (platform_driver_unregister+0x14/0x18)
> [<c043d260>] (platform_driver_unregister) from [<bf059508>] (imx_media_pdrv_exit+0x14/0x1c [imx_media])
> [<bf0594f4>] (imx_media_pdrv_exit [imx_media]) from [<c00d9218>] (SyS_delete_module+0x170/0x1c0)
> [<c00d90a8>] (SyS_delete_module) from [<c00103c0>] (ret_fast_syscall+0x0/0x1c)
> Code: 1a00000e e3a00000 e24bd028 e89daff0 (e5980000)
> ---[ end trace 97732329ac63e5ae ]---
>
> Avoid this by reporting an error to the kernel message log about the
> failure, rather than silently propagating the error from ->complete()
> and causing this latent use-after-free oops.
Thanks for tracking this issue!
>
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
> drivers/media/v4l2-core/v4l2-async.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c
> index d741a8e0fdac..8d221f4a8a8d 100644
> --- a/drivers/media/v4l2-core/v4l2-async.c
> +++ b/drivers/media/v4l2-core/v4l2-async.c
> @@ -122,8 +122,12 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier,
> /* Move from the global subdevice list to notifier's done */
> list_move(&sd->async_list, ¬ifier->done);
>
> - if (list_empty(¬ifier->waiting) && notifier->complete)
> - return notifier->complete(notifier);
> + if (list_empty(¬ifier->waiting) && notifier->complete) {
> + int ret = notifier->complete(notifier);
> + if (ret)
> + dev_err(notifier->v4l2_dev->dev,
> + "complete notifier failed: %d\n", ret);
Printing the error here is OK...
Yet, v4l2_async_test_notify() should be cleaning up the house and
return the error, instead of returning 0, e. g. something like
(untested):
if (list_empty(¬ifier->waiting) && notifier->complete) {
int ret = notifier->complete(notifier);
if (ret) {
dev_err(notifier->v4l2_dev->dev,
"complete notifier failed: %d\n", ret);
v4l2_async_cleanup(sd);
return (ret);
}
}
Thanks,
Mauro
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure
2017-09-30 9:20 ` Mauro Carvalho Chehab
@ 2017-10-02 10:27 ` Sakari Ailus
0 siblings, 0 replies; 3+ messages in thread
From: Sakari Ailus @ 2017-10-02 10:27 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Russell King, Mauro Carvalho Chehab, linux-media, Hans Verkuil
Hi Mauro and Russell,
On Sat, Sep 30, 2017 at 06:20:49AM -0300, Mauro Carvalho Chehab wrote:
> Em Fri, 29 Sep 2017 22:33:36 +0100
> Russell King <rmk+kernel@armlinux.org.uk> escreveu:
>
> > When a subdev is being registered via v4l2_async_register_subdev(), we
> > test to see if we have all the components, and if so, we call the
> > ->complete() handler.
> >
> > However, the error handling is broken - if notifier->complete() returns
> > an error, we propagate the error out of v4l2_async_register_subdev(),
> > but leaving the subdev bound and on the notifier's "done" list.
> >
> > Drivers calling v4l2_async_register_subdev() assume that this means
> > that v4l2_async_register_subdev() failed, which causes probe failure
> > and the memory backing the subdev to be freed.
> >
> > At this point, we now have a subdev on the notifier's done list which
> > has been freed. If we then try to remove the notifier via
> > v4l2_async_notifier_unregister(), we walk the notifier's done list,
> > cleaning up the subdevs - and hit the freed subdev, causing a kernel
> > oops such as:
> >
> > Unable to handle kernel paging request at virtual address 6e6f6994
> > pgd = d039c000
> > [6e6f6994] *pgd=00000000
> > Internal error: Oops: 5 [#1] SMP ARM
> > Modules linked in: mux_mmio caam_jr snd_soc_imx_sgtl5000 snd_soc_fsl_asoc_card
> > snd_soc_imx_spdif imx_media_csi(C) ci_hdrc_imx snd_soc_imx_audmux
> > imx6_mipi_csi2(C) imx219 snd_soc_sgtl5000 ci_hdrc usbmisc_imx udc_core
> > caam video_mux mux_core imx_media_ic(C) imx_sdma imx_media_vdic(C)
> > imx_media_capture(C) imx2_wdt snd_soc_fsl_ssi snd_soc_fsl_spdif
> > imx_pcm_dma coda v4l2_mem2mem videobuf2_v4l2 imx_vdoa videobuf2_dma_contig
> > videobuf2_core imx_thermal videobuf2_vmalloc videobuf2_memops imx_media(C-)
> > imx_media_common(C) v4l2_fwnode nfsd rc_pinnacle_pctv_hd dw_hdmi_ahb_audio
> > dw_hdmi_cec etnaviv
> > CPU: 1 PID: 8039 Comm: rmmod Tainted: G C 4.14.0-rc1+ #2194
> > Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > task: ee522700 task.stack: e5f68000
> > PC is at __lock_acquire+0x98/0x13f4
> > LR is at lock_acquire+0xd8/0x250
> > pc : [<c008cb00>] lr : [<c008e828>] psr: 200e0093
> > sp : e5f69d28 ip : e5f68000 fp : e5f69da4
> > r10: 00000000 r9 : 00000000 r8 : 6e6f6994
> > r7 : 00000000 r6 : c0aa4f64 r5 : ee522700 r4 : c13f4abc
> > r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : 6e6f6994
> > Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
> > Control: 10c5387d Table: 2039c04a DAC: 00000051
> > Process rmmod (pid: 8039, stack limit = 0xe5f68210)
> > Backtrace:
> > [<c008ca68>] (__lock_acquire) from [<c008e828>] (lock_acquire+0xd8/0x250)
> > [<c008e750>] (lock_acquire) from [<c0749734>] (_raw_spin_lock+0x34/0x44)
> > [<c0749700>] (_raw_spin_lock) from [<c05155b4>] (v4l2_device_unregister_subdev+0x2c/0xb0)
> > [<c0515588>] (v4l2_device_unregister_subdev) from [<c051f4d0>] (v4l2_async_cleanup+0x14/0x40)
> > [<c051f4bc>] (v4l2_async_cleanup) from [<c051f6dc>] (v4l2_async_notifier_unregister+0xa8/0x1ec)
> > [<c051f634>] (v4l2_async_notifier_unregister) from [<bf05802c>] (imx_media_remove+0x2c/0x54 [imx_media])
> > [<bf058000>] (imx_media_remove [imx_media]) from [<c043d3a8>] (platform_drv_remove+0x2c/0x44)
> > [<c043d37c>] (platform_drv_remove) from [<c043b8d8>] (device_release_driver_internal+0x158/0x1f8)
> > [<c043b780>] (device_release_driver_internal) from [<c043b9d4>] (driver_detach+0x40/0x74)
> > [<c043b994>] (driver_detach) from [<c043aadc>] (bus_remove_driver+0x54/0x98)
> > [<c043aa88>] (bus_remove_driver) from [<c043c5c8>] (driver_unregister+0x30/0x50)
> > [<c043c598>] (driver_unregister) from [<c043d274>] (platform_driver_unregister+0x14/0x18)
> > [<c043d260>] (platform_driver_unregister) from [<bf059508>] (imx_media_pdrv_exit+0x14/0x1c [imx_media])
> > [<bf0594f4>] (imx_media_pdrv_exit [imx_media]) from [<c00d9218>] (SyS_delete_module+0x170/0x1c0)
> > [<c00d90a8>] (SyS_delete_module) from [<c00103c0>] (ret_fast_syscall+0x0/0x1c)
> > Code: 1a00000e e3a00000 e24bd028 e89daff0 (e5980000)
> > ---[ end trace 97732329ac63e5ae ]---
> >
> > Avoid this by reporting an error to the kernel message log about the
> > failure, rather than silently propagating the error from ->complete()
> > and causing this latent use-after-free oops.
>
> Thanks for tracking this issue!
Ouch. We've known errors can't be properly handled from the complete
callback but I didn't know it was *that* broken.
>
> >
> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> > ---
> > drivers/media/v4l2-core/v4l2-async.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c
> > index d741a8e0fdac..8d221f4a8a8d 100644
> > --- a/drivers/media/v4l2-core/v4l2-async.c
> > +++ b/drivers/media/v4l2-core/v4l2-async.c
> > @@ -122,8 +122,12 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier,
> > /* Move from the global subdevice list to notifier's done */
> > list_move(&sd->async_list, ¬ifier->done);
> >
> > - if (list_empty(¬ifier->waiting) && notifier->complete)
> > - return notifier->complete(notifier);
> > + if (list_empty(¬ifier->waiting) && notifier->complete) {
> > + int ret = notifier->complete(notifier);
> > + if (ret)
> > + dev_err(notifier->v4l2_dev->dev,
> > + "complete notifier failed: %d\n", ret);
>
> Printing the error here is OK...
>
> Yet, v4l2_async_test_notify() should be cleaning up the house and
> return the error, instead of returning 0, e. g. something like
> (untested):
>
> if (list_empty(¬ifier->waiting) && notifier->complete) {
> int ret = notifier->complete(notifier);
> if (ret) {
> dev_err(notifier->v4l2_dev->dev,
> "complete notifier failed: %d\n", ret);
> v4l2_async_cleanup(sd);
> return (ret);
> }
> }
That's not quite enough: the complete callback may be called through the
notifier registration, too. I'll send a patch(set).
--
Regards,
Sakari Ailus
e-mail: sakari.ailus@iki.fi
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-10-02 10:27 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-29 21:33 [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure Russell King
2017-09-30 9:20 ` Mauro Carvalho Chehab
2017-10-02 10:27 ` Sakari Ailus
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).