* [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure @ 2017-09-29 21:33 Russell King 2017-09-30 9:20 ` Mauro Carvalho Chehab 0 siblings, 1 reply; 3+ messages in thread From: Russell King @ 2017-09-29 21:33 UTC (permalink / raw) To: Mauro Carvalho Chehab; +Cc: linux-media When a subdev is being registered via v4l2_async_register_subdev(), we test to see if we have all the components, and if so, we call the ->complete() handler. However, the error handling is broken - if notifier->complete() returns an error, we propagate the error out of v4l2_async_register_subdev(), but leaving the subdev bound and on the notifier's "done" list. Drivers calling v4l2_async_register_subdev() assume that this means that v4l2_async_register_subdev() failed, which causes probe failure and the memory backing the subdev to be freed. At this point, we now have a subdev on the notifier's done list which has been freed. If we then try to remove the notifier via v4l2_async_notifier_unregister(), we walk the notifier's done list, cleaning up the subdevs - and hit the freed subdev, causing a kernel oops such as: Unable to handle kernel paging request at virtual address 6e6f6994 pgd = d039c000 [6e6f6994] *pgd=00000000 Internal error: Oops: 5 [#1] SMP ARM Modules linked in: mux_mmio caam_jr snd_soc_imx_sgtl5000 snd_soc_fsl_asoc_card snd_soc_imx_spdif imx_media_csi(C) ci_hdrc_imx snd_soc_imx_audmux imx6_mipi_csi2(C) imx219 snd_soc_sgtl5000 ci_hdrc usbmisc_imx udc_core caam video_mux mux_core imx_media_ic(C) imx_sdma imx_media_vdic(C) imx_media_capture(C) imx2_wdt snd_soc_fsl_ssi snd_soc_fsl_spdif imx_pcm_dma coda v4l2_mem2mem videobuf2_v4l2 imx_vdoa videobuf2_dma_contig videobuf2_core imx_thermal videobuf2_vmalloc videobuf2_memops imx_media(C-) imx_media_common(C) v4l2_fwnode nfsd rc_pinnacle_pctv_hd dw_hdmi_ahb_audio dw_hdmi_cec etnaviv CPU: 1 PID: 8039 Comm: rmmod Tainted: G C 4.14.0-rc1+ #2194 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) task: ee522700 task.stack: e5f68000 PC is at __lock_acquire+0x98/0x13f4 LR is at lock_acquire+0xd8/0x250 pc : [<c008cb00>] lr : [<c008e828>] psr: 200e0093 sp : e5f69d28 ip : e5f68000 fp : e5f69da4 r10: 00000000 r9 : 00000000 r8 : 6e6f6994 r7 : 00000000 r6 : c0aa4f64 r5 : ee522700 r4 : c13f4abc r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : 6e6f6994 Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 2039c04a DAC: 00000051 Process rmmod (pid: 8039, stack limit = 0xe5f68210) Backtrace: [<c008ca68>] (__lock_acquire) from [<c008e828>] (lock_acquire+0xd8/0x250) [<c008e750>] (lock_acquire) from [<c0749734>] (_raw_spin_lock+0x34/0x44) [<c0749700>] (_raw_spin_lock) from [<c05155b4>] (v4l2_device_unregister_subdev+0x2c/0xb0) [<c0515588>] (v4l2_device_unregister_subdev) from [<c051f4d0>] (v4l2_async_cleanup+0x14/0x40) [<c051f4bc>] (v4l2_async_cleanup) from [<c051f6dc>] (v4l2_async_notifier_unregister+0xa8/0x1ec) [<c051f634>] (v4l2_async_notifier_unregister) from [<bf05802c>] (imx_media_remove+0x2c/0x54 [imx_media]) [<bf058000>] (imx_media_remove [imx_media]) from [<c043d3a8>] (platform_drv_remove+0x2c/0x44) [<c043d37c>] (platform_drv_remove) from [<c043b8d8>] (device_release_driver_internal+0x158/0x1f8) [<c043b780>] (device_release_driver_internal) from [<c043b9d4>] (driver_detach+0x40/0x74) [<c043b994>] (driver_detach) from [<c043aadc>] (bus_remove_driver+0x54/0x98) [<c043aa88>] (bus_remove_driver) from [<c043c5c8>] (driver_unregister+0x30/0x50) [<c043c598>] (driver_unregister) from [<c043d274>] (platform_driver_unregister+0x14/0x18) [<c043d260>] (platform_driver_unregister) from [<bf059508>] (imx_media_pdrv_exit+0x14/0x1c [imx_media]) [<bf0594f4>] (imx_media_pdrv_exit [imx_media]) from [<c00d9218>] (SyS_delete_module+0x170/0x1c0) [<c00d90a8>] (SyS_delete_module) from [<c00103c0>] (ret_fast_syscall+0x0/0x1c) Code: 1a00000e e3a00000 e24bd028 e89daff0 (e5980000) ---[ end trace 97732329ac63e5ae ]--- Avoid this by reporting an error to the kernel message log about the failure, rather than silently propagating the error from ->complete() and causing this latent use-after-free oops. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> --- drivers/media/v4l2-core/v4l2-async.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c index d741a8e0fdac..8d221f4a8a8d 100644 --- a/drivers/media/v4l2-core/v4l2-async.c +++ b/drivers/media/v4l2-core/v4l2-async.c @@ -122,8 +122,12 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier, /* Move from the global subdevice list to notifier's done */ list_move(&sd->async_list, ¬ifier->done); - if (list_empty(¬ifier->waiting) && notifier->complete) - return notifier->complete(notifier); + if (list_empty(¬ifier->waiting) && notifier->complete) { + int ret = notifier->complete(notifier); + if (ret) + dev_err(notifier->v4l2_dev->dev, + "complete notifier failed: %d\n", ret); + } return 0; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure 2017-09-29 21:33 [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure Russell King @ 2017-09-30 9:20 ` Mauro Carvalho Chehab 2017-10-02 10:27 ` Sakari Ailus 0 siblings, 1 reply; 3+ messages in thread From: Mauro Carvalho Chehab @ 2017-09-30 9:20 UTC (permalink / raw) To: Russell King; +Cc: Mauro Carvalho Chehab, linux-media, Hans Verkuil Em Fri, 29 Sep 2017 22:33:36 +0100 Russell King <rmk+kernel@armlinux.org.uk> escreveu: > When a subdev is being registered via v4l2_async_register_subdev(), we > test to see if we have all the components, and if so, we call the > ->complete() handler. > > However, the error handling is broken - if notifier->complete() returns > an error, we propagate the error out of v4l2_async_register_subdev(), > but leaving the subdev bound and on the notifier's "done" list. > > Drivers calling v4l2_async_register_subdev() assume that this means > that v4l2_async_register_subdev() failed, which causes probe failure > and the memory backing the subdev to be freed. > > At this point, we now have a subdev on the notifier's done list which > has been freed. If we then try to remove the notifier via > v4l2_async_notifier_unregister(), we walk the notifier's done list, > cleaning up the subdevs - and hit the freed subdev, causing a kernel > oops such as: > > Unable to handle kernel paging request at virtual address 6e6f6994 > pgd = d039c000 > [6e6f6994] *pgd=00000000 > Internal error: Oops: 5 [#1] SMP ARM > Modules linked in: mux_mmio caam_jr snd_soc_imx_sgtl5000 snd_soc_fsl_asoc_card > snd_soc_imx_spdif imx_media_csi(C) ci_hdrc_imx snd_soc_imx_audmux > imx6_mipi_csi2(C) imx219 snd_soc_sgtl5000 ci_hdrc usbmisc_imx udc_core > caam video_mux mux_core imx_media_ic(C) imx_sdma imx_media_vdic(C) > imx_media_capture(C) imx2_wdt snd_soc_fsl_ssi snd_soc_fsl_spdif > imx_pcm_dma coda v4l2_mem2mem videobuf2_v4l2 imx_vdoa videobuf2_dma_contig > videobuf2_core imx_thermal videobuf2_vmalloc videobuf2_memops imx_media(C-) > imx_media_common(C) v4l2_fwnode nfsd rc_pinnacle_pctv_hd dw_hdmi_ahb_audio > dw_hdmi_cec etnaviv > CPU: 1 PID: 8039 Comm: rmmod Tainted: G C 4.14.0-rc1+ #2194 > Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > task: ee522700 task.stack: e5f68000 > PC is at __lock_acquire+0x98/0x13f4 > LR is at lock_acquire+0xd8/0x250 > pc : [<c008cb00>] lr : [<c008e828>] psr: 200e0093 > sp : e5f69d28 ip : e5f68000 fp : e5f69da4 > r10: 00000000 r9 : 00000000 r8 : 6e6f6994 > r7 : 00000000 r6 : c0aa4f64 r5 : ee522700 r4 : c13f4abc > r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : 6e6f6994 > Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none > Control: 10c5387d Table: 2039c04a DAC: 00000051 > Process rmmod (pid: 8039, stack limit = 0xe5f68210) > Backtrace: > [<c008ca68>] (__lock_acquire) from [<c008e828>] (lock_acquire+0xd8/0x250) > [<c008e750>] (lock_acquire) from [<c0749734>] (_raw_spin_lock+0x34/0x44) > [<c0749700>] (_raw_spin_lock) from [<c05155b4>] (v4l2_device_unregister_subdev+0x2c/0xb0) > [<c0515588>] (v4l2_device_unregister_subdev) from [<c051f4d0>] (v4l2_async_cleanup+0x14/0x40) > [<c051f4bc>] (v4l2_async_cleanup) from [<c051f6dc>] (v4l2_async_notifier_unregister+0xa8/0x1ec) > [<c051f634>] (v4l2_async_notifier_unregister) from [<bf05802c>] (imx_media_remove+0x2c/0x54 [imx_media]) > [<bf058000>] (imx_media_remove [imx_media]) from [<c043d3a8>] (platform_drv_remove+0x2c/0x44) > [<c043d37c>] (platform_drv_remove) from [<c043b8d8>] (device_release_driver_internal+0x158/0x1f8) > [<c043b780>] (device_release_driver_internal) from [<c043b9d4>] (driver_detach+0x40/0x74) > [<c043b994>] (driver_detach) from [<c043aadc>] (bus_remove_driver+0x54/0x98) > [<c043aa88>] (bus_remove_driver) from [<c043c5c8>] (driver_unregister+0x30/0x50) > [<c043c598>] (driver_unregister) from [<c043d274>] (platform_driver_unregister+0x14/0x18) > [<c043d260>] (platform_driver_unregister) from [<bf059508>] (imx_media_pdrv_exit+0x14/0x1c [imx_media]) > [<bf0594f4>] (imx_media_pdrv_exit [imx_media]) from [<c00d9218>] (SyS_delete_module+0x170/0x1c0) > [<c00d90a8>] (SyS_delete_module) from [<c00103c0>] (ret_fast_syscall+0x0/0x1c) > Code: 1a00000e e3a00000 e24bd028 e89daff0 (e5980000) > ---[ end trace 97732329ac63e5ae ]--- > > Avoid this by reporting an error to the kernel message log about the > failure, rather than silently propagating the error from ->complete() > and causing this latent use-after-free oops. Thanks for tracking this issue! > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > --- > drivers/media/v4l2-core/v4l2-async.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c > index d741a8e0fdac..8d221f4a8a8d 100644 > --- a/drivers/media/v4l2-core/v4l2-async.c > +++ b/drivers/media/v4l2-core/v4l2-async.c > @@ -122,8 +122,12 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier, > /* Move from the global subdevice list to notifier's done */ > list_move(&sd->async_list, ¬ifier->done); > > - if (list_empty(¬ifier->waiting) && notifier->complete) > - return notifier->complete(notifier); > + if (list_empty(¬ifier->waiting) && notifier->complete) { > + int ret = notifier->complete(notifier); > + if (ret) > + dev_err(notifier->v4l2_dev->dev, > + "complete notifier failed: %d\n", ret); Printing the error here is OK... Yet, v4l2_async_test_notify() should be cleaning up the house and return the error, instead of returning 0, e. g. something like (untested): if (list_empty(¬ifier->waiting) && notifier->complete) { int ret = notifier->complete(notifier); if (ret) { dev_err(notifier->v4l2_dev->dev, "complete notifier failed: %d\n", ret); v4l2_async_cleanup(sd); return (ret); } } Thanks, Mauro ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure 2017-09-30 9:20 ` Mauro Carvalho Chehab @ 2017-10-02 10:27 ` Sakari Ailus 0 siblings, 0 replies; 3+ messages in thread From: Sakari Ailus @ 2017-10-02 10:27 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Russell King, Mauro Carvalho Chehab, linux-media, Hans Verkuil Hi Mauro and Russell, On Sat, Sep 30, 2017 at 06:20:49AM -0300, Mauro Carvalho Chehab wrote: > Em Fri, 29 Sep 2017 22:33:36 +0100 > Russell King <rmk+kernel@armlinux.org.uk> escreveu: > > > When a subdev is being registered via v4l2_async_register_subdev(), we > > test to see if we have all the components, and if so, we call the > > ->complete() handler. > > > > However, the error handling is broken - if notifier->complete() returns > > an error, we propagate the error out of v4l2_async_register_subdev(), > > but leaving the subdev bound and on the notifier's "done" list. > > > > Drivers calling v4l2_async_register_subdev() assume that this means > > that v4l2_async_register_subdev() failed, which causes probe failure > > and the memory backing the subdev to be freed. > > > > At this point, we now have a subdev on the notifier's done list which > > has been freed. If we then try to remove the notifier via > > v4l2_async_notifier_unregister(), we walk the notifier's done list, > > cleaning up the subdevs - and hit the freed subdev, causing a kernel > > oops such as: > > > > Unable to handle kernel paging request at virtual address 6e6f6994 > > pgd = d039c000 > > [6e6f6994] *pgd=00000000 > > Internal error: Oops: 5 [#1] SMP ARM > > Modules linked in: mux_mmio caam_jr snd_soc_imx_sgtl5000 snd_soc_fsl_asoc_card > > snd_soc_imx_spdif imx_media_csi(C) ci_hdrc_imx snd_soc_imx_audmux > > imx6_mipi_csi2(C) imx219 snd_soc_sgtl5000 ci_hdrc usbmisc_imx udc_core > > caam video_mux mux_core imx_media_ic(C) imx_sdma imx_media_vdic(C) > > imx_media_capture(C) imx2_wdt snd_soc_fsl_ssi snd_soc_fsl_spdif > > imx_pcm_dma coda v4l2_mem2mem videobuf2_v4l2 imx_vdoa videobuf2_dma_contig > > videobuf2_core imx_thermal videobuf2_vmalloc videobuf2_memops imx_media(C-) > > imx_media_common(C) v4l2_fwnode nfsd rc_pinnacle_pctv_hd dw_hdmi_ahb_audio > > dw_hdmi_cec etnaviv > > CPU: 1 PID: 8039 Comm: rmmod Tainted: G C 4.14.0-rc1+ #2194 > > Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > > task: ee522700 task.stack: e5f68000 > > PC is at __lock_acquire+0x98/0x13f4 > > LR is at lock_acquire+0xd8/0x250 > > pc : [<c008cb00>] lr : [<c008e828>] psr: 200e0093 > > sp : e5f69d28 ip : e5f68000 fp : e5f69da4 > > r10: 00000000 r9 : 00000000 r8 : 6e6f6994 > > r7 : 00000000 r6 : c0aa4f64 r5 : ee522700 r4 : c13f4abc > > r3 : 00000000 r2 : 00000001 r1 : 00000000 r0 : 6e6f6994 > > Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none > > Control: 10c5387d Table: 2039c04a DAC: 00000051 > > Process rmmod (pid: 8039, stack limit = 0xe5f68210) > > Backtrace: > > [<c008ca68>] (__lock_acquire) from [<c008e828>] (lock_acquire+0xd8/0x250) > > [<c008e750>] (lock_acquire) from [<c0749734>] (_raw_spin_lock+0x34/0x44) > > [<c0749700>] (_raw_spin_lock) from [<c05155b4>] (v4l2_device_unregister_subdev+0x2c/0xb0) > > [<c0515588>] (v4l2_device_unregister_subdev) from [<c051f4d0>] (v4l2_async_cleanup+0x14/0x40) > > [<c051f4bc>] (v4l2_async_cleanup) from [<c051f6dc>] (v4l2_async_notifier_unregister+0xa8/0x1ec) > > [<c051f634>] (v4l2_async_notifier_unregister) from [<bf05802c>] (imx_media_remove+0x2c/0x54 [imx_media]) > > [<bf058000>] (imx_media_remove [imx_media]) from [<c043d3a8>] (platform_drv_remove+0x2c/0x44) > > [<c043d37c>] (platform_drv_remove) from [<c043b8d8>] (device_release_driver_internal+0x158/0x1f8) > > [<c043b780>] (device_release_driver_internal) from [<c043b9d4>] (driver_detach+0x40/0x74) > > [<c043b994>] (driver_detach) from [<c043aadc>] (bus_remove_driver+0x54/0x98) > > [<c043aa88>] (bus_remove_driver) from [<c043c5c8>] (driver_unregister+0x30/0x50) > > [<c043c598>] (driver_unregister) from [<c043d274>] (platform_driver_unregister+0x14/0x18) > > [<c043d260>] (platform_driver_unregister) from [<bf059508>] (imx_media_pdrv_exit+0x14/0x1c [imx_media]) > > [<bf0594f4>] (imx_media_pdrv_exit [imx_media]) from [<c00d9218>] (SyS_delete_module+0x170/0x1c0) > > [<c00d90a8>] (SyS_delete_module) from [<c00103c0>] (ret_fast_syscall+0x0/0x1c) > > Code: 1a00000e e3a00000 e24bd028 e89daff0 (e5980000) > > ---[ end trace 97732329ac63e5ae ]--- > > > > Avoid this by reporting an error to the kernel message log about the > > failure, rather than silently propagating the error from ->complete() > > and causing this latent use-after-free oops. > > Thanks for tracking this issue! Ouch. We've known errors can't be properly handled from the complete callback but I didn't know it was *that* broken. > > > > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > > --- > > drivers/media/v4l2-core/v4l2-async.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/media/v4l2-core/v4l2-async.c b/drivers/media/v4l2-core/v4l2-async.c > > index d741a8e0fdac..8d221f4a8a8d 100644 > > --- a/drivers/media/v4l2-core/v4l2-async.c > > +++ b/drivers/media/v4l2-core/v4l2-async.c > > @@ -122,8 +122,12 @@ static int v4l2_async_test_notify(struct v4l2_async_notifier *notifier, > > /* Move from the global subdevice list to notifier's done */ > > list_move(&sd->async_list, ¬ifier->done); > > > > - if (list_empty(¬ifier->waiting) && notifier->complete) > > - return notifier->complete(notifier); > > + if (list_empty(¬ifier->waiting) && notifier->complete) { > > + int ret = notifier->complete(notifier); > > + if (ret) > > + dev_err(notifier->v4l2_dev->dev, > > + "complete notifier failed: %d\n", ret); > > Printing the error here is OK... > > Yet, v4l2_async_test_notify() should be cleaning up the house and > return the error, instead of returning 0, e. g. something like > (untested): > > if (list_empty(¬ifier->waiting) && notifier->complete) { > int ret = notifier->complete(notifier); > if (ret) { > dev_err(notifier->v4l2_dev->dev, > "complete notifier failed: %d\n", ret); > v4l2_async_cleanup(sd); > return (ret); > } > } That's not quite enough: the complete callback may be called through the notifier registration, too. I'll send a patch(set). -- Regards, Sakari Ailus e-mail: sakari.ailus@iki.fi ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-10-02 10:27 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-09-29 21:33 [PATCH RFC] [media] v4l: async: don't bomb out on ->complete failure Russell King 2017-09-30 9:20 ` Mauro Carvalho Chehab 2017-10-02 10:27 ` Sakari Ailus
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).