* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-18 5:40 ` Ricardo Ribalda
@ 2022-10-18 5:42 ` Ricardo Ribalda
2022-10-18 14:46 ` Alan Stern
2022-10-18 15:02 ` Laurent Pinchart
2 siblings, 0 replies; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-18 5:42 UTC (permalink / raw)
To: Alan Stern, linux
Cc: Nazar Mokrynskyi, Laurent Pinchart, linux-media, linux-usb
using proper Guenter email
On Tue, 18 Oct 2022 at 14:40, Ricardo Ribalda <ribalda@chromium.org> wrote:
>
> Hi
>
> Guenter already provided some patches to fix this issue:
> https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
>
> Until we have a solution on the core (or rewrite the kernel in rust
> ;P) , I think we should merge them (or something similar).
>
> I can prepare a patchset merging Guenter set and my "grannular PM"
> https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
>
> It can always be reverted when we reach consensus on how to do it for
> every driver.
>
> Regards!
>
>
> On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > Moving this bug report from bugzilla to the mailing lists.
> >
> > The short description of the bug is that in uvcvideo, disconnect races
> > with starting a video transfer. The race shows up on Nazar's system
> > because of a marginal USB cable which leads to a lot of spontaneous
> > disconnections.
> >
> > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > >
> > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > Created attachment 303022
> > > --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > Kernel log with uvc-trace patch applied
> >
> > For everyone's information, here is the uvc-trace patch. All it does is
> > add messages to the kernel log when uvcvideo's probe and disconnect
> > routines run, and just before uvc_video_start_transfer() calls
> > usb_set_interface().
> >
> > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > altsetting, best_psize);
> >
> > + dev_info(&intf->dev, "uvc set alt\n");
> > ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > if (ret < 0)
> > return ret;
> > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > int function;
> > int ret;
> >
> > + dev_info(&intf->dev, "uvc_probe start\n");
> > +
> > /* Allocate memory for the device and initialize it. */
> > dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > if (dev == NULL)
> > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > return;
> >
> > uvc_unregister_video(dev);
> > + dev_info(&intf->dev, "uvc_disconnect done\n");
> > kref_put(&dev->ref, uvc_delete);
> > }
> >
> > The output in the kernel log below clearly shows that there is a bug in
> > the uvcvideo driver.
> >
> > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > within a few minutes.
> > >
> > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > for it to reboot, but the last time I did that in the log I got null pointer
> > > de-reference again.
> >
> > Here is the important part of the log:
> >
> > [ 684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > [ 684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > [ 684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > [ 684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > [ 684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > ...
> > [ 743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> >
> > This is where an ioctl calls uvc_video_start_transfer.
> >
> > [ 748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > [ 748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> >
> > This is where the disconnect starts and finishes
> >
> > [ 748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [ 748.898344] #PF: supervisor read access in kernel mode
> > [ 748.898346] #PF: error_code(0x0000) - not-present page
> > [ 748.898347] PGD 0 P4D 0
> > [ 748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > [ 748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > [ 748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > ...
> > [ 748.898368] Call Trace:
> > [ 748.898370] <TASK>
> > [ 748.898370] usb_hcd_alloc_bandwidth+0x240/0x370
> > [ 748.898375] usb_set_interface+0x122/0x350
> > [ 748.898378] uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > [ 748.898383] uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > [ 748.898386] uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > [ 748.898390] vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > [ 748.898393] vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > [ 748.898395] uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > [ 748.898398] uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > [ 748.898401] __video_do_ioctl+0x19a/0x3f0 [videodev]
> >
> > And this proves that uvc_disconnect() returned before the driver was
> > finished accessing the device.
> >
> > I don't know how the driver works or how it tries to prevent this sort
> > of race from occurring, but apparently the strategy isn't working.
> >
> > > Please let me know if there is any other information I can provide and what
> > > could be the root cause of this annoying behavior.
> >
> > At this point I will bow out of the discussion; it's up to the uvcvideo
> > maintainers to investigate further. Maybe they can provide a patch for
> > you to test.
> >
> > Alan Stern
>
>
>
> --
> Ricardo Ribalda
--
Ricardo Ribalda
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-18 5:40 ` Ricardo Ribalda
2022-10-18 5:42 ` Ricardo Ribalda
@ 2022-10-18 14:46 ` Alan Stern
2022-10-18 15:02 ` Laurent Pinchart
2 siblings, 0 replies; 21+ messages in thread
From: Alan Stern @ 2022-10-18 14:46 UTC (permalink / raw)
To: Ricardo Ribalda
Cc: inux, Nazar Mokrynskyi, Laurent Pinchart, linux-media, linux-usb
On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> Hi
>
> Guenter already provided some patches to fix this issue:
> https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
>
> Until we have a solution on the core (or rewrite the kernel in rust
> ;P) , I think we should merge them (or something similar).
>
> I can prepare a patchset merging Guenter set and my "grannular PM"
> https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
>
> It can always be reverted when we reach consensus on how to do it for
> every driver.
>
> Regards!
I was going to say "Wow! Quick work, thank you!" until I noticed that
the patch set was submitted over two years ago and still hasn't been
merged. :-(
Well, Nazar, at least now you have an answer and a potential fix.
Alan Stern
> On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > Moving this bug report from bugzilla to the mailing lists.
> >
> > The short description of the bug is that in uvcvideo, disconnect races
> > with starting a video transfer. The race shows up on Nazar's system
> > because of a marginal USB cable which leads to a lot of spontaneous
> > disconnections.
> >
> > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > >
> > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > Created attachment 303022
> > > --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > Kernel log with uvc-trace patch applied
> >
> > For everyone's information, here is the uvc-trace patch. All it does is
> > add messages to the kernel log when uvcvideo's probe and disconnect
> > routines run, and just before uvc_video_start_transfer() calls
> > usb_set_interface().
> >
> > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > altsetting, best_psize);
> >
> > + dev_info(&intf->dev, "uvc set alt\n");
> > ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > if (ret < 0)
> > return ret;
> > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > int function;
> > int ret;
> >
> > + dev_info(&intf->dev, "uvc_probe start\n");
> > +
> > /* Allocate memory for the device and initialize it. */
> > dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > if (dev == NULL)
> > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > return;
> >
> > uvc_unregister_video(dev);
> > + dev_info(&intf->dev, "uvc_disconnect done\n");
> > kref_put(&dev->ref, uvc_delete);
> > }
> >
> > The output in the kernel log below clearly shows that there is a bug in
> > the uvcvideo driver.
> >
> > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > within a few minutes.
> > >
> > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > for it to reboot, but the last time I did that in the log I got null pointer
> > > de-reference again.
> >
> > Here is the important part of the log:
> >
> > [ 684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > [ 684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > [ 684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > [ 684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > [ 684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > ...
> > [ 743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> >
> > This is where an ioctl calls uvc_video_start_transfer.
> >
> > [ 748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > [ 748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> >
> > This is where the disconnect starts and finishes
> >
> > [ 748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [ 748.898344] #PF: supervisor read access in kernel mode
> > [ 748.898346] #PF: error_code(0x0000) - not-present page
> > [ 748.898347] PGD 0 P4D 0
> > [ 748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > [ 748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > [ 748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > ...
> > [ 748.898368] Call Trace:
> > [ 748.898370] <TASK>
> > [ 748.898370] usb_hcd_alloc_bandwidth+0x240/0x370
> > [ 748.898375] usb_set_interface+0x122/0x350
> > [ 748.898378] uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > [ 748.898383] uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > [ 748.898386] uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > [ 748.898390] vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > [ 748.898393] vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > [ 748.898395] uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > [ 748.898398] uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > [ 748.898401] __video_do_ioctl+0x19a/0x3f0 [videodev]
> >
> > And this proves that uvc_disconnect() returned before the driver was
> > finished accessing the device.
> >
> > I don't know how the driver works or how it tries to prevent this sort
> > of race from occurring, but apparently the strategy isn't working.
> >
> > > Please let me know if there is any other information I can provide and what
> > > could be the root cause of this annoying behavior.
> >
> > At this point I will bow out of the discussion; it's up to the uvcvideo
> > maintainers to investigate further. Maybe they can provide a patch for
> > you to test.
> >
> > Alan Stern
>
>
>
> --
> Ricardo Ribalda
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-18 5:40 ` Ricardo Ribalda
2022-10-18 5:42 ` Ricardo Ribalda
2022-10-18 14:46 ` Alan Stern
@ 2022-10-18 15:02 ` Laurent Pinchart
2022-10-19 1:35 ` Ricardo Ribalda
2 siblings, 1 reply; 21+ messages in thread
From: Laurent Pinchart @ 2022-10-18 15:02 UTC (permalink / raw)
To: Ricardo Ribalda
Cc: Alan Stern, inux, Nazar Mokrynskyi, linux-media, linux-usb
hi Ricardo,
On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> Hi
>
> Guenter already provided some patches to fix this issue:
> https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
>
> Until we have a solution on the core (or rewrite the kernel in rust
> ;P) , I think we should merge them (or something similar).
>
> I can prepare a patchset merging Guenter set and my "grannular PM"
> https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
How about working on a proper fix instead ? :-)
> It can always be reverted when we reach consensus on how to do it for
> every driver.
>
> Regards!
>
>
> On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > Moving this bug report from bugzilla to the mailing lists.
> >
> > The short description of the bug is that in uvcvideo, disconnect races
> > with starting a video transfer. The race shows up on Nazar's system
> > because of a marginal USB cable which leads to a lot of spontaneous
> > disconnections.
> >
> > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > >
> > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > Created attachment 303022
> > > --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > Kernel log with uvc-trace patch applied
> >
> > For everyone's information, here is the uvc-trace patch. All it does is
> > add messages to the kernel log when uvcvideo's probe and disconnect
> > routines run, and just before uvc_video_start_transfer() calls
> > usb_set_interface().
> >
> > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > altsetting, best_psize);
> >
> > + dev_info(&intf->dev, "uvc set alt\n");
> > ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > if (ret < 0)
> > return ret;
> > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > int function;
> > int ret;
> >
> > + dev_info(&intf->dev, "uvc_probe start\n");
> > +
> > /* Allocate memory for the device and initialize it. */
> > dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > if (dev == NULL)
> > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > return;
> >
> > uvc_unregister_video(dev);
> > + dev_info(&intf->dev, "uvc_disconnect done\n");
> > kref_put(&dev->ref, uvc_delete);
> > }
> >
> > The output in the kernel log below clearly shows that there is a bug in
> > the uvcvideo driver.
> >
> > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > within a few minutes.
> > >
> > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > for it to reboot, but the last time I did that in the log I got null pointer
> > > de-reference again.
> >
> > Here is the important part of the log:
> >
> > [ 684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > [ 684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > [ 684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > [ 684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > [ 684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > ...
> > [ 743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> >
> > This is where an ioctl calls uvc_video_start_transfer.
> >
> > [ 748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > [ 748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> >
> > This is where the disconnect starts and finishes
> >
> > [ 748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [ 748.898344] #PF: supervisor read access in kernel mode
> > [ 748.898346] #PF: error_code(0x0000) - not-present page
> > [ 748.898347] PGD 0 P4D 0
> > [ 748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [ 748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > [ 748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > [ 748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > ...
> > [ 748.898368] Call Trace:
> > [ 748.898370] <TASK>
> > [ 748.898370] usb_hcd_alloc_bandwidth+0x240/0x370
> > [ 748.898375] usb_set_interface+0x122/0x350
> > [ 748.898378] uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > [ 748.898383] uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > [ 748.898386] uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > [ 748.898390] vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > [ 748.898393] vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > [ 748.898395] uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > [ 748.898398] uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > [ 748.898401] __video_do_ioctl+0x19a/0x3f0 [videodev]
> >
> > And this proves that uvc_disconnect() returned before the driver was
> > finished accessing the device.
> >
> > I don't know how the driver works or how it tries to prevent this sort
> > of race from occurring, but apparently the strategy isn't working.
> >
> > > Please let me know if there is any other information I can provide and what
> > > could be the root cause of this annoying behavior.
> >
> > At this point I will bow out of the discussion; it's up to the uvcvideo
> > maintainers to investigate further. Maybe they can provide a patch for
> > you to test.
> >
> > Alan Stern
>
>
>
> --
> Ricardo Ribalda
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-18 15:02 ` Laurent Pinchart
@ 2022-10-19 1:35 ` Ricardo Ribalda
2022-10-19 1:44 ` Laurent Pinchart
0 siblings, 1 reply; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-19 1:35 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Alan Stern, Nazar Mokrynskyi, linux-media, linux-usb, linux,
Tomasz Figa
Hi Laurent
On Wed, 19 Oct 2022 at 00:02, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> hi Ricardo,
>
> On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> > Hi
> >
> > Guenter already provided some patches to fix this issue:
> > https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
> >
> > Until we have a solution on the core (or rewrite the kernel in rust
> > ;P) , I think we should merge them (or something similar).
> >
> > I can prepare a patchset merging Guenter set and my "grannular PM"
> > https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
>
> How about working on a proper fix instead ? :-)
We already have a fix that has been extensively tested ;P
When put on top of granular PM it is a tiny patch:
https://git.kernel.org/pub/scm/linux/kernel/git/ribalda/linux.git/commit/?h=b4/resend-powersave&id=cf826010bedda38f8faf8d072f95a9ca69ed452d
that can be cleanly reverted when/if we fix it in core.
I would like to avoid that more and more people/distros have
downstream patches on top of uvc to fix real issues just because we
think that it is not the "perfect" solution.
Would you please take a second look at the combined patchset?
Thanks!
>
> > It can always be reverted when we reach consensus on how to do it for
> > every driver.
> >
> > Regards!
> >
> >
> > On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
> > >
> > > Moving this bug report from bugzilla to the mailing lists.
> > >
> > > The short description of the bug is that in uvcvideo, disconnect races
> > > with starting a video transfer. The race shows up on Nazar's system
> > > because of a marginal USB cable which leads to a lot of spontaneous
> > > disconnections.
> > >
> > > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > > >
> > > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > > Created attachment 303022
> > > > --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > > Kernel log with uvc-trace patch applied
> > >
> > > For everyone's information, here is the uvc-trace patch. All it does is
> > > add messages to the kernel log when uvcvideo's probe and disconnect
> > > routines run, and just before uvc_video_start_transfer() calls
> > > usb_set_interface().
> > >
> > > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > > "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > > altsetting, best_psize);
> > >
> > > + dev_info(&intf->dev, "uvc set alt\n");
> > > ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > > if (ret < 0)
> > > return ret;
> > > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > > int function;
> > > int ret;
> > >
> > > + dev_info(&intf->dev, "uvc_probe start\n");
> > > +
> > > /* Allocate memory for the device and initialize it. */
> > > dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > > if (dev == NULL)
> > > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > > return;
> > >
> > > uvc_unregister_video(dev);
> > > + dev_info(&intf->dev, "uvc_disconnect done\n");
> > > kref_put(&dev->ref, uvc_delete);
> > > }
> > >
> > > The output in the kernel log below clearly shows that there is a bug in
> > > the uvcvideo driver.
> > >
> > > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > > within a few minutes.
> > > >
> > > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > > for it to reboot, but the last time I did that in the log I got null pointer
> > > > de-reference again.
> > >
> > > Here is the important part of the log:
> > >
> > > [ 684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > > [ 684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > > [ 684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > > [ 684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > > [ 684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > > ...
> > > [ 743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> > >
> > > This is where an ioctl calls uvc_video_start_transfer.
> > >
> > > [ 748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > > [ 748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> > >
> > > This is where the disconnect starts and finishes
> > >
> > > [ 748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > [ 748.898344] #PF: supervisor read access in kernel mode
> > > [ 748.898346] #PF: error_code(0x0000) - not-present page
> > > [ 748.898347] PGD 0 P4D 0
> > > [ 748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [ 748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > > [ 748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > > [ 748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > > ...
> > > [ 748.898368] Call Trace:
> > > [ 748.898370] <TASK>
> > > [ 748.898370] usb_hcd_alloc_bandwidth+0x240/0x370
> > > [ 748.898375] usb_set_interface+0x122/0x350
> > > [ 748.898378] uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > > [ 748.898383] uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > > [ 748.898386] uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > > [ 748.898390] vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > > [ 748.898393] vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > > [ 748.898395] uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > > [ 748.898398] uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > > [ 748.898401] __video_do_ioctl+0x19a/0x3f0 [videodev]
> > >
> > > And this proves that uvc_disconnect() returned before the driver was
> > > finished accessing the device.
> > >
> > > I don't know how the driver works or how it tries to prevent this sort
> > > of race from occurring, but apparently the strategy isn't working.
> > >
> > > > Please let me know if there is any other information I can provide and what
> > > > could be the root cause of this annoying behavior.
> > >
> > > At this point I will bow out of the discussion; it's up to the uvcvideo
> > > maintainers to investigate further. Maybe they can provide a patch for
> > > you to test.
> > >
> > > Alan Stern
> >
> >
> >
> > --
> > Ricardo Ribalda
>
> --
> Regards,
>
> Laurent Pinchart
--
Ricardo Ribalda
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-19 1:35 ` Ricardo Ribalda
@ 2022-10-19 1:44 ` Laurent Pinchart
2022-10-19 4:22 ` Ricardo Ribalda
0 siblings, 1 reply; 21+ messages in thread
From: Laurent Pinchart @ 2022-10-19 1:44 UTC (permalink / raw)
To: Ricardo Ribalda
Cc: Alan Stern, Nazar Mokrynskyi, linux-media, linux-usb, linux,
Tomasz Figa
Hi Ricardo,
On Wed, Oct 19, 2022 at 10:35:00AM +0900, Ricardo Ribalda wrote:
> On Wed, 19 Oct 2022 at 00:02, Laurent Pinchart wrote:
> > On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> > > Hi
> > >
> > > Guenter already provided some patches to fix this issue:
> > > https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
> > >
> > > Until we have a solution on the core (or rewrite the kernel in rust
> > > ;P) , I think we should merge them (or something similar).
> > >
> > > I can prepare a patchset merging Guenter set and my "grannular PM"
> > > https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
> >
> > How about working on a proper fix instead ? :-)
>
> We already have a fix that has been extensively tested ;P
>
> When put on top of granular PM it is a tiny patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/ribalda/linux.git/commit/?h=b4/resend-powersave&id=cf826010bedda38f8faf8d072f95a9ca69ed452d
> that can be cleanly reverted when/if we fix it in core.
>
> I would like to avoid that more and more people/distros have
> downstream patches on top of uvc to fix real issues just because we
> think that it is not the "perfect" solution.
And I would like to avoid having to roll out manual changes to all
drivers when the problem can be fixed in the core, just because nobody
can be bothered to spend time to implement a good fix. We don't have to
aim for a solution at the cdev level if that takes too long, an
implementation in V4L2 would be enough to start with.
I'm getting tired of having to reexplain this continuously with nobody
listening. This could have been solved a long time ago.
> Would you please take a second look at the combined patchset?
I will have a look. If I recall correctly, there were some patches in
Guenter's series that I had no issue with, I'll start with those.
> > > It can always be reverted when we reach consensus on how to do it for
> > > every driver.
> > >
> > > Regards!
> > >
> > > On Tue, 18 Oct 2022 at 06:46, Alan Stern wrote:
> > > >
> > > > Moving this bug report from bugzilla to the mailing lists.
> > > >
> > > > The short description of the bug is that in uvcvideo, disconnect races
> > > > with starting a video transfer. The race shows up on Nazar's system
> > > > because of a marginal USB cable which leads to a lot of spontaneous
> > > > disconnections.
> > > >
> > > > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > > > >
> > > > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > > > Created attachment 303022
> > > > > --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > > > Kernel log with uvc-trace patch applied
> > > >
> > > > For everyone's information, here is the uvc-trace patch. All it does is
> > > > add messages to the kernel log when uvcvideo's probe and disconnect
> > > > routines run, and just before uvc_video_start_transfer() calls
> > > > usb_set_interface().
> > > >
> > > > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > > > "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > > > altsetting, best_psize);
> > > >
> > > > + dev_info(&intf->dev, "uvc set alt\n");
> > > > ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > > > if (ret < 0)
> > > > return ret;
> > > > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > > > int function;
> > > > int ret;
> > > >
> > > > + dev_info(&intf->dev, "uvc_probe start\n");
> > > > +
> > > > /* Allocate memory for the device and initialize it. */
> > > > dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > > > if (dev == NULL)
> > > > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > > > return;
> > > >
> > > > uvc_unregister_video(dev);
> > > > + dev_info(&intf->dev, "uvc_disconnect done\n");
> > > > kref_put(&dev->ref, uvc_delete);
> > > > }
> > > >
> > > > The output in the kernel log below clearly shows that there is a bug in
> > > > the uvcvideo driver.
> > > >
> > > > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > > > within a few minutes.
> > > > >
> > > > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > > > for it to reboot, but the last time I did that in the log I got null pointer
> > > > > de-reference again.
> > > >
> > > > Here is the important part of the log:
> > > >
> > > > [ 684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > > > [ 684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > > > [ 684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > > > [ 684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > > > [ 684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > > > ...
> > > > [ 743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> > > >
> > > > This is where an ioctl calls uvc_video_start_transfer.
> > > >
> > > > [ 748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > > > [ 748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> > > >
> > > > This is where the disconnect starts and finishes
> > > >
> > > > [ 748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > > [ 748.898344] #PF: supervisor read access in kernel mode
> > > > [ 748.898346] #PF: error_code(0x0000) - not-present page
> > > > [ 748.898347] PGD 0 P4D 0
> > > > [ 748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > > [ 748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > > > [ 748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > > > [ 748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > > > ...
> > > > [ 748.898368] Call Trace:
> > > > [ 748.898370] <TASK>
> > > > [ 748.898370] usb_hcd_alloc_bandwidth+0x240/0x370
> > > > [ 748.898375] usb_set_interface+0x122/0x350
> > > > [ 748.898378] uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > > > [ 748.898383] uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > > > [ 748.898386] uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > > > [ 748.898390] vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > > > [ 748.898393] vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > > > [ 748.898395] uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > > > [ 748.898398] uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > > > [ 748.898401] __video_do_ioctl+0x19a/0x3f0 [videodev]
> > > >
> > > > And this proves that uvc_disconnect() returned before the driver was
> > > > finished accessing the device.
> > > >
> > > > I don't know how the driver works or how it tries to prevent this sort
> > > > of race from occurring, but apparently the strategy isn't working.
> > > >
> > > > > Please let me know if there is any other information I can provide and what
> > > > > could be the root cause of this annoying behavior.
> > > >
> > > > At this point I will bow out of the discussion; it's up to the uvcvideo
> > > > maintainers to investigate further. Maybe they can provide a patch for
> > > > you to test.
--
Regards,
Laurent Pinchart
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-19 1:44 ` Laurent Pinchart
@ 2022-10-19 4:22 ` Ricardo Ribalda
2022-10-19 15:08 ` Alan Stern
0 siblings, 1 reply; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-19 4:22 UTC (permalink / raw)
To: Laurent Pinchart
Cc: Alan Stern, Nazar Mokrynskyi, linux-media, linux-usb, linux,
Tomasz Figa
Hi Laurent
On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Ricardo,
>
> On Wed, Oct 19, 2022 at 10:35:00AM +0900, Ricardo Ribalda wrote:
> > On Wed, 19 Oct 2022 at 00:02, Laurent Pinchart wrote:
> > > On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> > > > Hi
> > > >
> > > > Guenter already provided some patches to fix this issue:
> > > > https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
> > > >
> > > > Until we have a solution on the core (or rewrite the kernel in rust
> > > > ;P) , I think we should merge them (or something similar).
> > > >
> > > > I can prepare a patchset merging Guenter set and my "grannular PM"
> > > > https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
> > >
> > > How about working on a proper fix instead ? :-)
> >
> > We already have a fix that has been extensively tested ;P
> >
> > When put on top of granular PM it is a tiny patch:
> > https://git.kernel.org/pub/scm/linux/kernel/git/ribalda/linux.git/commit/?h=b4/resend-powersave&id=cf826010bedda38f8faf8d072f95a9ca69ed452d
> > that can be cleanly reverted when/if we fix it in core.
> >
> > I would like to avoid that more and more people/distros have
> > downstream patches on top of uvc to fix real issues just because we
> > think that it is not the "perfect" solution.
>
> And I would like to avoid having to roll out manual changes to all
> drivers when the problem can be fixed in the core, just because nobody
> can be bothered to spend time to implement a good fix. We don't have to
> aim for a solution at the cdev level if that takes too long, an
> implementation in V4L2 would be enough to start with.
Do we know what a "good fix" would look like?. This is a race
condition between cdev, v4l2, and usb_driver. The only entity that
knows about the three of them is the driver.
If we "fix" v4l2 to provide a callback to notify the framework about a
"bus disconnect". It can prevent new syscalls, but it cannot interrupt
the current ones.
So this is not something we can easily fix in O(months). Is there
anyone working on it after your LPC presentation?
Until then, landing a 10 lines patch that solves a real fix, that
distros are backporting it already is not a bad compromise....
>
> I'm getting tired of having to reexplain this continuously with nobody
> listening. This could have been solved a long time ago.
People listen, but it is a change that goes across multiple boundaries
>
> > Would you please take a second look at the combined patchset?
>
> I will have a look. If I recall correctly, there were some patches in
> Guenter's series that I had no issue with, I'll start with those.
Thanks, I will post the combined series today.
>
> > > > It can always be reverted when we reach consensus on how to do it for
> > > > every driver.
> > > >
> > > > Regards!
> > > >
> > > > On Tue, 18 Oct 2022 at 06:46, Alan Stern wrote:
> > > > >
> > > > > Moving this bug report from bugzilla to the mailing lists.
> > > > >
> > > > > The short description of the bug is that in uvcvideo, disconnect races
> > > > > with starting a video transfer. The race shows up on Nazar's system
> > > > > because of a marginal USB cable which leads to a lot of spontaneous
> > > > > disconnections.
> > > > >
> > > > > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > > > > >
> > > > > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > > > > Created attachment 303022
> > > > > > --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > > > > Kernel log with uvc-trace patch applied
> > > > >
> > > > > For everyone's information, here is the uvc-trace patch. All it does is
> > > > > add messages to the kernel log when uvcvideo's probe and disconnect
> > > > > routines run, and just before uvc_video_start_transfer() calls
> > > > > usb_set_interface().
> > > > >
> > > > > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > > > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > > > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > > > > "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > > > > altsetting, best_psize);
> > > > >
> > > > > + dev_info(&intf->dev, "uvc set alt\n");
> > > > > ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > > > > if (ret < 0)
> > > > > return ret;
> > > > > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > > > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > > > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > > > > int function;
> > > > > int ret;
> > > > >
> > > > > + dev_info(&intf->dev, "uvc_probe start\n");
> > > > > +
> > > > > /* Allocate memory for the device and initialize it. */
> > > > > dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > > > > if (dev == NULL)
> > > > > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > > > > return;
> > > > >
> > > > > uvc_unregister_video(dev);
> > > > > + dev_info(&intf->dev, "uvc_disconnect done\n");
> > > > > kref_put(&dev->ref, uvc_delete);
> > > > > }
> > > > >
> > > > > The output in the kernel log below clearly shows that there is a bug in
> > > > > the uvcvideo driver.
> > > > >
> > > > > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > > > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > > > > within a few minutes.
> > > > > >
> > > > > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > > > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > > > > for it to reboot, but the last time I did that in the log I got null pointer
> > > > > > de-reference again.
> > > > >
> > > > > Here is the important part of the log:
> > > > >
> > > > > [ 684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > > > > [ 684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > > > > [ 684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > > > > [ 684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > > > > [ 684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > > > > ...
> > > > > [ 743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> > > > >
> > > > > This is where an ioctl calls uvc_video_start_transfer.
> > > > >
> > > > > [ 748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > > > > [ 748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> > > > >
> > > > > This is where the disconnect starts and finishes
> > > > >
> > > > > [ 748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > > > [ 748.898344] #PF: supervisor read access in kernel mode
> > > > > [ 748.898346] #PF: error_code(0x0000) - not-present page
> > > > > [ 748.898347] PGD 0 P4D 0
> > > > > [ 748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > > > [ 748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > > > > [ 748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > > > > [ 748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > > > > ...
> > > > > [ 748.898368] Call Trace:
> > > > > [ 748.898370] <TASK>
> > > > > [ 748.898370] usb_hcd_alloc_bandwidth+0x240/0x370
> > > > > [ 748.898375] usb_set_interface+0x122/0x350
> > > > > [ 748.898378] uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > > > > [ 748.898383] uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > > > > [ 748.898386] uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > > > > [ 748.898390] vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > > > > [ 748.898393] vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > > > > [ 748.898395] uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > > > > [ 748.898398] uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > > > > [ 748.898401] __video_do_ioctl+0x19a/0x3f0 [videodev]
> > > > >
> > > > > And this proves that uvc_disconnect() returned before the driver was
> > > > > finished accessing the device.
> > > > >
> > > > > I don't know how the driver works or how it tries to prevent this sort
> > > > > of race from occurring, but apparently the strategy isn't working.
> > > > >
> > > > > > Please let me know if there is any other information I can provide and what
> > > > > > could be the root cause of this annoying behavior.
> > > > >
> > > > > At this point I will bow out of the discussion; it's up to the uvcvideo
> > > > > maintainers to investigate further. Maybe they can provide a patch for
> > > > > you to test.
>
> --
> Regards,
>
> Laurent Pinchart
--
Ricardo Ribalda
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-19 4:22 ` Ricardo Ribalda
@ 2022-10-19 15:08 ` Alan Stern
2022-10-20 0:57 ` Ricardo Ribalda
0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2022-10-19 15:08 UTC (permalink / raw)
To: Ricardo Ribalda
Cc: Laurent Pinchart, Nazar Mokrynskyi, linux-media, linux-usb, linux,
Tomasz Figa
On Wed, Oct 19, 2022 at 01:22:48PM +0900, Ricardo Ribalda wrote:
> Hi Laurent
>
> On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> > And I would like to avoid having to roll out manual changes to all
> > drivers when the problem can be fixed in the core, just because nobody
> > can be bothered to spend time to implement a good fix. We don't have to
> > aim for a solution at the cdev level if that takes too long, an
> > implementation in V4L2 would be enough to start with.
>
> Do we know what a "good fix" would look like?. This is a race
> condition between cdev, v4l2, and usb_driver. The only entity that
> knows about the three of them is the driver.
>
> If we "fix" v4l2 to provide a callback to notify the framework about a
> "bus disconnect". It can prevent new syscalls, but it cannot interrupt
> the current ones.
It doesn't need to interrupt current syscalls. It merely needs to wait
until the current ones complete (and help them to complete early by
making them aware of the disconnection) and to prevent new ones from
starting.
I have no idea what facility (if any) the framework uses for this
already. However, if it turns out that proper synchronization needs a
new approach, I suggest trying SRCU. It can be viewed in some respects
as a kind of read-write mutex that is highly optimized for rapid
read-locks and -unlocks at the cost of very slow write-locks --
appropriate here since every syscall would need a read-lock whereas
write-locking would be needed only when a disconnect occurs.
Alan Stern
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-19 15:08 ` Alan Stern
@ 2022-10-20 0:57 ` Ricardo Ribalda
2022-10-20 14:25 ` Alan Stern
0 siblings, 1 reply; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-20 0:57 UTC (permalink / raw)
To: Alan Stern
Cc: Laurent Pinchart, Nazar Mokrynskyi, linux-media, linux-usb, linux,
Tomasz Figa
Hi Alan
On Thu, 20 Oct 2022 at 00:08, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Wed, Oct 19, 2022 at 01:22:48PM +0900, Ricardo Ribalda wrote:
> > Hi Laurent
> >
> > On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
> > <laurent.pinchart@ideasonboard.com> wrote:
> > > And I would like to avoid having to roll out manual changes to all
> > > drivers when the problem can be fixed in the core, just because nobody
> > > can be bothered to spend time to implement a good fix. We don't have to
> > > aim for a solution at the cdev level if that takes too long, an
> > > implementation in V4L2 would be enough to start with.
> >
> > Do we know what a "good fix" would look like?. This is a race
> > condition between cdev, v4l2, and usb_driver. The only entity that
> > knows about the three of them is the driver.
> >
> > If we "fix" v4l2 to provide a callback to notify the framework about a
> > "bus disconnect". It can prevent new syscalls, but it cannot interrupt
> > the current ones.
>
> It doesn't need to interrupt current syscalls. It merely needs to wait
> until the current ones complete (and help them to complete early by
> making them aware of the disconnection) and to prevent new ones from
> starting.
>
The USB subsystem is not aware of the current syscalls running for that device,
it just triggers the callback disconnect() to notify the driver that
they are not allowed
anymore to access the hardware.
Even when/if we fix this for real, a "basic test" checking if the
device is disconnected is a
nice thing to have. I think of it as a protective programming :)
Something like:
if WARN_ON(is_connected)
return -EIO;
> I have no idea what facility (if any) the framework uses for this
> already. However, if it turns out that proper synchronization needs a
> new approach, I suggest trying SRCU. It can be viewed in some respects
> as a kind of read-write mutex that is highly optimized for rapid
> read-locks and -unlocks at the cost of very slow write-locks --
> appropriate here since every syscall would need a read-lock whereas
> write-locking would be needed only when a disconnect occurs.
Thanks for the pointer :)
>
> Alan Stern
--
Ricardo Ribalda
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
2022-10-20 0:57 ` Ricardo Ribalda
@ 2022-10-20 14:25 ` Alan Stern
0 siblings, 0 replies; 21+ messages in thread
From: Alan Stern @ 2022-10-20 14:25 UTC (permalink / raw)
To: Ricardo Ribalda
Cc: Laurent Pinchart, Nazar Mokrynskyi, linux-media, linux-usb, linux,
Tomasz Figa
On Thu, Oct 20, 2022 at 09:57:24AM +0900, Ricardo Ribalda wrote:
> Hi Alan
>
> On Thu, 20 Oct 2022 at 00:08, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Wed, Oct 19, 2022 at 01:22:48PM +0900, Ricardo Ribalda wrote:
> > > Hi Laurent
> > >
> > > On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
> > > <laurent.pinchart@ideasonboard.com> wrote:
> > > > And I would like to avoid having to roll out manual changes to all
> > > > drivers when the problem can be fixed in the core, just because nobody
> > > > can be bothered to spend time to implement a good fix. We don't have to
> > > > aim for a solution at the cdev level if that takes too long, an
> > > > implementation in V4L2 would be enough to start with.
> > >
> > > Do we know what a "good fix" would look like?. This is a race
> > > condition between cdev, v4l2, and usb_driver. The only entity that
> > > knows about the three of them is the driver.
> > >
> > > If we "fix" v4l2 to provide a callback to notify the framework about a
> > > "bus disconnect". It can prevent new syscalls, but it cannot interrupt
> > > the current ones.
> >
> > It doesn't need to interrupt current syscalls. It merely needs to wait
> > until the current ones complete (and help them to complete early by
> > making them aware of the disconnection) and to prevent new ones from
> > starting.
> >
>
> The USB subsystem is not aware of the current syscalls running for that device,
> it just triggers the callback disconnect() to notify the driver that
> they are not allowed
> anymore to access the hardware.
Right. The question is: At what level in the various video code paths
should the check for "device gone" then be made?
One possibility is to do all these checks at the USB driver level. This
has the advantage of not requiring any changes to the V4L2 core or
elsewhere in the framework. But it has the disadvantage of spreading
the checks all over the place, making it that much easier to forget one.
Furthermore it would help only the USB driver, not any of the other
video drivers.
Another possibility is to do the checks at the framework level, or at
least, higher up than the driver level. For example, upon syscall
entry, and for long-running commands, at suitable intermediate points.
This can be implemented by having the driver call a core function from
within its disconnect callback; that function would let the framework
know the device is gone and wouldn't return until all existing syscall
threads were aware of this fact and had stopped accessing the device.
This approach has advantages and disadvantages complementary to the
first approach.
I can't tell you which approach is better -- that's up to Laurent :-) --
I just want to make sure you are aware of the possibilities and their
tradeoffs.
> Even when/if we fix this for real, a "basic test" checking if the
> device is disconnected is a
> nice thing to have. I think of it as a protective programming :)
>
> Something like:
>
> if WARN_ON(is_connected)
> return -EIO;
Sure, it wouldn't hurt to sprinkle things like that here and there. But
obviously they won't fix the problem by themselves.
Alan Stern
> > I have no idea what facility (if any) the framework uses for this
> > already. However, if it turns out that proper synchronization needs a
> > new approach, I suggest trying SRCU. It can be viewed in some respects
> > as a kind of read-write mutex that is highly optimized for rapid
> > read-locks and -unlocks at the cost of very slow write-locks --
> > appropriate here since every syscall would need a read-lock whereas
> > write-locking would be needed only when a disconnect occurs.
>
>
> Thanks for the pointer :)
>
> >
> > Alan Stern
>
>
>
> --
> Ricardo Ribalda
^ permalink raw reply [flat|nested] 21+ messages in thread