public inbox for linux-media@vger.kernel.org
* Possible race condition on videobuf2?
@ 2013-10-13 13:03 Ricardo Ribalda Delgado
  2013-10-13 16:42 ` Hans Verkuil
  0 siblings, 1 reply; 6+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-10-13 13:03 UTC
  To: Marek Szyprowski, Hans Verkuil, linux-media

Hello,

These days I have been testing an app that uses the old read/write
API. It interfaces with a videobuf2-sg driver.

Once in a while I get an oops in the vb2_perform_fileio function.

After digging for a while I have been able to fix the bug like this:

 int vb2_fop_release(struct file *file)
 {
 	struct video_device *vdev = video_devdata(file);
+	struct mutex *lock = vdev->queue->lock ? vdev->queue->lock : vdev->lock;

+	if (lock)
+		mutex_lock(lock);
 	if (file->private_data == vdev->queue->owner) {
 		vb2_queue_release(vdev->queue);
 		vdev->queue->owner = NULL;
 	}
+	if (lock)
+		mutex_unlock(lock);
 	return v4l2_fh_release(file);
 }
 EXPORT_SYMBOL_GPL(vb2_fop_release);

(or at least, I haven't seen the oops since then).

I was wondering whether this is a real bug in vb2, in which case I should
post a proper patch, or whether the device driver is doing something out of spec.

vb2_perform_fileio+0x372 corresponds to

  memset(&fileio->b, 0, sizeof(fileio->b));

My theory is that this sequence:

fileio = q->fileio;
if (!fileio) {
...
q->fileio = NULL;

is prone to a race if the lock is not held on release.
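
The suspected pattern can be modelled in user space. What follows is only a
sketch under stated assumptions: fake_queue, fake_fileio, fake_read,
fake_release and run_model are hypothetical stand-ins, not vb2 code, and a
pthread mutex plays the role of the queue lock. It shows the release path
taking the same lock as the read path, so the pointer cannot be cleared
between the check and the use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-ins for the vb2 structures; not kernel API. */
struct fake_fileio {
	char b[64];                 /* plays the role of fileio->b */
};

struct fake_queue {
	pthread_mutex_t lock;       /* plays the role of q->lock / vdev->lock */
	struct fake_fileio *fileio; /* plays the role of q->fileio */
};

/* Read path: the whole check-then-use sequence runs under the lock. */
static int fake_read(struct fake_queue *q)
{
	int ret = -1;

	pthread_mutex_lock(&q->lock);
	if (q->fileio) {
		memset(q->fileio->b, 0, sizeof(q->fileio->b));
		ret = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return ret;
}

/* Release path: frees the state under the same lock as fake_read(). */
static void fake_release(struct fake_queue *q)
{
	pthread_mutex_lock(&q->lock);
	free(q->fileio);
	q->fileio = NULL;
	pthread_mutex_unlock(&q->lock);
}

static void *reader_thread(void *arg)
{
	struct fake_queue *q = arg;

	for (int i = 0; i < 100000; i++)
		fake_read(q);
	return NULL;
}

/* Runs a reader against a concurrent release; returns 0 if the queue ends up released. */
int run_model(void)
{
	struct fake_queue q;
	pthread_t reader;
	int ret;

	pthread_mutex_init(&q.lock, NULL);
	q.fileio = calloc(1, sizeof(*q.fileio));
	assert(q.fileio != NULL);

	if (pthread_create(&reader, NULL, reader_thread, &q) != 0) {
		fake_release(&q);
		pthread_mutex_destroy(&q.lock);
		return -1;
	}
	fake_release(&q);           /* may interleave with the reader, but only at lock boundaries */
	pthread_join(reader, NULL);

	ret = (q.fileio == NULL) ? 0 : -1;
	pthread_mutex_destroy(&q.lock);
	return ret;
}
```

If the two lock calls in fake_release() are dropped, the reader can pass the
NULL check and then touch freed memory, which is the kind of window the
theory above describes.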


[ 308.297741] BUG: unable to handle kernel NULL pointer dereference at 0000000000000260
[ 308.297759] IP: [<ffffffffa07a9fd2>] vb2_perform_fileio+0x372/0x610 [videobuf2_core]
[ 308.297794] PGD 159719067 PUD 158119067 PMD 0
[ 308.297812] Oops: 0000 #1 SMP
[ 308.297826] Modules linked in: qt5023_video videobuf2_dma_sg qtec_xform videobuf2_vmalloc videobuf2_memops videobuf2_core qtec_white qtec_mem gpio_xilinx qtec_cmosis qtec_pcie fglrx(PO) spi_xilinx spi_bitbang qt5023
[ 308.297888] CPU: 1 PID: 2189 Comm: java Tainted: P O 3.11.0-qtec-standard #1
[ 308.297919] Hardware name: QTechnology QT5022/QT5022, BIOS PM_2.1.0.309 X64 05/23/2013
[ 308.297952] task: ffff8801564e1690 ti: ffff88014dc02000 task.ti: ffff88014dc02000
[ 308.297962] RIP: 0010:[<ffffffffa07a9fd2>] [<ffffffffa07a9fd2>] vb2_perform_fileio+0x372/0x610 [videobuf2_core]
[ 308.297985] RSP: 0018:ffff88014dc03df8 EFLAGS: 00010202
[ 308.297995] RAX: 0000000000000000 RBX: ffff880158a23000 RCX: dead000000100100
[ 308.298003] RDX: 0000000000000000 RSI: dead000000200200 RDI: 0000000000000000
[ 308.298012] RBP: ffff88014dc03e58 R08: 0000000000000000 R09: 0000000000000001
[ 308.298020] R10: ffffea00051e8380 R11: ffff88014dc03fd8 R12: ffff880158a23070
[ 308.298029] R13: ffff8801549040b8 R14: 0000000000198000 R15: 0000000001887e60
[ 308.298040] FS: 00007f65130d5700(0000) GS:ffff88015ed00000(0000) knlGS:0000000000000000
[ 308.298049] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 308.298057] CR2: 0000000000000260 CR3: 0000000159630000 CR4: 00000000000007e0
[ 308.298064] Stack:
[ 308.298071] ffff880156416c00 0000000000198000 0000000000000000 ffff880100000001
[ 308.298087] ffff88014dc03f50 00000000810a79ca 0002000000000001 ffff880154904718
[ 308.298101] ffff880156416c00 0000000000198000 ffff880154904338 ffff88014dc03f50
[ 308.298116] Call Trace:
[ 308.298143] [<ffffffffa07aa3c4>] vb2_read+0x14/0x20 [videobuf2_core]
[ 308.298198] [<ffffffffa07aa494>] vb2_fop_read+0xc4/0x120 [videobuf2_core]
[ 308.298252] [<ffffffff8154ee9e>] v4l2_read+0x7e/0xc0
[ 308.298296] [<ffffffff8116e639>] vfs_read+0xa9/0x160
[ 308.298312] [<ffffffff8116e882>] SyS_read+0x52/0xb0
[ 308.298328] [<ffffffff81784179>] tracesys+0xd0/0xd5
[ 308.298335] Code: e5 d6 ff ff 83 3d be 24 00 00 04 89 c2 4c 8b 45 b0 44 8b 4d b8 0f 8f 20 02 00 00 85 d2 75 32 83 83 78 03 00 00 01 4b 8b 44 c5 48 <8b> 88 60 02 00 00 85 c9 0f 84 b0 00 00 00 8b 40 58 89 c2 41 89
[ 308.298487] RIP [<ffffffffa07a9fd2>] vb2_perform_fileio+0x372/0x610 [videobuf2_core]
[ 308.298507] RSP <ffff88014dc03df8>
[ 308.298514] CR2: 0000000000000260
[ 308.298526] ---[ end trace e8f01717c96d1e41 ]---
-- 
Ricardo Ribalda


* Re: Possible race condition on videobuf2?
  2013-10-13 13:03 Possible race condition on videobuf2? Ricardo Ribalda Delgado
@ 2013-10-13 16:42 ` Hans Verkuil
  2013-10-14  7:42   ` Ricardo Ribalda Delgado
  0 siblings, 1 reply; 6+ messages in thread
From: Hans Verkuil @ 2013-10-13 16:42 UTC
  To: Ricardo Ribalda Delgado; +Cc: Marek Szyprowski, linux-media

Hi Ricardo,

On 10/13/2013 03:03 PM, Ricardo Ribalda Delgado wrote:
> Hello
> 
> These days I have been testing an app that uses the old read/write
> API. It is interfacing a videobuf2-sg driver.
> 
> Once in a while I get an oops on the vb2_perform_fileio function.
> 
> After digging a while I have been able to fix the bug like this:
> 
> int vb2_fop_release(struct file *file)
>  {
>   struct video_device *vdev = video_devdata(file);
> + struct mutex *lock = vdev->queue->lock ? vdev->queue->lock : vdev->lock;
> 
> + if (lock)
> + mutex_lock(lock);
>   if (file->private_data == vdev->queue->owner) {
>   vb2_queue_release(vdev->queue);
>   vdev->queue->owner = NULL;
>   }
> + if (lock)
> + mutex_unlock(lock);
>   return v4l2_fh_release(file);
>  }
>  EXPORT_SYMBOL_GPL(vb2_fop_release);
> 
> (or at least, I havent seen the oops since then).
> 
> I was wondering if this a real bug on vb2 and I shall post a proper
> patch or the device driver is doing something out of spec.
> 
> vb2_perform_fileio+0x372 corresponds to
> 
>   memset(&fileio->b, 0, sizeof(fileio->b));
> 
> My theory is that:
> 
> fileio = q->fileio;
> if (!fileio){
> ...
> q->fileio = NULL;
> 
> can be prone to race conditions if the lock is not held on release.

This would not surprise me.

I have a number of vb2 patches in this branch:

http://git.linuxtv.org/hverkuil/media_tree.git/shortlog/refs/heads/vb2-thread

Among other things, it replaces the 'q->fileio = NULL' hack with a proper solution.

It will be interesting to see whether the patch series in that branch fixes your
problem as well.

That said, I do think vb2_fop_release should take the lock regardless, although
I would put the mutex_lock inside the 'if' that checks the owner.

Regards,

	Hans
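
The variant suggested above, with the lock taken inside the owner check,
might look roughly like the sketch below. The struct layouts and the stub
bodies are hypothetical user-space stand-ins so that the fragment compiles on
its own; only the function names and the body of vb2_fop_release follow the
patch quoted earlier in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a counting "mutex" instead of the kernel one. */
struct mutex { int depth; int taken; };

static void mutex_lock(struct mutex *m)
{
	m->depth++;
	m->taken++;
}

static void mutex_unlock(struct mutex *m)
{
	assert(m->depth > 0);   /* catch an unbalanced unlock */
	m->depth--;
}

/* Hypothetical, heavily reduced versions of the real structures. */
struct vb2_queue { struct mutex *lock; void *owner; int released; };
struct video_device { struct mutex *lock; struct vb2_queue *queue; };
struct file { void *private_data; struct video_device *vdev; };

/* Stubs with the same signatures as the kernel functions they replace. */
static struct video_device *video_devdata(struct file *file) { return file->vdev; }
static void vb2_queue_release(struct vb2_queue *q) { q->released = 1; }
static int v4l2_fh_release(struct file *file) { (void)file; return 0; }

int vb2_fop_release(struct file *file)
{
	struct video_device *vdev = video_devdata(file);
	struct mutex *lock = vdev->queue->lock ? vdev->queue->lock : vdev->lock;

	/* Only the owner of the queue tears it down, so only the owner locks. */
	if (file->private_data == vdev->queue->owner) {
		if (lock)
			mutex_lock(lock);
		vb2_queue_release(vdev->queue);
		vdev->queue->owner = NULL;
		if (lock)
			mutex_unlock(lock);
	}
	return v4l2_fh_release(file);
}
```

Compared with the original proposal, a file handle that does not own the
queue never touches the mutex on close.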


* Re: Possible race condition on videobuf2?
  2013-10-13 16:42 ` Hans Verkuil
@ 2013-10-14  7:42   ` Ricardo Ribalda Delgado
  2013-10-14  7:47     ` Hans Verkuil
  0 siblings, 1 reply; 6+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-10-14  7:42 UTC
  To: Hans Verkuil; +Cc: Marek Szyprowski, linux-media

Hello Hans

Thanks for your feedback. I have sent a patch to the list with your suggestion.

If I don't apply my patch, but use your branch instead, I still get the error:


Thanks!


[   57.567944] BUG: unable to handle kernel NULL pointer dereference at 0000000000000260
[   57.567980] IP: [<ffffffffa07ab963>] __vb2_perform_fileio+0x363/0x600 [videobuf2_core]
[   57.568013] PGD 15952a067 PUD 159519067 PMD 0
[   57.568032] Oops: 0000 [#1] SMP
[   57.568045] Modules linked in: nfc qtec_xform gpio_xilinx qt5023_video videobuf2_vmalloc videobuf2_dma_sg videobuf2_memops videobuf2_core qtec_pcie qtec_cmosis qtec_white qtec_mem fglrx(PO) spi_xilinx spi_bitbang qt5023
[   57.568109] CPU: 1 PID: 1213 Comm: java Tainted: P           O 3.11.0-qtec-standard+ #112
[   57.568122] Hardware name: QTechnology QT5022/QT5022, BIOS PM_2.1.0.309 X64 05/23/2013
[   57.568134] task: ffff88014aa98000 ti: ffff88014aa94000 task.ti: ffff88014aa94000
[   57.568144] RIP: 0010:[<ffffffffa07ab963>]  [<ffffffffa07ab963>] __vb2_perform_fileio+0x363/0x600 [videobuf2_core]
[   57.568168] RSP: 0018:ffff88014aa95e48  EFLAGS: 00010246
[   57.568178] RAX: 0000000000000000 RBX: ffff88015694e870 RCX: 0000000000000000
[   57.568187] RDX: 0000000000000000 RSI: ffff88015694c400 RDI: ffff88015694c400
[   57.568195] RBP: ffff88014aa95e98 R08: 0000000000000000 R09: 0000000000000001
[   57.568204] R10: ffffea00055c2880 R11: ffff88014aa95fd8 R12: ffff88015694e800
[   57.568213] R13: 000000000021de00 R14: ffff88015727f0b8 R15: 000000000126ae20
[   57.568224] FS:  00007fc23bbda700(0000) GS:ffff88015ed00000(0000) knlGS:0000000000000000
[   57.568233] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   57.568241] CR2: 0000000000000260 CR3: 0000000156931000 CR4: 00000000000007e0
[   57.568248] Stack:
[   57.568254]  ffff880159632910 0000000000000000 ffff880100000001 0000000059bd0820
[   57.568270]  ffff88014aa95f50 ffff88015727f720 ffff880159632900 ffff88015727f340
[   57.568285]  000000000021de00 ffff88014aa95f50 ffff88014aa95ed8 ffffffffa07abe09
[   57.568300] Call Trace:
[   57.568328]  [<ffffffffa07abe09>] vb2_fop_read+0xb9/0x110 [videobuf2_core]
[   57.568351]  [<ffffffff81534985>] v4l2_read+0x65/0xb0
[   57.568372]  [<ffffffff81166bea>] vfs_read+0x9a/0x150
[   57.568388]  [<ffffffff81167649>] SyS_read+0x49/0xa0
[   57.568407]  [<ffffffff81760479>] tracesys+0xd0/0xd5
[   57.568415] Code: 89 c2 44 8b 4d c0 4c 8b 45 b8 0f 8f 2d 02 00 00 85 d2 48 63 c2 0f 85 86 fd ff ff 41 83 84 24 78 03 00 00 01 31 d2 4b 8b 4c c6 48 <8b> 81 60 02 00 00 85 c0 74 05 8b 41 58 89 c2 89 43 08 80 63 10
[   57.568566] RIP  [<ffffffffa07ab963>] __vb2_perform_fileio+0x363/0x600 [videobuf2_core]
[   57.568587]  RSP <ffff88014aa95e48>
[   57.568594] CR2: 0000000000000260
[   57.568606] ---[ end trace d664b8a0460e8ca5 ]---






-- 
Ricardo Ribalda


* Re: Possible race condition on videobuf2?
  2013-10-14  7:42   ` Ricardo Ribalda Delgado
@ 2013-10-14  7:47     ` Hans Verkuil
  2013-10-14  7:53       ` Ricardo Ribalda Delgado
  0 siblings, 1 reply; 6+ messages in thread
From: Hans Verkuil @ 2013-10-14  7:47 UTC
  To: Ricardo Ribalda Delgado; +Cc: Marek Szyprowski, linux-media

On 10/14/13 09:42, Ricardo Ribalda Delgado wrote:
> Hello Hans
> 
> Thanks for your feedback. I have send a patch to the list with your suggestion.
> 
> If I dont apply my patch, but your your branch I still get the error:

Thanks for testing this. I suspected that might be the case.

What exactly is the sequence that reproduces this error? Ctrl-C in the middle of
a read()? Multiple threads accessing the same file handle?

Regards,

	Hans


* Re: Possible race condition on videobuf2?
  2013-10-14  7:47     ` Hans Verkuil
@ 2013-10-14  7:53       ` Ricardo Ribalda Delgado
  2013-10-14  8:13         ` Hans Verkuil
  0 siblings, 1 reply; 6+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-10-14  7:53 UTC
  To: Hans Verkuil; +Cc: Marek Szyprowski, linux-media

Hello Hans

There is a Java web server that broadcasts the images.   :S

I can see that the server does open/read/close every time it is asked
for an image. The error was triggered once in a while. Now I can try
to trigger the error faster by opening multiple client windows. I
don't know how the broken connections are handled.

I have tried to crash it with this script and it did not fail.

while true
do
	for a in $(seq 100)
	do
		dd if=/dev/video of=/dev/null count=5 &
	done
	sleep 2
done
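
Since the server does open/read/close per request, a C equivalent of the
script that also repeats the open and the close in each worker might exercise
the release path more often. This is a sketch, not a confirmed reproducer:
hammer() and open_read_close() are hypothetical names, and on a real V4L2
node concurrent readers may simply fail with EBUSY and be counted as unclean
workers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One worker: repeat open/read/close, as the web server does per image. */
static int open_read_close(const char *path, int iters)
{
	char buf[4096];

	for (int i = 0; i < iters; i++) {
		int fd = open(path, O_RDONLY);
		ssize_t n;

		if (fd < 0)
			return -1;
		n = read(fd, buf, sizeof(buf));
		close(fd);
		if (n < 0)
			return -1;
	}
	return 0;
}

/*
 * Fork nprocs workers against the same path and wait for all of them.
 * Returns how many workers completed every cycle without an error.
 */
int hammer(const char *path, int nprocs, int iters)
{
	int started = 0;
	int ok = 0;

	assert(nprocs > 0 && iters > 0);

	for (int i = 0; i < nprocs; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;          /* could not start more workers */
		if (pid == 0)
			_exit(open_read_close(path, iters) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
		started++;
	}

	for (int i = 0; i < started; i++) {
		int status;

		if (wait(&status) > 0 && WIFEXITED(status) &&
		    WEXITSTATUS(status) == EXIT_SUCCESS)
			ok++;
	}
	return ok;
}
```

For example, hammer("/dev/video", 100, 50) would start 100 workers doing 50
open/read/close cycles each.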


Thanks!

>>>> 44 c5 48 <8b> 88 60 02 00 00 85 c9 0f 84 b0 00 00 00 8b 40 58 89 c2 41
>>>> 89
>>>> [ 308.298487] RIP [<ffffffffa07a9fd2>] vb2_perform_fileio+0x372/0x610
>>>> [videobuf2_core]
>>>> [ 308.298507] RSP <ffff88014dc03df8>
>>>> [ 308.298514] CR2: 0000000000000260
>>>> [ 308.298526] ---[ end trace e8f01717c96d1e41 ]---
>>>>
>>>
>>
>>
>>
>



-- 
Ricardo Ribalda


* Re: Possible race condition on videobuf2?
  2013-10-14  7:53       ` Ricardo Ribalda Delgado
@ 2013-10-14  8:13         ` Hans Verkuil
  0 siblings, 0 replies; 6+ messages in thread
From: Hans Verkuil @ 2013-10-14  8:13 UTC (permalink / raw)
  To: Ricardo Ribalda Delgado; +Cc: Marek Szyprowski, linux-media

On 10/14/13 09:53, Ricardo Ribalda Delgado wrote:
> Hello Hans
> 
> There is a Java web server that broadcasts the images.   :S
> 
> I can see that the server does open/read/close every time it is asked
> for an image. The error was triggered once in a while. Now I can try
> to trigger the error faster by opening multiple client windows. I
> don't know how the broken connections are handled.

This suggests multi-threaded operation. I actually wonder if the whole
dqbuf/qbuf cycle is thread-safe. Looking at videobuf2-core.c, in
__vb2_wait_for_done_vb(), while that function is waiting for an event it
doesn't hold any locks. So another thread may close the filehandle.

I think this will all work fine provided dqbuf returns an error if streaming
stopped while waiting. I'm not sure it does, though.

It might need a check like this:

	if (!q->streaming)
		return -ESOMEERROR;
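
As a rough userspace illustration of that check (this is not vb2 code), the
sketch below models a blocking dqbuf with a pthread condition variable. The
waiter gives up the lock while it sleeps, so it re-checks the streaming flag
each time it gets the lock back. All names here (model_queue, model_dqbuf,
model_release, run_model) are made up for the example, and -EINVAL only
stands in for the unspecified error code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a vb2 queue; not the kernel struct. */
struct model_queue {
	pthread_mutex_t lock;    /* plays the role of q->lock / vdev->lock */
	pthread_cond_t done_wq;  /* plays the role of the "buffer done" wait queue */
	bool streaming;
	int done_count;          /* buffers ready to be dequeued */
};

static void model_queue_init(struct model_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->done_wq, NULL);
	q->streaming = true;
	q->done_count = 0;
}

/*
 * Model of a blocking dqbuf. pthread_cond_wait() drops the lock while
 * sleeping, so another thread may stop streaming in the meantime. The
 * state is re-checked after every wakeup, with the lock held again.
 */
static int model_dqbuf(struct model_queue *q)
{
	int ret = 0;

	pthread_mutex_lock(&q->lock);
	while (q->streaming && q->done_count == 0)
		pthread_cond_wait(&q->done_wq, &q->lock);

	if (!q->streaming) {
		ret = -EINVAL;  /* assumption: placeholder error code */
	} else {
		assert(q->done_count > 0);
		q->done_count--;
	}
	pthread_mutex_unlock(&q->lock);
	return ret;
}

/* Model of release: stop streaming under the lock and wake all waiters. */
static void model_release(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->streaming = false;
	pthread_cond_broadcast(&q->done_wq);
	pthread_mutex_unlock(&q->lock);
}

static void *reader_thread(void *arg)
{
	return (void *)(intptr_t)model_dqbuf(arg);
}

/*
 * Start a reader on an empty queue, release the queue from this thread,
 * and return what the reader got. The result is the same whether or not
 * the reader was already asleep when the release happened.
 */
static int run_model(void)
{
	struct model_queue q;
	pthread_t reader;
	void *res = NULL;

	model_queue_init(&q);
	if (pthread_create(&reader, NULL, reader_thread, &q) != 0)
		return 1;
	model_release(&q);
	pthread_join(reader, &res);
	return (int)(intptr_t)res;
}
```

In this model run_model() returns -EINVAL: the reader never touches the queue
state after the release, because it sees the cleared streaming flag first.
Whether the real dqbuf path behaves like this is exactly what needs checking.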

This really needs more testing. Anyone willing to dig into this a bit more?
I won't have time for this this month.

Regards,

	Hans

> 
> I have tried to crash it with this script and it did not fail.
> 
> while true
> do
>     for a in $(seq 100)
>     do
>         dd if=/dev/video of=/dev/null count=5 &
>     done
>     sleep 2
> done
> 
> 
> Thanks!
> 
> On Mon, Oct 14, 2013 at 9:47 AM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>> On 10/14/13 09:42, Ricardo Ribalda Delgado wrote:
>>> Hello Hans
>>>
>>> Thanks for your feedback. I have sent a patch to the list with your suggestion.
>>>
>>> If I don't apply my patch but use your branch, I still get the error:
>>
>> Thanks for testing this. I suspected that might be the case.
>>
>> What exactly is the sequence that reproduces this error? Ctrl-C in the middle of
>> a read()? Multiple threads accessing the same file handle?
>>
>> Regards,
>>
>>         Hans
>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>> [ 57.567944] BUG: unable to handle kernel NULL pointer dereference
>>> at 0000000000000260
>>> [ 57.567980] IP: [<ffffffffa07ab963>]
>>> __vb2_perform_fileio+0x363/0x600 [videobuf2_core]
>>> [ 57.568013] PGD 15952a067 PUD 159519067 PMD 0
>>> [   57.568032] Oops: 0000 [#1] SMP
>>> [   57.568045] Modules linked in: nfc qtec_xform gpio_xilinx
>>> qt5023_video videobuf2_vmalloc videobuf2_dma_sg videobuf2_memops
>>> videobuf2_core qtec_pcie qtec_cmosis qtec_white qtec_mem fglrx(PO)
>>> spi_xilinx spi_bitbang qt5023
>>> [   57.568109] CPU: 1 PID: 1213 Comm: java Tainted: P           O
>>> 3.11.0-qtec-standard+ #112
>>> [   57.568122] Hardware name: QTechnology QT5022/QT5022, BIOS
>>> PM_2.1.0.309 X64 05/23/2013
>>> [   57.568134] task: ffff88014aa98000 ti: ffff88014aa94000 task.ti:
>>> ffff88014aa94000
>>> [   57.568144] RIP: 0010:[<ffffffffa07ab963>]  [<ffffffffa07ab963>]
>>> __vb2_perform_fileio+0x363/0x600 [videobuf2_core]
>>> [   57.568168] RSP: 0018:ffff88014aa95e48  EFLAGS: 00010246
>>> [   57.568178] RAX: 0000000000000000 RBX: ffff88015694e870 RCX: 0000000000000000
>>> [   57.568187] RDX: 0000000000000000 RSI: ffff88015694c400 RDI: ffff88015694c400
>>> [   57.568195] RBP: ffff88014aa95e98 R08: 0000000000000000 R09: 0000000000000001
>>> [   57.568204] R10: ffffea00055c2880 R11: ffff88014aa95fd8 R12: ffff88015694e800
>>> [   57.568213] R13: 000000000021de00 R14: ffff88015727f0b8 R15: 000000000126ae20
>>> [   57.568224] FS:  00007fc23bbda700(0000) GS:ffff88015ed00000(0000)
>>> knlGS:0000000000000000
>>> [   57.568233] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [   57.568241] CR2: 0000000000000260 CR3: 0000000156931000 CR4: 00000000000007e0
>>> [   57.568248] Stack:
>>> [   57.568254]  ffff880159632910 0000000000000000 ffff880100000001
>>> 0000000059bd0820
>>> [   57.568270]  ffff88014aa95f50 ffff88015727f720 ffff880159632900
>>> ffff88015727f340
>>> [   57.568285]  000000000021de00 ffff88014aa95f50 ffff88014aa95ed8
>>> ffffffffa07abe09
>>> [   57.568300] Call Trace:
>>> [   57.568328]  [<ffffffffa07abe09>] vb2_fop_read+0xb9/0x110 [videobuf2_core]
>>> [   57.568351]  [<ffffffff81534985>] v4l2_read+0x65/0xb0
>>> [   57.568372]  [<ffffffff81166bea>] vfs_read+0x9a/0x150
>>> [   57.568388]  [<ffffffff81167649>] SyS_read+0x49/0xa0
>>> [   57.568407]  [<ffffffff81760479>] tracesys+0xd0/0xd5
>>> [   57.568415] Code: 89 c2 44 8b 4d c0 4c 8b 45 b8 0f 8f 2d 02 00 00
>>> 85 d2 48 63 c2 0f 85 86 fd ff ff 41 83 84 24 78 03 00 00 01 31 d2 4b
>>> 8b 4c c6 48 <8b> 81 60 02 00 00 85 c0 74 05 8b 41 58 89 c2 89 43 08 80
>>> 63 10
>>> [   57.568566] RIP  [<ffffffffa07ab963>]
>>> __vb2_perform_fileio+0x363/0x600 [videobuf2_core]
>>> [   57.568587]  RSP <ffff88014aa95e48>
>>> [   57.568594] CR2: 0000000000000260
>>> [   57.568606] ---[ end trace d664b8a0460e8ca5 ]---
>>>
>>>
>>>
>>> On Sun, Oct 13, 2013 at 6:42 PM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>>> Hi Ricardo,
>>>>
>>>> On 10/13/2013 03:03 PM, Ricardo Ribalda Delgado wrote:
>>>>> Hello
>>>>>
>>>>> These days I have been testing an app that uses the old read/write
>>>>> API. It is interfacing a videobuf2-sg driver.
>>>>>
>>>>> Once in a while I get an oops on the vb2_perform_fileio function.
>>>>>
>>>>> After digging a while I have been able to fix the bug like this:
>>>>>
>>>>> int vb2_fop_release(struct file *file)
>>>>>  {
>>>>>   struct video_device *vdev = video_devdata(file);
>>>>> + struct mutex *lock = vdev->queue->lock ? vdev->queue->lock : vdev->lock;
>>>>>
>>>>> + if (lock)
>>>>> + mutex_lock(lock);
>>>>>   if (file->private_data == vdev->queue->owner) {
>>>>>   vb2_queue_release(vdev->queue);
>>>>>   vdev->queue->owner = NULL;
>>>>>   }
>>>>> + if (lock)
>>>>> + mutex_unlock(lock);
>>>>>   return v4l2_fh_release(file);
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(vb2_fop_release);
>>>>>
>>>>> (or at least, I haven't seen the oops since then).
>>>>>
>>>>> I was wondering whether this is a real bug in vb2, so that I should post
>>>>> a proper patch, or whether the device driver is doing something out of spec.
>>>>>
>>>>> vb2_perform_fileio+0x372 corresponds to
>>>>>
>>>>>   memset(&fileio->b, 0, sizeof(fileio->b));
>>>>>
>>>>> My theory is that:
>>>>>
>>>>> fileio = q->fileio;
>>>>> if (!fileio){
>>>>> ...
>>>>> q->fileio = NULL;
>>>>>
>>>>> can be prone to race conditions if the lock is not held on release.
>>>>
>>>> This would not surprise me.
>>>>
>>>> I have a number of vb2 patches in this branch:
>>>>
>>>> http://git.linuxtv.org/hverkuil/media_tree.git/shortlog/refs/heads/vb2-thread
>>>>
>>>> Among others it replaces the 'q->fileio = NULL' hack by a proper solution.
>>>>
>>>> It will be interesting to see if my patch series in that branch would fix your
>>>> problem as well.
>>>>
>>>> That said, I do think vb2_fop_release should take the lock regardless, although
>>>> I would put the mutex_lock inside the 'if' that checks the owner.
>>>>
>>>> Regards,
>>>>
>>>>         Hans
>>>>
>>>>>
>>>>>
>>>>> [ 308.297741] BUG: unable to handle kernel NULL pointer dereference at
>>>>> 0000000000000260
>>>>> [ 308.297759] IP: [<ffffffffa07a9fd2>] vb2_perform_fileio+0x372/0x610
>>>>> [videobuf2_core]
>>>>> [ 308.297794] PGD 159719067 PUD 158119067 PMD 0
>>>>> [ 308.297812] Oops: 0000 #1 SMP
>>>>> [ 308.297826] Modules linked in: qt5023_video videobuf2_dma_sg
>>>>> qtec_xform videobuf2_vmalloc videobuf2_memops videobuf2_core
>>>>> qtec_white qtec_mem gpio_xilinx qtec_cmosis qtec_pcie fglrx(PO)
>>>>> spi_xilinx spi_bitbang qt5023
>>>>> [ 308.297888] CPU: 1 PID: 2189 Comm: java Tainted: P O 3.11.0-qtec-standard #1
>>>>> [ 308.297919] Hardware name: QTechnology QT5022/QT5022, BIOS
>>>>> PM_2.1.0.309 X64 05/23/2013
>>>>> [ 308.297952] task: ffff8801564e1690 ti: ffff88014dc02000 task.ti:
>>>>> ffff88014dc02000
>>>>> [ 308.297962] RIP: 0010:[<ffffffffa07a9fd2>] [<ffffffffa07a9fd2>]
>>>>> vb2_perform_fileio+0x372/0x610 [videobuf2_core]
>>>>> [ 308.297985] RSP: 0018:ffff88014dc03df8 EFLAGS: 00010202
>>>>> [ 308.297995] RAX: 0000000000000000 RBX: ffff880158a23000 RCX: dead000000100100
>>>>> [ 308.298003] RDX: 0000000000000000 RSI: dead000000200200 RDI: 0000000000000000
>>>>> [ 308.298012] RBP: ffff88014dc03e58 R08: 0000000000000000 R09: 0000000000000001
>>>>> [ 308.298020] R10: ffffea00051e8380 R11: ffff88014dc03fd8 R12: ffff880158a23070
>>>>> [ 308.298029] R13: ffff8801549040b8 R14: 0000000000198000 R15: 0000000001887e60
>>>>> [ 308.298040] FS: 00007f65130d5700(0000) GS:ffff88015ed00000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 308.298049] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 308.298057] CR2: 0000000000000260 CR3: 0000000159630000 CR4: 00000000000007e0
>>>>> [ 308.298064] Stack:
>>>>> [ 308.298071] ffff880156416c00 0000000000198000 0000000000000000
>>>>> ffff880100000001
>>>>> [ 308.298087] ffff88014dc03f50 00000000810a79ca 0002000000000001
>>>>> ffff880154904718
>>>>> [ 308.298101] ffff880156416c00 0000000000198000 ffff880154904338
>>>>> ffff88014dc03f50
>>>>> [ 308.298116] Call Trace:
>>>>> [ 308.298143] [<ffffffffa07aa3c4>] vb2_read+0x14/0x20 [videobuf2_core]
>>>>> [ 308.298198] [<ffffffffa07aa494>] vb2_fop_read+0xc4/0x120 [videobuf2_core]
>>>>> [ 308.298252] [<ffffffff8154ee9e>] v4l2_read+0x7e/0xc0
>>>>> [ 308.298296] [<ffffffff8116e639>] vfs_read+0xa9/0x160
>>>>> [ 308.298312] [<ffffffff8116e882>] SyS_read+0x52/0xb0
>>>>> [ 308.298328] [<ffffffff81784179>] tracesys+0xd0/0xd5
>>>>> [ 308.298335] Code: e5 d6 ff ff 83 3d be 24 00 00 04 89 c2 4c 8b 45 b0
>>>>> 44 8b 4d b8 0f 8f 20 02 00 00 85 d2 75 32 83 83 78 03 00 00 01 4b 8b
>>>>> 44 c5 48 <8b> 88 60 02 00 00 85 c9 0f 84 b0 00 00 00 8b 40 58 89 c2 41
>>>>> 89
>>>>> [ 308.298487] RIP [<ffffffffa07a9fd2>] vb2_perform_fileio+0x372/0x610
>>>>> [videobuf2_core]
>>>>> [ 308.298507] RSP <ffff88014dc03df8>
>>>>> [ 308.298514] CR2: 0000000000000260
>>>>> [ 308.298526] ---[ end trace e8f01717c96d1e41 ]---
>>>>>
>>>>
>>>
>>>
>>>
>>
> 
> 
> 



end of thread, other threads:[~2013-10-14  8:13 UTC | newest]

Thread overview: 6+ messages
2013-10-13 13:03 Possible race condition on videobuf2? Ricardo Ribalda Delgado
2013-10-13 16:42 ` Hans Verkuil
2013-10-14  7:42   ` Ricardo Ribalda Delgado
2013-10-14  7:47     ` Hans Verkuil
2013-10-14  7:53       ` Ricardo Ribalda Delgado
2013-10-14  8:13         ` Hans Verkuil
