From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from lb3-smtp-cloud2.xs4all.net ([194.109.24.29]:47511 "EHLO lb3-smtp-cloud2.xs4all.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751072AbbFSMer (ORCPT ); Fri, 19 Jun 2015 08:34:47 -0400 Message-ID: <55840C50.4070705@xs4all.nl> Date: Fri, 19 Jun 2015 14:34:24 +0200 From: Hans Verkuil MIME-Version: 1.0 To: Jan Kara CC: Linux Media Mailing List , Andrew Morton Subject: Re: [PATCH] Revert "[media] vb2: Push mmap_sem down to memops" References: <557E7DC7.3000607@xs4all.nl> <20150618103335.GF21820@quack.suse.cz> In-Reply-To: <20150618103335.GF21820@quack.suse.cz> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-media-owner@vger.kernel.org List-ID: On 06/18/2015 12:33 PM, Jan Kara wrote: > On Mon 15-06-15 09:24:55, Hans Verkuil wrote: >> This reverts commit 48b25a3a713b90988b6882d318f7c0a6bed9aabc. >> >> That commit caused two regressions. The first is a BUG: >> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000100 >> IP: [] __lock_acquire+0x2f0/0x2070 >> PGD 0 >> Oops: 0000 [#1] PREEMPT SMP >> Modules linked in: vivid v4l2_dv_timings videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media vmw_balloon vmw_vmci acpi_cpufreq processor button >> CPU: 0 PID: 1542 Comm: v4l2-ctl Not tainted 4.1.0-rc3-test-media #1190 >> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 >> task: ffff880220ce4200 ti: ffff88021d16c000 task.ti: ffff88021d16c000 >> RIP: 0010:[] [] __lock_acquire+0x2f0/0x2070 >> RSP: 0018:ffff88021d16f9b8 EFLAGS: 00010002 >> RAX: 0000000000000046 RBX: 0000000000000292 RCX: 0000000000000001 >> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000100 >> RBP: ffff88021d16fa88 R08: 0000000000000001 R09: 0000000000000000 >> R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001 >> R13: ffff880220ce4200 R14: 0000000000000100 R15: 0000000000000000 >> FS: 00007f2441e7f740(0000) GS:ffff880236e00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> CR2: 0000000000000100 CR3: 0000000001e0b000 CR4: 00000000001406f0 >> Stack: >> ffff88021d16fa98 ffffffff810d6543 0000000000000006 0000000000000246 >> ffff88021d16fa08 ffffffff810d532d ffff880220ce4a78 ffff880200000000 >> ffff880200000001 0000000000000000 0000000000000001 000000000093a4a0 >> Call Trace: >> [] ? __lock_acquire+0xb63/0x2070 >> [] ? mark_held_locks+0x6d/0xa0 >> [] ? __lock_is_held+0x58/0x80 >> [] lock_acquire+0x6c/0xa0 >> [] ? vb2_vmalloc_put_userptr+0x36/0x110 [videobuf2_vmalloc] >> [] down_read+0x42/0x60 >> [] ? vb2_vmalloc_put_userptr+0x36/0x110 [videobuf2_vmalloc] >> [] ? mutex_lock_nested+0x2b1/0x560 >> [] ? vb2_queue_release+0x25/0x40 [videobuf2_core] >> [] vb2_vmalloc_put_userptr+0x36/0x110 [videobuf2_vmalloc] >> [] __vb2_queue_free+0x146/0x5e0 [videobuf2_core] >> [] vb2_queue_release+0x33/0x40 [videobuf2_core] >> [] _vb2_fop_release+0x95/0xb0 [videobuf2_core] >> [] vb2_fop_release+0x29/0x50 [videobuf2_core] >> [] vivid_fop_release+0x92/0x230 [vivid] >> [] v4l2_release+0x30/0x80 [videodev] >> [] __fput+0xe5/0x200 >> [] ____fput+0x9/0x10 >> [] task_work_run+0xc4/0xf0 >> [] do_exit+0x3a0/0xaf0 >> [] ? _raw_spin_unlock_irq+0x2b/0x60 >> [] do_group_exit+0x4f/0xe0 >> [] get_signal+0x200/0x8c0 >> [] ? __mutex_unlock_slowpath+0xf5/0x240 >> [] do_signal+0x23/0x820 >> [] ? mutex_unlock+0x9/0x10 >> [] ? v4l2_ioctl+0x78/0xf0 [videodev] >> [] ? int_very_careful+0x5/0x46 >> [] ? trace_hardirqs_on_caller+0x15d/0x200 >> [] do_notify_resume+0x50/0x60 >> [] int_signal+0x12/0x17 >> Code: ca 81 31 c0 e8 7a e2 8c 00 e8 aa 1d 8d 00 0f 1f 44 00 00 31 db 48 81 c4 a8 00 00 00 89 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 90 <49> 81 3e 40 4e 02 82 b8 00 00 00 00 44 0f 44 e0 41 83 ff 01 0f >> RIP [] __lock_acquire+0x2f0/0x2070 >> RSP >> CR2: 0000000000000100 >> ---[ end trace 25595c2b8560cb57 ]--- >> Fixing recursive fault but reboot is needed! > > Ah, that's tricky. We can end up calling task_work_run() via > exit_task_work() after mm has been shut down. And the task work will be > dropping the last reference to all file descriptors which ends up shutting > down vb2 after current->mm has been cleaned up. > > So in the light of this it's probably better for the initial patch to > completely avoid grabbing mmap_sem in put_userptr(). It breaks locking for > vma->vm_ops->close() but that's already broken in vb2 as I explained in my > other email. And the remainder of the patch set will make sure we don't > need mmap_sem in put_userptr() at all and thus fixes the whole issue. > > This also explains why I never saw the problem in my testing - I was always > testing the patch set as a whole. > > I'll send an updated first patch later today. I just wanted to thank you for your quick work in resolving both regressions and making a new patch series. Much appreciated! I was afraid this would prove to be complex to fix, so I'm glad that I was wrong about that. Hans -- To unsubscribe from this list: send the line "unsubscribe linux-media" in