From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758855Ab3K1EKw (ORCPT ); Wed, 27 Nov 2013 23:10:52 -0500
Received: from mail.linuxfoundation.org ([140.211.169.12]:55254 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754434Ab3K1EKj (ORCPT ); Wed, 27 Nov 2013 23:10:39 -0500
Date: Wed, 27 Nov 2013 20:11:28 -0800
From: Greg KH
To: Xiaotian Feng
Cc: Tejun Heo, Andrew Morton, neilb@suse.de, linux-kernel
Subject: Re: page fault deadlock
Message-ID: <20131128041128.GA30156@kroah.com>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.22 (2013-10-16)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 28, 2013 at 11:25:32AM +0800, Xiaotian Feng wrote:
> Hi,
>
> When I upgrade to latest kernel, I found my system hang there. It
> is reproducible on my virtualbox, and I found each time I mounted my
> RAID6 partition and tried to vi or build kernel, my whole system
> lockup very soon.
>
> After turning on lockdep, I found following lockdep warning:
>
> [ 27.848462]
> [ 27.848471] ======================================================
> [ 27.848477] [ INFO: possible circular locking dependency detected ]
> [ 27.848484] 3.13.0-rc1+ #1 Tainted: GF W
> [ 27.848490] -------------------------------------------------------
> [ 27.848496] Xorg/1268 is trying to acquire lock:
> [ 27.848501]  (&of->mutex){+.+.+.}, at: []
> sysfs_bin_mmap+0x4f/0x120
> [ 27.848516]
> [ 27.848516] but task is already holding lock:
> [ 27.848521]  (&mm->mmap_sem){++++++}, at: []
> vm_mmap_pgoff+0x6f/0xc0
> [ 27.848534]
> [ 27.848534] which lock already depends on the new lock.
> [ 27.848534]
> [ 27.848541]
> [ 27.848541] the existing dependency chain (in reverse order) is:
> [ 27.848547]
> [ 27.848547] -> #2 (&mm->mmap_sem){++++++}:
> [ 27.848556]        [] lock_acquire+0xb0/0x160
> [ 27.848564]        [] might_fault+0x8c/0xb0
> [ 27.848572]        [] md_ioctl+0xa78/0x19b0
> [ 27.848580]        [] blkdev_ioctl+0x234/0x840
> [ 27.848588]        [] block_ioctl+0x41/0x50
> [ 27.848597]        [] do_vfs_ioctl+0x300/0x520
> [ 27.848605]        [] SyS_ioctl+0x81/0xa0
> [ 27.848613]        [] tracesys+0xe1/0xe6
> [ 27.848622]
> [ 27.848622] -> #1 (&mddev->reconfig_mutex){+.+.+.}:
> [ 27.848630]        [] lock_acquire+0xb0/0x160
> [ 27.848637]        []
> mutex_lock_interruptible_nested+0x78/0x610
> [ 27.848646]        [] rdev_attr_show+0x40/0x90
> [ 27.848654]        [] sysfs_seq_show+0xda/0x170
> [ 27.848662]        [] seq_read+0x164/0x3e0
> [ 27.848671]        [] vfs_read+0x95/0x160
> [ 27.848680]        [] SyS_read+0x49/0xa0
> [ 27.848687]        [] tracesys+0xe1/0xe6
> [ 27.848695]
> [ 27.848695] -> #0 (&of->mutex){+.+.+.}:
> [ 27.848703]        [] __lock_acquire+0x1587/0x1ca0
> [ 27.848711]        [] lock_acquire+0xb0/0x160
> [ 27.848718]        [] mutex_lock_nested+0x68/0x510
> [ 27.848725]        [] sysfs_bin_mmap+0x4f/0x120
> [ 27.848732]        [] mmap_region+0x3ed/0x5d0
> [ 27.848741]        [] do_mmap_pgoff+0x34e/0x3d0
> [ 27.848748]        [] vm_mmap_pgoff+0x90/0xc0
> [ 27.848755]        [] SyS_mmap_pgoff+0x1d5/0x270
> [ 27.848763]        [] SyS_mmap+0x22/0x30
> [ 27.848771]        [] tracesys+0xe1/0xe6
> [ 27.848778]
> [ 27.848778] other info that might help us debug this:
> [ 27.848778]
> [ 27.848785] Chain exists of:
> [ 27.848785]   &of->mutex --> &mddev->reconfig_mutex --> &mm->mmap_sem
> [ 27.848785]
> [ 27.848795]  Possible unsafe locking scenario:
> [ 27.848795]
> [ 27.848800]        CPU0                    CPU1
> [ 27.848805]        ----                    ----
> [ 27.848810]   lock(&mm->mmap_sem);
> [ 27.848817]                                lock(&mddev->reconfig_mutex);
> [ 27.848824]                                lock(&mm->mmap_sem);
> [ 27.848830]   lock(&of->mutex);
> [ 27.848837]
> [ 27.848837]  *** DEADLOCK ***
> [ 27.848837]
> [ 27.848844] 1 lock held by Xorg/1268:
> [ 27.848849]  #0: (&mm->mmap_sem){++++++}, at:
> [] vm_mmap_pgoff+0x6f/0xc0
> [ 27.848861]
> [ 27.848861] stack backtrace:
> [ 27.848868] CPU: 1 PID: 1268 Comm: Xorg Tainted: GF W 3.13.0-rc1+ #1
> [ 27.848873] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> [ 27.848879] ffffffff822daa00 ffff8800d0371bc8 ffffffff817725f7
> ffffffff822cbdc0
> [ 27.848901] ffff8800d0371c08 ffffffff8176d9eb ffff8800d0371c60
> ffff880115b42a78
> [ 27.848909] 0000000000000000 ffff880115b42a78 ffff880115b422a0
> 0000000000000001
> [ 27.848918] Call Trace:
> [ 27.848930] [] dump_stack+0x4e/0x7a
> [ 27.848942] [] print_circular_bug+0x1f9/0x208
> [ 27.848952] [] __lock_acquire+0x1587/0x1ca0
> [ 27.848964] [] ? print_context_stack+0x8f/0x100
> [ 27.848975] [] lock_acquire+0xb0/0x160
> [ 27.848986] [] ? sysfs_bin_mmap+0x4f/0x120
> [ 27.848996] [] ? sysfs_bin_mmap+0x4f/0x120
> [ 27.849007] [] mutex_lock_nested+0x68/0x510
> [ 27.849016] [] ? sysfs_bin_mmap+0x4f/0x120
> [ 27.849027] [] ? kmemleak_alloc+0x4e/0xb0
> [ 27.849038] [] sysfs_bin_mmap+0x4f/0x120
> [ 27.849048] [] mmap_region+0x3ed/0x5d0
> [ 27.849058] [] do_mmap_pgoff+0x34e/0x3d0
> [ 27.849070] [] vm_mmap_pgoff+0x90/0xc0
> [ 27.849080] [] SyS_mmap_pgoff+0x1d5/0x270
> [ 27.849092] [] ? syscall_trace_enter+0x145/0x270
> [ 27.849102] [] SyS_mmap+0x22/0x30
> [ 27.849112] [] tracesys+0xe1/0xe6
>
>
> I think it is a real deadlock, and it is caused by commit
> 3124eb1679b28726 "sysfs: merge regular and bin file handling".
>
> With that commit, sysfs_bin_mmap will hold of->mutex.
>
> So assume cpu0 called sysfs_bin_mmap, acquired mmap_sem and trying
> to get of->mutex.
>
> CPU1 called sysfs_seq_show, acquired of->mutex and trying to
> get mddev->reconfig_mutex.
>
> CPU2 called md_ioctl, acquired mddev->reconfig_mutex, and
> later call copy_from_user and page fault trying to get mmap_sem.
>
> DEADLOCK now. I can't test the effort of reverting 3124eb16 as
> there're a whole patchset and many commits after that.
> But I do believe it's buggy and the root cause of my system hang.
>
> CPU0:                 CPU1:                           CPU2:
> lock(&mm->mmap_sem)
>                       lock(&of->mutex);
>
>                                                       lock(&mddev->reconfig_mutex)
>
>                                                       lock(&mm->mmap_sem)
>
>                       lock(&mddev->reconfig_mutex)
> lock(&of->mutex)
>
> Can we revert commit 3124eb167? or any patches to solve this page
> fault deadlock? Thanks.

Can you try linux-next, this should be fixed with a patch in my tree
there, thanks.

greg k-h