From: David Chinner <dgc@sgi.com>
To: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: pvp-lsts@fs.ru.acad.bg,
Alexander Beregalov <a.beregalov@gmail.com>,
kernel-testers@vger.kernel.org,
kernel list <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>,
peterz@infradead.org, xfs@oss.sgi.com,
David Chinner <dgc@sgi.com>
Subject: Re: 2.6.26-rc1: possible circular locking dependency with xfs filesystem
Date: Mon, 12 May 2008 09:10:02 +1000
Message-ID: <20080511231002.GN103491721@sgi.com>
In-Reply-To: <48266C77.3040102@linux.vnet.ibm.com>
On Sun, May 11, 2008 at 09:18:07AM +0530, Kamalesh Babulal wrote:
> Kamalesh Babulal wrote:
> > Adding the cc to kernel-list, Ingo Molnar and Peter Zijlstra
> >
> > Alexander Beregalov wrote:
> >> [ INFO: possible circular locking dependency detected ]
> >> 2.6.26-rc1-00279-g28a4acb #13
> >> -------------------------------------------------------
> >> nfsd/3087 is trying to acquire lock:
> >> (iprune_mutex){--..}, at: [<c016f947>] shrink_icache_memory+0x38/0x19b
> >>
> >> but task is already holding lock:
> >> (&(&ip->i_iolock)->mr_lock){----}, at: [<c0210b83>] xfs_ilock+0xa2/0xd6
> >>
> >> which lock already depends on the new lock.
> >>
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
> >> [<c01352e6>] __lock_acquire+0xa0c/0xbc6
> >> [<c013550a>] lock_acquire+0x6a/0x86
> >> [<c012c39a>] down_write_nested+0x33/0x6a
> >> [<c0210b5c>] xfs_ilock+0x7b/0xd6
> >> [<c0210cd5>] xfs_ireclaim+0x1d/0x59
> >> [<c022edfe>] xfs_finish_reclaim+0x173/0x195
> >> [<c0230fa3>] xfs_reclaim+0xb3/0x138
> >> [<c023b4cb>] xfs_fs_clear_inode+0x55/0x8e
> >> [<c016f60b>] clear_inode+0x83/0xd2
> >> [<c016f88a>] dispose_list+0x3c/0xc1
> >> [<c016fa82>] shrink_icache_memory+0x173/0x19b
> >> [<c014a68d>] shrink_slab+0xda/0x14e
> >> [<c014a8e5>] try_to_free_pages+0x1e4/0x2a2
> >> [<c0146997>] __alloc_pages_internal+0x23a/0x39d
> >> [<c0146b11>] __alloc_pages+0xa/0xc
> >> [<c01483b2>] __do_page_cache_readahead+0xaa/0x16a
> >> [<c01484bc>] force_page_cache_readahead+0x4a/0x74
> >> [<c014c9b0>] sys_madvise+0x308/0x400
> >> [<c0102b25>] sysenter_past_esp+0x6a/0xb1
> >> [<ffffffff>] 0xffffffff
> >>
> >> -> #0 (iprune_mutex){--..}:
> >> [<c0135203>] __lock_acquire+0x929/0xbc6
> >> [<c013550a>] lock_acquire+0x6a/0x86
> >> [<c0356a6f>] mutex_lock_nested+0xb4/0x226
> >> [<c016f947>] shrink_icache_memory+0x38/0x19b
> >> [<c014a68d>] shrink_slab+0xda/0x14e
> >> [<c014a8e5>] try_to_free_pages+0x1e4/0x2a2
> >> [<c0146997>] __alloc_pages_internal+0x23a/0x39d
> >> [<c0146b11>] __alloc_pages+0xa/0xc
> >> [<c01483b2>] __do_page_cache_readahead+0xaa/0x16a
> >> [<c014866c>] ondemand_readahead+0x119/0x127
> >> [<c01486cc>] page_cache_async_readahead+0x52/0x5d
> >> [<c0178e46>] generic_file_splice_read+0x290/0x4a8
> >> [<c0239f06>] xfs_splice_read+0x4b/0x78
> >> [<c0237713>] xfs_file_splice_read+0x24/0x29
> >> [<c0178182>] do_splice_to+0x45/0x63
> >> [<c01783f6>] splice_direct_to_actor+0xab/0x150
> >> [<c01ce8e1>] nfsd_vfs_read+0x1ed/0x2d0
> >> [<c01ced50>] nfsd_read+0x82/0x99
> >> [<c01d42bc>] nfsd3_proc_read+0xdf/0x12a
> >> [<c01cb40b>] nfsd_dispatch+0xcf/0x19e
> >> [<c033f484>] svc_process+0x3b3/0x68b
> >> [<c01cb939>] nfsd+0x168/0x26b
> >> [<c0103747>] kernel_thread_helper+0x7/0x10
> >> [<ffffffff>] 0xffffffff
Oh, yeah, that. Direct inode reclaim through memory pressure.
Effectively, memory reclaim inverts the locking order w.r.t. iprune_mutex
when it recurses into the filesystem. It's a false positive - it can never
cause a deadlock on XFS (an inode being reclaimed has no remaining
references, so it can't be the inode whose iolock the reader holds).
It can't be solved from the XFS side of things without effectively
turning off lockdep checking for xfs inode locking.
The fix has to be made on the lockdep side, via iprune_mutex annotations....
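
To make the report concrete, here is a minimal C sketch of class-based lock-order tracking. It assumes a dependency checker only needs to record "took B while holding A" edges between lock classes and refuse any edge that closes a cycle. It is not lockdep's implementation; `record_acquire`, `reachable`, `reset_deps` and the two enum values are hypothetical names chosen to mirror the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes named after the two locks in the report. */
enum lock_class { IPRUNE_MUTEX, XFS_IOLOCK, NR_CLASSES };

/* dep[a][b] is true once class b has been taken while class a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_CLASSES; next++)
        if (dep[from][next] && !seen[next] &&
            reachable((enum lock_class)next, to, seen))
            return true;
    return false;
}

/*
 * Record "acquiring 'want' while holding 'held'".
 * Returns false if 'held' is already reachable from 'want', i.e. the new
 * edge would close a cycle in the class graph; the edge is then not added.
 */
bool record_acquire(enum lock_class held, enum lock_class want)
{
    bool seen[NR_CLASSES] = { false };

    assert(held < NR_CLASSES && want < NR_CLASSES);
    if (reachable(want, held, seen))
        return false;
    dep[held][want] = true;
    return true;
}

/* Forget all recorded dependencies. */
void reset_deps(void)
{
    memset(dep, 0, sizeof(dep));
}
```

Recording the reclaim-path edge first, `record_acquire(IPRUNE_MUTEX, XFS_IOLOCK)`, succeeds; the later nfsd read-path edge, `record_acquire(XFS_IOLOCK, IPRUNE_MUTEX)`, is rejected. The graph is keyed on lock classes rather than individual inodes, so the sketch cannot tell that the two iolocks belong to different inodes. That is the sense in which the report is a false positive.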
> May 9 02:16:46 nomad64 kernel: [42951853.992965] the existing dependency chain (in reverse order) is:
> May 9 02:16:46 nomad64 kernel: [42951853.992967]
> May 9 02:16:46 nomad64 kernel: [42951853.992968] -> #1 (&(&ip->i_iolock)->mr_lock){----}:
> May 9 02:16:46 nomad64 kernel: [42951853.992974] [<ffffffff80261d72>] __lock_acquire+0xf92/0x1080
> May 9 02:16:46 nomad64 kernel: [42951853.992989] [<ffffffff80261f02>] lock_acquire+0xa2/0xd0
> May 9 02:16:46 nomad64 kernel: [42951853.993002] [<ffffffff80255556>] down_write_nested+0x46/0x80
> May 9 02:16:46 nomad64 kernel: [42951853.993018] [<ffffffff80387fb9>] xfs_ilock+0x99/0xa0
> May 9 02:16:46 nomad64 kernel: [42951853.993034] [<ffffffff803a5117>] xfs_free_eofblocks+0x1c7/0x250
> May 9 02:16:46 nomad64 kernel: [42951853.993049] [<ffffffff803a8a26>] xfs_release+0x186/0x1d0
> May 9 02:16:46 nomad64 kernel: [42951853.993062] [<ffffffff803aeeb0>] xfs_file_release+0x10/0x20
> May 9 02:16:46 nomad64 kernel: [42951853.993076] [<ffffffff802a01cc>] __fput+0xcc/0x1c0
> May 9 02:16:46 nomad64 kernel: [42951853.993091] [<ffffffff802a05e6>] fput+0x16/0x20
> May 9 02:16:46 nomad64 kernel: [42951853.993105] [<ffffffff8028865a>] remove_vma+0x4a/0x80
> May 9 02:16:46 nomad64 kernel: [42951853.993120] [<ffffffff802894e1>] do_munmap+0x281/0x2e0
> May 9 02:16:46 nomad64 kernel: [42951853.993134] [<ffffffff8028958b>] sys_munmap+0x4b/0x70
> May 9 02:16:46 nomad64 kernel: [42951853.993148] [<ffffffff8020b62b>] system_call_after_swapgs+0x7b/0x80
> May 9 02:16:46 nomad64 kernel: [42951853.993161] [<ffffffffffffffff>] 0xffffffffffffffff
hmmmm. Sounds like:
fd = open()
addr = mmap(fd)
close(fd)
.....
munmap(addr);
But yes, XFS takes locks in ->release which means.....
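
A runnable version of that sequence, as a minimal user-space sketch: `map_close_unmap` is a hypothetical helper, and the path and length are whatever the caller passes in. Nothing here is XFS-specific; it only shows the ordering in which the final file reference is dropped from munmap() rather than close().

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map 'len' bytes of 'path' read-only, close the descriptor, then unmap.
 * Returns 0 on success, -1 on failure.
 */
int map_close_unmap(const char *path, size_t len)
{
    void *addr;
    int fd;

    assert(path != NULL && len > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The mapping holds its own reference to the open file, so this
     * close() does not drop the last reference. */
    if (close(fd) != 0) {
        munmap(addr, len);
        return -1;
    }

    /* The last reference goes away here: in the quoted trace this is
     * sys_munmap -> remove_vma -> fput -> xfs_file_release. */
    return munmap(addr, len);
}
```

Calling `map_close_unmap("/bin/sh", 4096)` on a typical Linux system exercises the path; on XFS the ->release call at the end is where the inode lock is taken.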
> May 9 02:16:46 nomad64 kernel: [42951853.993293] Call Trace:
> May 9 02:16:46 nomad64 kernel: [42951853.993297] [<ffffffff8025f2b3>] print_circular_bug_tail+0x83/0x90
> May 9 02:16:46 nomad64 kernel: [42951853.993302] [<ffffffff80261b90>] __lock_acquire+0xdb0/0x1080
> May 9 02:16:46 nomad64 kernel: [42951853.993306] [<ffffffff80222bbd>] ? do_page_fault+0xdd/0x890
> May 9 02:16:46 nomad64 kernel: [42951853.993310] [<ffffffff80261f02>] lock_acquire+0xa2/0xd0
> May 9 02:16:46 nomad64 kernel: [42951853.993313] [<ffffffff80222bbd>] ? do_page_fault+0xdd/0x890
> May 9 02:16:46 nomad64 kernel: [42951853.993317] [<ffffffff806b887b>] down_read+0x3b/0x70
> May 9 02:16:46 nomad64 kernel: [42951853.993320] [<ffffffff80222bbd>] do_page_fault+0xdd/0x890
> May 9 02:16:46 nomad64 kernel: [42951853.993324] [<ffffffff806ba5dd>] error_exit+0x0/0xa9
> May 9 02:16:46 nomad64 kernel: [42951853.993328] [<ffffffff802739b6>] ? file_read_actor+0x46/0x1b0
> May 9 02:16:46 nomad64 kernel: [42951853.993331] [<ffffffff806ba3d6>] ? _read_unlock_irq+0x36/0x60
> May 9 02:16:46 nomad64 kernel: [42951853.993335] [<ffffffff80275dbc>] ? generic_file_aio_read+0x2cc/0x5d0
> May 9 02:16:46 nomad64 kernel: [42951853.993339] [<ffffffff8025ddb9>] ? get_lock_stats+0x19/0x70
> May 9 02:16:46 nomad64 kernel: [42951853.993343] [<ffffffff803b2769>] ? xfs_read+0x139/0x220
> May 9 02:16:46 nomad64 kernel: [42951853.993347] [<ffffffff803af06d>] ? xfs_file_aio_read+0x4d/0x60
> May 9 02:16:46 nomad64 kernel: [42951853.993350] [<ffffffff8029eeb1>] ? do_sync_read+0xf1/0x130
> May 9 02:16:46 nomad64 kernel: [42951853.993354] [<ffffffff802516e0>] ? autoremove_wake_function+0x0/0x40
> May 9 02:16:46 nomad64 kernel: [42951853.993358] [<ffffffff8026089a>] ? trace_hardirqs_on+0xda/0x170
> May 9 02:16:46 nomad64 kernel: [42951853.993361] [<ffffffff80272e45>] ? __rcu_read_unlock+0xb5/0xc0
> May 9 02:16:46 nomad64 kernel: [42951853.993365] [<ffffffff8026089a>] ? trace_hardirqs_on+0xda/0x170
> May 9 02:16:46 nomad64 kernel: [42951853.993369] [<ffffffff803c4381>] ? security_file_permission+0x11/0x20
> May 9 02:16:46 nomad64 kernel: [42951853.993374] [<ffffffff8029f794>] ? vfs_read+0xc4/0x160
> May 9 02:16:46 nomad64 kernel: [42951853.993377] [<ffffffff8029fc30>] ? sys_read+0x50/0x90
> May 9 02:16:46 nomad64 kernel: [42951853.993380] [<ffffffff8020b62b>] ? system_call_after_swapgs+0x7b/0x80
Oh, joy - a page fault during a read() call triggers lock order
inversions on the mmap_sem. I don't think this can deadlock
(can't be page faulting in a vma that is being torn down), but
it's clear from the last trace that the VM has an mmap_sem
inversion problem with ->release vs ->read and page faults...
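
One way user space can end up in that situation, sketched under the assumption that the destination buffer has never been touched: read() into freshly mapped anonymous memory, so the kernel's copy to user space has to fault the pages in while the read is still in progress. `read_into_unfaulted_buffer` is a hypothetical helper name, not anything from the report.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read() up to 'len' bytes of 'path' into a fresh anonymous mapping.
 * Returns the number of bytes read, or -1 on failure.
 */
ssize_t read_into_unfaulted_buffer(const char *path, size_t len)
{
    void *buf;
    ssize_t n;
    int fd;

    assert(path != NULL && len > 0);

    /* No page of this mapping is populated until it is first touched. */
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        munmap(buf, len);
        return -1;
    }

    /* The first touch of 'buf' happens inside the kernel, during the
     * copy to user space, i.e. while the filesystem's ->read is running. */
    n = read(fd, buf, len);

    close(fd);
    munmap(buf, len);
    return n;
}
```

In the trace above the fault handler takes mmap_sem for read (down_read called from do_page_fault) underneath xfs_read, which is the opposite order to the munmap() -> ->release path.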
Basically what we are seeing here in both cases is that the VM is
calling inode ->release or ->clear_inode methods with different high
level locks held. If the filesystem has to take the same locks in
these methods as it does in, say, ->read (like XFS does), then we
are guaranteed to get reports like this. AFAICT there's nothing we
can do from the filesystem perspective to prevent false positives like
this from being reported....
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group