From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
linux-xfs@vger.kernel.org, hch@infradead.org
Subject: Re: lockdep warning on 4.11.0-rc5 kernel
Date: Fri, 7 Apr 2017 10:35:40 -0700 [thread overview]
Message-ID: <20170407173540.GL4864@birch.djwong.org> (raw)
In-Reply-To: <20170407172828.GB55851@bfoster.bfoster>
On Fri, Apr 07, 2017 at 01:28:31PM -0400, Brian Foster wrote:
> On Fri, Apr 07, 2017 at 09:58:43AM -0700, Darrick J. Wong wrote:
> > On Fri, Apr 07, 2017 at 11:10:09AM -0400, Brian Foster wrote:
> > > On Wed, Apr 05, 2017 at 04:52:11PM -0400, Vivek Goyal wrote:
> > > > Hi,
> > > >
> > > > I am running 4.11.0-rc5 kernel and did a kernel build and noticed
> > > > following lockdep warning on console. Have not analyzed it. Lots of
> > > > xfs in backtrace, so sending it to xfs mailing list.
> > > >
> > >
> > > Darrick pointed out on irc yesterday that this is likely due to the
> > > lock_inode() call in chmod_common(). I was confused as to where the
> > > iolock came into play here, but apparently we now reuse the core
> > > inode->i_rwsem for that.
> > >
> > > In any event, I was playing around with this and reproduce pretty easily
> > > by populating an fs with a bunch files with speculative preallocation
> > > and then generating some memory pressure. I reproduce with the following
> > > stack, however:
> > >
> > > [ 434.220605] =================================
> > > [ 434.222286] [ INFO: inconsistent lock state ]
> > > [ 434.224092] 4.11.0-rc4+ #36 Tainted: G OE
> > > [ 434.225839] ---------------------------------
> > > [ 434.227587] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W}
> > > usage.
> > > [ 434.229995] kswapd0/59 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > [ 434.231851] (&sb->s_type->i_mutex_key#17){+.+.?.}, at:
> > > [<ffffffffc078e0aa>] xfs_ilock+0x20a/0x300 [xfs]
> > > [ 434.235473] {RECLAIM_FS-ON-W} state was registered at:
> > > [ 434.237427] mark_held_locks+0x76/0xa0
> > > [ 434.238840] lockdep_trace_alloc+0x7d/0xe0
> > > [ 434.240362] kmem_cache_alloc+0x2f/0x2d0
> > > [ 434.241871] kmem_zone_alloc+0x81/0x120 [xfs]
> > > [ 434.243559] xfs_trans_alloc+0x6c/0x130 [xfs]
> > > [ 434.245233] xfs_vn_update_time+0x75/0x230 [xfs]
> > > [ 434.247031] file_update_time+0xbc/0x110
> > > [ 434.248593] xfs_file_aio_write_checks+0x19b/0x1c0 [xfs]
> > > [ 434.250762] xfs_file_buffered_aio_write+0x75/0x350 [xfs]
> > > [ 434.252978] xfs_file_write_iter+0x103/0x150 [xfs]
> > > [ 434.254935] __vfs_write+0xe8/0x160
> > > [ 434.256325] vfs_write+0xcb/0x1f0
> > > [ 434.257625] SyS_pwrite64+0x98/0xc0
> > > [ 434.258963] entry_SYSCALL_64_fastpath+0x1f/0xc2
> > >
> > > ... so this isn't just a chmod thing. OTOH, I think we agree that this
> > > is not a real deadlock vector because the iolock is taken in the
> > > destroy_inode() path and so there should be no other reference to the
> > > inode.
> > >
> > > That aside, the IOLOCK_EXCL was added to xfs_inactive() in commit
> > > a36b926180 ("xfs: pull up iolock from xfs_free_eofblocks()") purely to
> > > honor the cleaner call semantics that patch defined for
> > > xfs_free_eofblocks(). We could probably either drop the iolock from here
> > > (though we would then have to kill the assert in xfs_free_eofblocks()),
> > > or use something like the diff below that quiets the lockdep splat for
> > > me. Thoughts?
> >
> > Well... the inode is inactive, which means there won't be any io to
> > protect ourselves against, so we don't need to take the iolock, right?
> > Why not remove the iolock grabbing and either change the iolock assert
> > in _free_eofblocks to detect that we're coming from _inactive and not
> > blow the assert, or just make a __xfs_free_eofblocks that skips the
> > assert.
> >
>
> Correct, the iolock is not necessary here and historically was not
> taken. It was added in the commit above to clean up the function, more
> specifically to support the assert. I'd rather not go and uglify the
> whole thing again (might as well just revert the patch in that case).
>
> I guess we could just kill the assert then. How about the appended?
Looks reasonable. Test, patchify, and send to list?
--D
>
> Brian
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 4d1920e..de94798 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -903,9 +903,9 @@ xfs_can_free_eofblocks(struct xfs_inode *ip, bool force)
> }
>
> /*
> - * This is called by xfs_inactive to free any blocks beyond eof
> - * when the link count isn't zero and by xfs_dm_punch_hole() when
> - * punching a hole to EOF.
> + * This is called to free any blocks beyond eof. The caller must hold
> + * IOLOCK_EXCL unless we are in the inode reclaim path and have the only
> + * reference to the inode.
> */
> int
> xfs_free_eofblocks(
> @@ -920,8 +920,6 @@ xfs_free_eofblocks(
> struct xfs_bmbt_irec imap;
> struct xfs_mount *mp = ip->i_mount;
>
> - ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> -
> /*
> * Figure out if there are any blocks beyond the end
> * of the file. If not, then there is nothing to do.
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 7605d83..e36c2c3 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1906,12 +1906,13 @@ xfs_inactive(
> * force is true because we are evicting an inode from the
> * cache. Post-eof blocks must be freed, lest we end up with
> * broken free space accounting.
> + *
> + * Note: don't bother with iolock here since lockdep complains
> + * about acquiring it in reclaim context. We have the only
> + * reference to the inode at this point anyways.
> */
> - if (xfs_can_free_eofblocks(ip, true)) {
> - xfs_ilock(ip, XFS_IOLOCK_EXCL);
> + if (xfs_can_free_eofblocks(ip, true))
> xfs_free_eofblocks(ip);
> - xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> - }
>
> return;
> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
prev parent reply other threads:[~2017-04-07 17:36 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-05 20:52 lockdep warning on 4.11.0-rc5 kernel Vivek Goyal
2017-04-07 15:10 ` Brian Foster
2017-04-07 16:58 ` Darrick J. Wong
2017-04-07 17:28 ` Brian Foster
2017-04-07 17:35 ` Darrick J. Wong [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170407173540.GL4864@birch.djwong.org \
--to=darrick.wong@oracle.com \
--cc=bfoster@redhat.com \
--cc=hch@infradead.org \
--cc=linux-xfs@vger.kernel.org \
--cc=vgoyal@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox