All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Michael L. Semon" <mlsemon35@gmail.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: Multi-CPU harmless lockdep on x86 while copying data
Date: Mon, 10 Mar 2014 15:52:49 -0500	[thread overview]
Message-ID: <20140310205249.GS1935@sgi.com> (raw)
In-Reply-To: <20140310025523.GV6851@dastard>

Hi Dave,

On Mon, Mar 10, 2014 at 01:55:23PM +1100, Dave Chinner wrote:
> On Sat, Mar 08, 2014 at 09:58:01PM -0500, Michael L. Semon wrote:
> > Hi!  I was playing a shell game with some precious backup data.  It 
> > went like this:
> > 
> > open a 36-GB source partition (DM linear, partitions from 2 drives, 
> >    v5-superblock XFS)
> > 
> > open a 32-GB aes-xts crypt (v4-superblock XFS)
> > 
> > `cp -av` from holding partition to crypt
> > 
> > It was during the cp operation that I got this multi-CPU version of 
> > the harmless lockdep:
> > 
> > =========================================================
> > [ INFO: possible irq lock inversion dependency detected ]
> > 3.14.0-rc5+ #6 Not tainted
> > ---------------------------------------------------------
> > kswapd0/25 just changed the state of lock:
> >  (&xfs_dir_ilock_class){++++-.}, at: [<791c09fa>] xfs_ilock+0xff/0x16c
> > but this lock took another, RECLAIM_FS-unsafe lock in the past:
> >  (&mm->mmap_sem){++++++}
> > 
> > and interrupts could create inverse lock ordering between them.
> > 
> > 
> > other info that might help us debug this:
> >  Possible interrupt unsafe locking scenario:
> > 
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&mm->mmap_sem);
> >                                local_irq_disable();
> >                                lock(&xfs_dir_ilock_class);
> >                                lock(&mm->mmap_sem);
> >   <Interrupt>
> >     lock(&xfs_dir_ilock_class);
> 
> Well, that's rather interesting. I'd love to know where we take the
> directory ilock with interupts disabled - and then take a page
> fault...  :/
> 
> ....
> 
> >   }
> >   ... key      at: [<795f1eec>] __key.44037+0x0/0x8
> >   ... acquired at:
> >    [<79064360>] __lock_acquire+0x397/0xa6c
> >    [<7906510f>] lock_acquire+0x8b/0x101
> >    [<790d1718>] might_fault+0x81/0xa9
> >    [<790f379a>] filldir64+0x92/0xe2
> >    [<7916f079>] xfs_dir2_sf_getdents+0x1a0/0x44c
> >    [<7917009e>] xfs_readdir+0xc4/0x126
> >    [<79171b8b>] xfs_file_readdir+0x25/0x3e
> >    [<790f385a>] iterate_dir+0x70/0x9b
> >    [<790f3a31>] SyS_getdents64+0x6d/0xcc
> >    [<792f85b8>] sysenter_do_call+0x12/0x36
> 
> All of these paths are from the syscall path, except for one in
> kswapd context. I don't see any stack trace here that could result
> in the above "deadlock". It looks to be yet another false
> positive...
> 
> > I call it harmless because its single-CPU variant can be reproduced
> > as far back as I could bisect in earlier testing (way before
> > kernel 2.6.20).  However, such lockdep splats have never manifested
> > in a noticeable problem on production kernels on x86.  Either
> > down_write_nested or down_read_nested is in all of them, depending
> > on which kernel version is in use.  At least one reclaim-related
> > function is in there as well.  vm_map_ram used to be in there, but Dave 
> > took care of that (thanks!).
> > 
> > This particular splat has been showing up more often, though.  It's not
> > tied to one particular commit, event, or change; just something that
> > has crept in gradually since maybe kernel 3.11.  It's like watching
> > grass grow or watching paint dry.
> 
> The above reports all indicate that the taking a page fault while
> holding the directory ilock is problematic. So have all the other
> new lockdep reports we've got that have been caused by that change.
> 
> Realistically, I think that the readdir locking change was
> fundamentally broken. Why? Because taking page faults while holding
> the ilock violates the lock order we have for inodes.  For regular
> files, the lock heirarchy is iolock -> page fault -> ilock, and we
> got rid of the need for the iolock in the inode reclaim path because
> of all the problems it caused lockdep. Now we've gone and
> reintroduced that same set of problems - only worse - by allowing
> the readdir code to take page faults while holding the ilock....
> 
> I'd like to revert the change that introduced the ilock into the
> readdir path, but I suspect that woul dbe an unpopular direction to
> take. However, we need a solution that doesn't drive us insane with
> lockdep false positives.
> 
> IOWs, we need to do this differently - readdir needs to be protected
> by the iolock, not the ilock, and only the block mapping routines in
> the readdir code should be protected by the ilock. i.e.  the same
> locking strategy that is used for regular files: iolock protects the
> contents of the file, the ilock protects the inode metadata.
> 
> What I think we really need to do is drive the ilock all the way
> into the block mapping functions of the directory code and to the
> places where the inode metadata is actually going to be modified.
> The iolock protects access to the directory data, and logging of
> changes to those blocks is serialised by the buffer locks, so we
> don't actually need the inode ilock to serialise access to the
> directory data - we only need the ilock for modifications to the
> inode and it's blockmaps itself, same as for regular files.
> 
> Changing the directory code to handle this sort of locking is going
> to require a bit of surgery. However, I can see advantages to moving
> directory data to the same locking strategy as regular file data -
> locking heirarchies are identical, directory ilock hold times are
> much reduced, we don't get lockdep whining about taking page faults
> with the ilock held, etc.
> 
> A quick hack at to demonstrate the high level, initial step of using
> the IOLOCK for readdir serialisation. I've done a little smoke
> testing on it, so it won't die immediately. It should get rid of all
> the nasty lockdep issues, but it doesn't start to address the deeper
> restructing that is needed.
> 
> Thoughts, comments? I'm especially interested in what SGI people
> have to say here because this locking change was needed for CXFS,
> and changing the directory locking iheirarchy is probably going to
> upset CXFS a lot...

Protecting directory contents with the iolock and pushing the ilock down
to protect the directory block mapping functions seems like a good
strategy to me.  In this area I think it's good to treat directories the
same as regular files.  I have not had a very close look yet... and
couldn't find where we take a fault with irqs disabled.  Maybe something
out of a workqueue?

Thanks,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2014-03-10 20:52 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-09  2:58 Multi-CPU harmless lockdep on x86 while copying data Michael L. Semon
2014-03-10  2:55 ` Dave Chinner
2014-03-10 10:37   ` Christoph Hellwig
2014-03-10 11:12     ` Christoph Hellwig
2014-03-10 20:51       ` Dave Chinner
2014-03-11 16:48         ` Christoph Hellwig
2014-03-10 20:46     ` Dave Chinner
2014-03-10 21:16       ` Ben Myers
2014-03-10 21:24         ` Dave Chinner
2014-03-10 22:10           ` Ben Myers
2014-03-10 20:52   ` Ben Myers [this message]
2014-03-10 21:20     ` Dave Chinner
2014-03-10 21:30       ` Ben Myers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140310205249.GS1935@sgi.com \
    --to=bpm@sgi.com \
    --cc=david@fromorbit.com \
    --cc=mlsemon35@gmail.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.