From: Ben Myers <bpm@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Michael L. Semon" <mlsemon35@gmail.com>,
	xfs-oss <xfs@oss.sgi.com>
Subject: Re: Multi-CPU harmless lockdep on x86 while copying data
Date: Mon, 10 Mar 2014 17:10:02 -0500
Message-ID: <20140310221002.GE26064@sgi.com>
In-Reply-To: <20140310212430.GB6851@dastard>

On Tue, Mar 11, 2014 at 08:24:30AM +1100, Dave Chinner wrote:
> On Mon, Mar 10, 2014 at 04:16:58PM -0500, Ben Myers wrote:
> > Hi,
> > 
> > On Tue, Mar 11, 2014 at 07:46:47AM +1100, Dave Chinner wrote:
> > > On Mon, Mar 10, 2014 at 03:37:16AM -0700, Christoph Hellwig wrote:
> > > > On Mon, Mar 10, 2014 at 01:55:23PM +1100, Dave Chinner wrote:
> > > > > Changing the directory code to handle this sort of locking is going
> > > > > to require a bit of surgery. However, I can see advantages to moving
> > > > > directory data to the same locking strategy as regular file data -
> > > > > > locking hierarchies are identical, directory ilock hold times are
> > > > > much reduced, we don't get lockdep whining about taking page faults
> > > > > with the ilock held, etc.
> > > > > 
> > > > > > A quick hack to demonstrate the high level, initial step of using
> > > > > the IOLOCK for readdir serialisation. I've done a little smoke
> > > > > testing on it, so it won't die immediately. It should get rid of all
> > > > > the nasty lockdep issues, but it doesn't start to address the deeper
> > > > > > restructuring that is needed.
> > > > 
> > > > What synchronization do we actually need from the iolock?  Pushing the
> > > > ilock down to where it's actually needed is a good idea either way,
> > > > though.
> > > 
> > > > The issue is that if we push the ilock down to just the block
> > > mapping routines, the directory can be modified while the readdir is
> > > in progress. That's the root problem that adding the ilock solved.
> > > Now, just pushing the ilock down to protect the bmbt lookups might
> > > result in a consistent lookup, but it won't serialise sanely against
> > > modifications.
> > > 
> > > i.e. readdir only walks one dir block at a time but
> > > it maps multiple blocks for readahead and keeps them in a local
> > > > array and doesn't validate them again before issuing reads on those
> > > buffers. Hence at a high level we currently have to serialise
> > > readdir against all directory modifications.
> > > 
> > > The only other option we might have is to completely rewrite the
> > > directory readahead code not to cache mappings. If we use the ilock
> > > purely for bmbt lookup and buffer read, then the ilock will
> > > serialise against modification, and the buffer lock will stabilise
> > > the buffer until the readdir moves to the next buffer and picks the
> > > ilock up again to read it.
> > > 
> > > That would avoid the need for high level serialisation, but it's a
> > > lot more work than using the iolock to provide the high level
> > > > serialisation and I'm still not sure it's 100% safe. And I've got no
> > > idea if it would work for CXFS. Hopefully someone from SGI will
> > > chime in here....
> > 
> > Also in leaf and node formats a single modification can change multiple
> > buffers, so I suspect the buffer lock isn't enough serialization to maintain a
> > consistent directory in the face of multiple readers and writers.  The iolock
> > does resolve that issue.
> 
> Right, but all we care about is that the leaf block we are currently
> reading is consistent when the read starts and stays consistent
> across the entire processing. i.e. if the leaf is locked by
> readdir, then the modification is completely stalled until the
> readdir lets it go. And readdir then can't get the next buffer until
> the modification is complete because it blocks on the ilock to get
> the next mapping and buffer....

As long as, as you pointed out above, the readahead buffers aren't cached, and
all of the callers who do require that data/freeindex/node/leaf blocks be
consistent continue to take the ilock...  Yeah, I think that might work.
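
For illustration only, here is a toy user-space model of that lock handoff.
It is Python, not XFS code: the Directory class and its modify()/readdir()
methods are hypothetical names, plain mutexes stand in for the ilock and the
per-buffer locks, and each "block" is just a list of generation numbers. The
reader looks up one block at a time under the ilock, takes that block's
buffer lock, drops the ilock, and never caches a mapping across iterations.

```python
import threading


class Directory:
    """Toy directory: every block is a list of generation numbers."""

    def __init__(self, nblocks, entries_per_block):
        # Assumption: a plain mutex stands in for the inode ilock.
        self.ilock = threading.Lock()
        # Assumption: one mutex per block stands in for the buffer lock.
        self.buf_locks = [threading.Lock() for _ in range(nblocks)]
        self.blocks = [[0] * entries_per_block for _ in range(nblocks)]

    def modify(self, blknos, gen):
        """Rewrite several blocks as one modification, under the ilock."""
        targets = sorted(set(blknos))
        with self.ilock:
            # Stalls here while a reader still holds one of these buffers.
            for b in targets:
                self.buf_locks[b].acquire()
            try:
                for b in targets:
                    for i in range(len(self.blocks[b])):
                        self.blocks[b][i] = gen
            finally:
                for b in targets:
                    self.buf_locks[b].release()

    def readdir(self):
        """Walk the blocks one at a time; return a snapshot of each."""
        seen = []
        blkno = 0
        while True:
            # ilock covers only the lookup and the buffer lock acquisition.
            with self.ilock:
                if blkno >= len(self.blocks):
                    break
                buf_lock = self.buf_locks[blkno]
                buf_lock.acquire()
            try:
                # ilock is dropped; the buffer lock keeps this block stable.
                seen.append(list(self.blocks[blkno]))
            finally:
                buf_lock.release()
            # The buffer is released before the ilock is taken again, so a
            # stalled modifier can finish and the reader cannot deadlock.
            blkno += 1
        return seen
```

In this model a modifier that touches several blocks can still land between
two of the reader's blocks, so the walk as a whole is not a point-in-time
snapshot. Each individual block the reader sees is consistent, which is the
property discussed above.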

-Ben


