From: David Chinner <dgc@sgi.com>
To: Andrew Clayton <andrew@digital-domain.net>
Cc: David Chinner <dgc@sgi.com>,
	linux-fsdevel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: XFS regression?
Date: Fri, 12 Oct 2007 07:53:53 +1000
Message-ID: <20071011215352.GX995458@sgi.com>
In-Reply-To: <20071011151512.69f19419@zeus.pccl.info>

On Thu, Oct 11, 2007 at 03:15:12PM +0100, Andrew Clayton wrote:
> On Thu, 11 Oct 2007 11:01:39 +1000, David Chinner wrote:
> 
> > So it's almost certainly pointing at an elevator or driver change, not an
> > XFS change.
> 
> heh, git bisect begs to differ :)
> 
> 4c60658e0f4e253cf275f12b7c76bf128515a774 is first bad commit
> commit 4c60658e0f4e253cf275f12b7c76bf128515a774
> Author: David Chinner <dgc@sgi.com>
> Date:   Sat Nov 11 18:05:00 2006 +1100
> 
>     [XFS] Prevent a deadlock when xfslogd unpins inodes.

Oh, of course - I failed to notice the significance of
this loop in your test:

	while [ foo ]; do
		touch fred
		rm fred
	done

The inode allocator keeps reusing the same inode.  If the
transaction that did the unlink has not hit the disk before we
allocate the inode again, we have to force the log to get the unlink
transaction to disk to get the xfs inode unpinned (i.e. able to be
modified in memory again).

It's the log force I/O that's introducing the latency.
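
To see that latency per iteration rather than as a total, the loop can be
instrumented along these lines. This is only a rough sketch, not the original
test: the function name and the iteration count argument are made up for
illustration, and it assumes GNU date for the %N (nanoseconds) format.

```shell
#!/bin/sh
# Hypothetical harness (not from the original test): time each
# create/unlink pair in the current directory.
# Assumes GNU date, which supports %N (nanoseconds).
time_create_unlink() {
	count=$1
	i=0
	while [ "$i" -lt "$count" ]; do
		start=$(date +%s%N)
		touch fred
		rm fred
		end=$(date +%s%N)
		# Elapsed time for this iteration, in microseconds.
		echo $(( (end - start) / 1000 ))
		i=$((i + 1))
	done
}
```

Running something like "time_create_unlink 1000" in a directory on the
filesystem under test prints one number per iteration, so iterations that
had to wait on a log force should stand out from the rest.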

If we don't force the log, then we have a possible use-after-free
of the Linux inode because of a fundamental mismatch between
the XFS inode life cycle and the Linux inode life cycle. The
use-after-free only occurs on large machines under heavy, heavy
metadata load to many disks and filesystems (it requires enough
traffic to overload an xfslogd) and is very difficult to
reproduce (large machine, lots of disks and 20-30 hours MTTF).

I'll have a look at other ways to solve this problem, but it
took 6 months to find a solution to the race in the first place
so don't hold your breath.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
