From: Dave Chinner <david@fromorbit.com>
To: Lachlan McIlroy <lachlan@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: Filesystem corruption writing out unlinked inodes
Date: Tue, 2 Sep 2008 16:21:55 +1000
Message-ID: <20080902062155.GE15962@disturbed>
In-Reply-To: <48BCD622.1080406@sgi.com>

On Tue, Sep 02, 2008 at 03:58:58PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Tue, Sep 02, 2008 at 02:48:49PM +1000, Lachlan McIlroy wrote:
>> This is supposed to catch all the inodes in memory and mark them
>> XFS_ISTALE to prevent them from being written back once the
>> transaction is committed. The question is - how are dirty inodes
>> slipping through this?
>>
>> If we are freeing the cluster buffer, then there can be no other
>> active references to any of the inodes, so if they are dirty it
>> has to be due to inactivation transactions and so should be in
>> the log and attached to the buffer due to removal from the
>> unlinked list.
>>
>> The question is - which bit of this is not working? i.e. what is the
>> race condition that is allowing dirty inodes to slip through the
>> locking here?
>>
>> Hmmm - I see that xfs_iflush() doesn't check for XFS_ISTALE when
>> flushing out inodes. Perhaps you could check to see if we are
>> writing an inode marked as such.....
>
> That's what I was suggesting. 

I'm not suggesting that as a fix. I'm suggesting that you determine
whether the inode being flushed has that flag set or not. If it is
not set, then we need to determine how it slipped through
xfs_ifree_cluster() without being marked XFS_ISTALE, and if it is
set, why it was not marked clean by xfs_istale_done() when the
buffer callbacks are made and the flush lock dropped....
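
To make the two branches of that check concrete, here is a minimal
standalone sketch. It is not kernel code: the struct, the flag value
and the helper names (diag_inode, DIAG_ISTALE, classify_flush,
next_question) are hypothetical stand-ins for the real inode and
XFS_ISTALE, and it only records which question to chase next.

```c
#include <stddef.h>

/* Hypothetical stand-in for XFS_ISTALE; NOT the kernel's flag value. */
#define DIAG_ISTALE 0x1u

/* Hypothetical stand-in for the in-core inode being flushed. */
struct diag_inode {
    unsigned long ino;
    unsigned int  flags;
};

enum flush_finding {
    FINDING_NOT_STALE, /* flag clear when the inode reached the flush */
    FINDING_STALE      /* flag set, yet the inode was still flushed */
};

/* Classify an inode seen in the flush path by its stale flag. */
static enum flush_finding classify_flush(const struct diag_inode *ip)
{
    return (ip->flags & DIAG_ISTALE) ? FINDING_STALE : FINDING_NOT_STALE;
}

/* Map each finding to the follow-up question from the text above. */
static const char *next_question(enum flush_finding f)
{
    switch (f) {
    case FINDING_NOT_STALE:
        return "how did it get past xfs_ifree_cluster() unmarked?";
    case FINDING_STALE:
        return "why did xfs_istale_done() not mark it clean?";
    }
    return NULL;
}
```

In a real kernel the equivalent would be a debug check on the inode
flags in the flush path; where exactly to put it is left open here.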

> I'm just not sure about the assumption
> that if the flush lock cannot be acquired in xfs_ifree_cluster() then
> the inode must be in the process of being flushed. The flush could
> be aborted due to the inode being pinned or some other case and the
> inode never gets marked as stale.

Did that happen?

Basically I'm asking what the sequence of events is that leads up
to this problem - we need to identify the actual race condition
before speculating on potential fixes....
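
For reference, the sequence hypothesised in the quoted text (the flush
lock cannot be taken in xfs_ifree_cluster(), and the flush that holds
it is then aborted because the inode is pinned) can be written down as
a toy model. This is only a sketch of that hypothesis, not of what the
kernel has been shown to do: every name and field below is
hypothetical, and there is no real locking or concurrency in it.

```c
#include <stdbool.h>

/* Toy model only: fields are hypothetical, not the kernel's inode. */
struct toy_inode {
    bool dirty;
    bool stale;        /* stands in for XFS_ISTALE */
    bool flush_locked; /* stands in for the inode flush lock */
    bool pinned;
};

/* Flusher side: try to take the flush lock. */
static bool toy_flush_begin(struct toy_inode *ip)
{
    if (ip->flush_locked)
        return false;
    ip->flush_locked = true;
    return true;
}

/* Flusher side: give up on a pinned inode, leaving it dirty. */
static void toy_flush_abort_if_pinned(struct toy_inode *ip)
{
    if (ip->pinned)
        ip->flush_locked = false;
}

/* Cluster-free side: mark stale only if the flush lock is available. */
static void toy_ifree_cluster(struct toy_inode *ip)
{
    if (ip->flush_locked)
        return; /* assumes the flush in progress will complete */
    ip->flush_locked = true;
    ip->stale = true;
}

/* Replay the hypothesised ordering; true means dirty and not stale. */
static bool toy_race_leaves_dirty_unstale(void)
{
    struct toy_inode ip = { .dirty = true, .pinned = true };

    toy_flush_begin(&ip);           /* flusher takes the flush lock */
    toy_ifree_cluster(&ip);         /* cluster free skips the inode */
    toy_flush_abort_if_pinned(&ip); /* flush aborts, lock dropped */
    return ip.dirty && !ip.stale;
}
```

The model only shows what would have to be observed for that theory to
hold: a skipped inode in xfs_ifree_cluster() followed by an aborted
flush of the same inode. Whether that ordering actually occurs is the
open question.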

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
