From: Dave Chinner <david@fromorbit.com>
To: Lin Li <sdeber@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: XFS write cache flush policy
Date: Mon, 10 Dec 2012 11:45:14 +1100
Message-ID: <20121210004514.GF15784@dastard>
In-Reply-To: <CAA_rkDfFUmZzT_kMznsTSNVxdfqfmz=bmJ400wdBOzocgP32eA@mail.gmail.com>
On Thu, Dec 06, 2012 at 09:51:15AM +0100, Lin Li wrote:
> Hi, Guys. I recently suffered a huge data loss after a power cut on an XFS
> partition. The problem was that I copied a lot of files (roughly 20 GB) to
> an XFS partition, then 10 hours later, I got an unexpected power cut. As a
> result, all these newly copied files disappeared as if they had never been
> copied. I tried to check and repair the partition, but xfs_check reports no
> error at all. So I guess the problem is that the metadata for these files
> was all kept in the cache (64 MB) and was never committed to the hard
> disk.
This will have absolutely nothing to do with disk cache flush
policy.
It sounds very much like a journal recovery issue where a set of
changes is not recovered due to a problem with the transaction in the
journal. Indeed, I recently fixed a 19-year-old bug in the journal
write code that could cause exactly this sort of symptom:
commit d35e88faa3b0fc2cea35c3b2dca358b5cd09b45f
Author: Dave Chinner <dchinner@redhat.com>
Date: Mon Oct 8 21:56:12 2012 +1100
xfs: only update the last_sync_lsn when a transaction completes
The log write code stamps each iclog with the current tail LSN in
the iclog header so that recovery knows where to find the tail of
the log once it has found the head. Normally this is taken from the
first item on the AIL - the log item that corresponds to the oldest
active item in the log.
The problem is that when the AIL is empty, the tail lsn is derived
from the l_last_sync_lsn, which is the LSN of the last iclog to
be written to the log. In most cases this doesn't happen, because
the AIL is rarely empty on an active filesystem. However, when it
does, it opens up an interesting case when the transaction being
committed to the iclog spans multiple iclogs.
That is, the first iclog is stamped with the l_last_sync_lsn, and IO
is issued. Then the next iclog is set up, the changes copied into the
iclog (takes some time), and then the l_last_sync_lsn is stamped
into the header and IO is issued. This is still the same
transaction, so the tail lsn of both iclogs must be the same for log
recovery to find the entire transaction to be able to replay it.
The problem arises in that the iclog buffer IO completion updates
the l_last_sync_lsn with its own LSN. Therefore, if the first iclog
completes its IO before the second iclog is filled and has the tail
lsn stamped in it, the second iclog will have the LSN of the first
iclog stamped into its tail lsn field. If the system fails at this
point, log recovery will not see a complete transaction, so the
transaction will not be replayed.
The fix is simple - the l_last_sync_lsn is updated when an iclog
buffer IO completes, and this is incorrect. The l_last_sync_lsn
should be updated when a transaction is completed by an iclog buffer
IO. That is, only iclog buffers that have transaction commit
callbacks attached to them should update the l_last_sync_lsn. This
means that the last_sync_lsn will only move forward when a commit
record is written, not in the middle of a large transaction that is
rolling through multiple iclog buffers.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
This commit only hit 3.7-rc5, but has not been sent to -stable
kernels because I thought it was only exposed by the 3.7 changes.
However, looking at it we've been changing the code that exposed it
since about 3.4, so it's entirely possible that we did expose it
earlier than 3.7-rc1.
Looks like a stable kernel candidate....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs