public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: xfs@oss.sgi.com
Subject: [PATCH 08/11] xfs: fix broken icreate log item cancellation
Date: Thu,  6 Aug 2015 13:44:29 -0400	[thread overview]
Message-ID: <1438883072-28706-9-git-send-email-bfoster@redhat.com> (raw)
In-Reply-To: <1438883072-28706-1-git-send-email-bfoster@redhat.com>

Inode cluster buffers are invalidated and cancelled when inode chunks
are freed to notify log recovery that previous logged updates to the
metadata buffer should be skipped. This ensures that log recovery does
not overwrite buffers that might have already been reused.

On v4 filesystems, inode chunk allocation and inode updates are logged
via the cluster buffers and thus cancellation is easily detected via
buffer cancellation items. v5 filesystems use the new icreate
transaction, which uses logical logging and ordered buffers to log a
full inode chunk allocation at once. The resulting icreate item often
spans multiple inode cluster buffers.

Log recovery checks for cancelled buffers when processing icreate log
items, but it has a couple problems. First, it uses the full length of
the inode chunk rather than the cluster size. Second, it uses the length
in FSB units rather than BB units. Either of these problems prevent
icreate recovery from identifying cancelled buffers and thus inode
initialization proceeds unconditionally.

Update xlog_recover_do_icreate_pass2() to iterate the icreate range in
cluster sized increments and check each increment for cancellation.
Since icreate is currently only used for the minimum atomic inode chunk
allocation, we expect that either all or none of the buffers will be
cancelled. Cancel the icreate if at least one buffer is cancelled to
avoid making a bad situation worse by initializing a partial inode
chunk, but detect such anomalies and warn the user.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log_recover.c | 49 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 76248bf..62a9b57 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3037,6 +3037,11 @@ xlog_recover_do_icreate_pass2(
 	unsigned int		count;
 	unsigned int		isize;
 	xfs_agblock_t		length;
+	int			blks_per_cluster;
+	int			bb_per_cluster;
+	int			cancel_count;
+	int			nbufs;
+	int			i;
 
 	icl = (struct xfs_icreate_log *)item->ri_buf[0].i_addr;
 	if (icl->icl_type != XFS_LI_ICREATE) {
@@ -3095,25 +3100,45 @@ xlog_recover_do_icreate_pass2(
 	}
 
 	/*
-	 * Inode buffers can be freed. Do not replay the inode initialisation as
-	 * we could be overwriting something written after this inode buffer was
-	 * cancelled.
+	 * The icreate transaction can cover multiple cluster buffers and these
+	 * buffers could have been freed and reused. Check the individual
+	 * buffers for cancellation so we don't overwrite anything written after
+	 * a cancellation.
+	 */
+	blks_per_cluster = xfs_icluster_size_fsb(mp);
+	bb_per_cluster = XFS_FSB_TO_BB(mp, blks_per_cluster);
+	nbufs = length / blks_per_cluster;
+	for (i = 0, cancel_count = 0; i < nbufs; i++) {
+		xfs_daddr_t	daddr;
+
+		daddr = XFS_AGB_TO_DADDR(mp, agno,
+					 agbno + i * blks_per_cluster);
+		if (xlog_check_buffer_cancelled(log, daddr, bb_per_cluster, 0))
+			cancel_count++;
+	}
+
+	/*
+	 * We currently only use icreate for a single allocation at a time. This
+	 * means we should expect either all or none of the buffers to be
+	 * cancelled. Be conservative and skip replay if at least one buffer is
+	 * cancelled, but warn the user that something is awry if the buffers
+	 * are not consistent.
 	 *
-	 * XXX: we need to iterate all buffers and only init those that are not
-	 * cancelled. I think that a more fine grained factoring of
-	 * xfs_ialloc_inode_init may be appropriate here to enable this to be
-	 * done easily.
+	 * XXX: This must be refined to only skip cancelled clusters once we use
+	 * icreate for multiple chunk allocations.
 	 */
-	if (xlog_check_buffer_cancelled(log,
-			XFS_AGB_TO_DADDR(mp, agno, agbno), length, 0)) {
+	ASSERT(!cancel_count || cancel_count == nbufs);
+	if (cancel_count) {
+		if (cancel_count != nbufs)
+			xfs_warn(mp,
+	"WARNING: partial inode chunk cancellation, skipped icreate.");
 		trace_xfs_log_recover_icreate_cancel(log, icl);
 		return 0;
 	}
 
 	trace_xfs_log_recover_icreate_recover(log, icl);
-	xfs_ialloc_inode_init(mp, NULL, buffer_list, count, agno, agbno, length,
-			      be32_to_cpu(icl->icl_gen));
-	return 0;
+	return xfs_ialloc_inode_init(mp, NULL, buffer_list, count, agno, agbno,
+				     length, be32_to_cpu(icl->icl_gen));
 }
 
 STATIC void
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2015-08-06 17:44 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-06 17:44 [PATCH 00/11] xfs: miscellaneous logging, recovery, umount fixes Brian Foster
2015-08-06 17:44 ` [PATCH 01/11] xfs: disentagle EFI release from the extent count Brian Foster
2015-08-09  7:36   ` Christoph Hellwig
2015-08-10 12:37     ` Brian Foster
2015-08-12  7:15       ` Christoph Hellwig
2015-08-12 12:47         ` Brian Foster
2015-08-06 17:44 ` [PATCH 02/11] xfs: return committed status from xfs_trans_roll() Brian Foster
2015-08-09  7:37   ` Christoph Hellwig
2015-08-06 17:44 ` [PATCH 03/11] xfs: fix efi/efd error handling to avoid fs shutdown hangs Brian Foster
2015-08-07  0:41   ` Dave Chinner
2015-08-07 12:09     ` Brian Foster
2015-08-07 22:45       ` Dave Chinner
2015-08-09  7:46   ` Christoph Hellwig
2015-08-06 17:44 ` [PATCH 04/11] xfs: ensure EFD trans aborts on log recovery extent free failure Brian Foster
2015-08-09  7:53   ` Christoph Hellwig
2015-08-10 12:38     ` Brian Foster
2015-08-10 13:39       ` Brian Foster
2015-08-06 17:44 ` [PATCH 05/11] xfs: use EFI refcount consistently in log recovery Brian Foster
2015-08-07  1:51   ` Dave Chinner
2015-08-07 12:09     ` Brian Foster
2015-08-09  7:56   ` Christoph Hellwig
2015-08-10 12:37     ` Brian Foster
2015-08-06 17:44 ` [PATCH 06/11] xfs: don't leave EFIs on AIL on mount failure Brian Foster
2015-08-07  2:23   ` Dave Chinner
2015-08-07 12:10     ` Brian Foster
2015-08-09  8:01   ` Christoph Hellwig
2015-08-10 12:38     ` Brian Foster
2015-08-06 17:44 ` [PATCH 07/11] xfs: icreate log item recovery and cancellation tracepoints Brian Foster
2015-08-09  8:02   ` Christoph Hellwig
2015-08-06 17:44 ` Brian Foster [this message]
2015-08-06 17:44 ` [PATCH 09/11] xfs: checksum log record ext headers based on record size Brian Foster
2015-08-06 17:44 ` [PATCH 10/11] xfs: clean up root inode properly on mount failure Brian Foster
2015-08-09  8:03   ` Christoph Hellwig
2015-08-06 17:44 ` [PATCH 11/11] xfs: fix btree cursor error cleanups Brian Foster
2015-08-09  8:03   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1438883072-28706-9-git-send-email-bfoster@redhat.com \
    --to=bfoster@redhat.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox