public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 50/55] xfs: di_flushiter considered harmful
Date: Thu,  5 Sep 2013 08:05:54 +1000	[thread overview]
Message-ID: <1378332359-14737-51-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1378332359-14737-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

When we made all inode updates transactional, we no longer needed
the log recovery detection for inodes being newer on disk than the
transaction being replayed - it was redundant as replay of the log
would always result in the latest version of the inode woul dbe on
disk. It was redundant, but left in place because it wasn't
considered to be a problem.

However, with the new "don't read inodes on create" optimisation,
flushiter has come back to bite us. Essentially, the optimisation
made always initialises flushiter to zero in the create transaction,
and so if we then crash and run recovery and the inode already on
disk has a non-zero flushiter it will skip recovery of that inode.
As a result, log recovery does the wrong thing and we end up with a
corrupt filesystem.

Because we have to support old kernel to new kernl upgrades, we
can't just get rid of the flushiter support in log recovery as we
might be upgrading from a kernel that doesn't have fully transaction
inode updates.  Unfortunately, for v4 superblocks there is no way to
guarantee that log recovery knows about this fact.

We cannot add a new inode format flag to say it's a "special inode
create" because it won't be understood by older kernels and so
recovery could do the wrong thing on downgrade. We cannot specially
detect the combination of zero mode/non-zero flushiter on disk to
non-zero mode, zero flushiter in the log item during recovery
because wrapping of the flushiter can result in false detection.

Hence that makes this "don't use flushiter" optimisation limited to
a disk format that guarantees that we don't need it. And that means
the only fix here is to limit the "no read IO on create"
optimisation to version 5 superblocks....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 include/xfs_dinode.h   | 3 +++
 libxfs/xfs_inode_buf.c | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/xfs_dinode.h b/include/xfs_dinode.h
index 07d735a..e5869b5 100644
--- a/include/xfs_dinode.h
+++ b/include/xfs_dinode.h
@@ -39,6 +39,9 @@ typedef struct xfs_timestamp {
  * There is a very similar struct icdinode in xfs_inode which matches the
  * layout of the first 96 bytes of this structure, but is kept in native
  * format instead of big endian.
+ *
+ * Note: di_flushiter is only used by v1/2 inodes - it's effectively a zeroed
+ * padding field for v3 inodes.
  */
 typedef struct xfs_dinode {
 	__be16		di_magic;	/* inode magic # = XFS_DINODE_MAGIC */
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 3c811f5..6205318 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -213,7 +213,6 @@ xfs_dinode_to_disk(
 	to->di_projid_lo = cpu_to_be16(from->di_projid_lo);
 	to->di_projid_hi = cpu_to_be16(from->di_projid_hi);
 	memcpy(to->di_pad, from->di_pad, sizeof(to->di_pad));
-	to->di_flushiter = cpu_to_be16(from->di_flushiter);
 	to->di_atime.t_sec = cpu_to_be32(from->di_atime.t_sec);
 	to->di_atime.t_nsec = cpu_to_be32(from->di_atime.t_nsec);
 	to->di_mtime.t_sec = cpu_to_be32(from->di_mtime.t_sec);
@@ -241,6 +240,9 @@ xfs_dinode_to_disk(
 		to->di_lsn = cpu_to_be64(from->di_lsn);
 		memcpy(to->di_pad2, from->di_pad2, sizeof(to->di_pad2));
 		platform_uuid_copy(&to->di_uuid, &from->di_uuid);
+		to->di_flushiter = 0;
+	} else {
+		to->di_flushiter = cpu_to_be16(from->di_flushiter);
 	}
 }
 
-- 
1.8.3.2

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2013-09-04 22:21 UTC|newest]

Thread overview: 136+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-04 22:05 [PATCH 00/55] xfsprogs: bring code up to date with kernel Dave Chinner
2013-09-04 22:05 ` [PATCH 01/55] xfsprogs: introduce xfs_icreate.h Dave Chinner
2013-09-05 14:35   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 02/55] xfsprogs: port inode create transaction changes Dave Chinner
2013-09-05 15:25   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 03/55] xfsprogs: teach logprint about icreate transaction Dave Chinner
2013-09-05 15:29   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 04/55] libxfs: fix directory/attribute format issues Dave Chinner
2013-09-05 15:36   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 05/55] libxfs: ensure btree root split sets blkno correctly Dave Chinner
2013-09-05 15:37   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 06/55] libxfs: fix byte swapping on constants Dave Chinner
2013-09-05 15:41   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 07/55] libxfs: sync xfs_da_btree.c Dave Chinner
2013-09-05 15:46   ` Mark Tinguely
2013-09-05 15:52   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 08/55] libxfs: update xfs_alloc to current kernel version Dave Chinner
2013-09-05 15:53   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 09/55] libxfs: sync attr code with kernel Dave Chinner
2013-09-05 15:59   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 10/55] libxfs: sync dir2 kernel differences Dave Chinner
2013-09-05 16:32   ` Mark Tinguely
2013-09-17 15:14   ` Eric Sandeen
2013-09-18  3:36     ` Dave Chinner
2013-09-18  5:03       ` Dave Chinner
2013-09-04 22:05 ` [PATCH 11/55] libxfs: sync xfs_ialloc.c to the kernel code Dave Chinner
2013-09-05 18:46   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 12/55] xfsprogs: define min/max once and use them everywhere Dave Chinner
2013-09-05 18:47   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 13/55] libxfs: fix compile warnings Dave Chinner
2013-09-05 18:50   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 14/55] xfs: remove local fork format handling from xfs_bmapi_write() Dave Chinner
2013-09-05 18:55   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 15/55] libxfs: local to remote format support of remote symlinks Dave Chinner
2013-09-05 18:56   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 16/55] xfs: separate out log format definitions Dave Chinner
2013-09-05 19:01   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 17/55] xfs: split out inode log item format definition Dave Chinner
2013-09-05 19:14   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 18/55] xfs: split out buf log item format definitions Dave Chinner
2013-09-05 19:18   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 19/55] xfs: split out inode log item format definition Dave Chinner
2013-09-05 19:20   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 20/55] xfs: separate dquot on disk format definitions out of xfs_quota.h Dave Chinner
2013-09-05 19:33   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 21/55] xfs: separate icreate log format definitions from xfs_icreate_item.h Dave Chinner
2013-09-05 21:12   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 22/55] xfs: split out on-disk transaction definitions Dave Chinner
2013-09-05 21:21   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 23/55] xfs: introduce xfs_rtalloc_defs.h Dave Chinner
2013-09-05 21:26   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 24/55] xfs: introduce xfs_quota_defs.h Dave Chinner
2013-09-05 21:29   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 25/55] libxfs: introduce xfs_trans_resv.c Dave Chinner
2013-09-05 21:45   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 26/55] libxfs: move transaction code to trans.c Dave Chinner
2013-09-05 21:51   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 27/55] xfs: move inode fork definitions to a new header file Dave Chinner
2013-09-05 21:55   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 28/55] xfs: move unrealted definitions out of xfs_inode.h Dave Chinner
2013-09-05 22:05   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 29/55] xfs: introduce xfs_inode_buf.c for inode buffer operations Dave Chinner
2013-09-05 22:27   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 30/55] xfs: split out the remote symlink handling Dave Chinner
2013-09-06 15:13   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 31/55] libxfs: switch over to xfs_sb.c and remove xfs_mount.c Dave Chinner
2013-09-06 18:15   ` Mark Tinguely
2013-09-06 21:40     ` Dave Chinner
2013-09-06 21:43       ` Mark Tinguely
2013-09-10  1:02         ` [PATCH 31/55 V2] " Dave Chinner
2013-09-10 14:11           ` Mark Tinguely
2013-09-10 21:32             ` [PATCH 31/55 V3] " Dave Chinner
2013-09-11 13:25               ` Mark Tinguely
2013-09-11 14:24                 ` Mark Tinguely
2013-09-13 14:17                   ` Mark Tinguely
2013-09-15  3:27                   ` Dave Chinner
2013-09-11 15:11                 ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 32/55] xfs: create xfs_bmap_util.[ch] Dave Chinner
2013-09-06 15:30   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 33/55] xfsprogs: sync minor kernel header differences Dave Chinner
2013-09-06 15:44   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 34/55] xfs: don't special case shared superblock mounts Dave Chinner
2013-09-06 15:48   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 35/55] xfs: move swap extent code to xfs_extent_ops Dave Chinner
2013-09-06 17:13   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 36/55] xfs: kill __KERNEL__ check for debug code in allocation code Dave Chinner
2013-09-06 17:20   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 37/55] xfs: remove __KERNEL__ from debug code Dave Chinner
2013-09-06 17:28   ` Mark Tinguely
2013-09-06 21:41     ` Dave Chinner
2013-09-06 21:42       ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 38/55] xfs: remove __KERNEL__ check from xfs_dir2_leaf.c Dave Chinner
2013-09-06 17:29   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 39/55] xfs: move kernel specific type definitions to xfs.h Dave Chinner
2013-09-06 17:31   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 40/55] xfs: make struct xfs_perag kernel only Dave Chinner
2013-09-06 18:06   ` Mark Tinguely
2013-09-06 21:50     ` Dave Chinner
2013-09-04 22:05 ` [PATCH 41/55] xfs: Introduce a new structure to hold transaction reservation items Dave Chinner
2013-09-06 18:20   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 42/55] xfs: Introduce tr_fsyncts to m_reservation Dave Chinner
2013-09-06 18:20   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 43/55] xfs: Make writeid transaction use tr_writeid Dave Chinner
2013-09-06 18:21   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 44/55] xfs: refactor xfs_trans_reserve() interface Dave Chinner
2013-09-06 18:21   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 45/55] xfs: Get rid of all XFS_XXX_LOG_RES() macro Dave Chinner
2013-09-06 18:22   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 46/55] xfs: Add xfs_log_rlimit.c Dave Chinner
2013-09-06 18:22   ` Mark Tinguely
2013-10-06 17:56   ` Eric Sandeen
2013-10-07  1:46     ` Eric Sandeen
2013-10-07 13:48       ` Mark Tinguely
2014-02-21 19:47         ` Eric Sandeen
2014-02-21 20:40           ` Mark Tinguely
2014-02-21 20:56             ` Eric Sandeen
2014-02-21 21:46               ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 47/55] xfs: Add read-only support for dirent filetype field Dave Chinner
2013-09-06 18:23   ` Mark Tinguely
2013-09-10 21:34   ` [PATCH 47/55 V2] " Dave Chinner
2013-09-04 22:05 ` [PATCH 48/55] xfs: Add write " Dave Chinner
2013-09-06 18:24   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 49/55] xfsprogs: add dtype support to mkfs and db Dave Chinner
2013-09-06 18:25   ` Mark Tinguely
2013-09-04 22:05 ` Dave Chinner [this message]
2013-09-06 18:43   ` [PATCH 50/55] xfs: di_flushiter considered harmful Mark Tinguely
2013-09-04 22:05 ` [PATCH 51/55] xfs: fix calculation of the number of node entries in a dir3 node Dave Chinner
2013-09-06 18:53   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 52/55] xfs: btree block LSN escaping to disk uninitialised Dave Chinner
2013-09-06 18:54   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 53/55] xfs: inode log reservations are too small Dave Chinner
2013-09-06 18:58   ` Mark Tinguely
2013-09-04 22:05 ` [PATCH 54/55] repair: fix segv on directory block read failure Dave Chinner
2013-09-04 23:33   ` Eric Sandeen
2013-09-04 22:05 ` [PATCH 55/55] xfsprogs: cleanup miscellaneous merge faults Dave Chinner
2013-09-06 19:03   ` Mark Tinguely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1378332359-14737-51-git-send-email-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox