From: Dave Chinner <david@fromorbit.com>
To: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: di_flushiter considered harmful
Date: Tue, 23 Jul 2013 14:49:59 +1000 [thread overview]
Message-ID: <20130723044959.GH19986@dastard> (raw)
In-Reply-To: <20130723012827.GA360@x4>
On Tue, Jul 23, 2013 at 03:28:27AM +0200, Markus Trippelsdorf wrote:
> On 2013.07.23 at 08:56 +1000, Dave Chinner wrote:
> > On Mon, Jul 22, 2013 at 01:07:32PM +0200, Markus Trippelsdorf wrote:
> > > On 2013.07.22 at 20:18 +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > When we made all inode updates transactional, we no longer needed
> > > > the log recovery detection for inodes being newer on disk than the
> > > > transaction being replayed - it was redundant as replay of the log
> > > > would always result in the latest version of the inode woul dbe on
> > > > disk. It was redundant, but left in place because it wasn't
> > > > considered to be a problem.
> > > >
> > > > However, with the new "don't read inodes on create" optimisation,
> > > > flushiter has come back to bite us. Essentially, the optimisation
> > > > made always initialises flushiter to zero in the create transaction,
> > > > and so if we then crash and run recovery and the inode already on
> > > > disk has a non-zero flushiter it will skip recovery of that inode.
> > > > As a result, log recovery does the wrong thing and we end up with a
> > > > corrupt filesystem.
> > > >
> > > > Because we have to support old kernel to new kernl upgrades, we
> > > > can't just get rid of the flushiter support in log recovery as we
> > > > might be upgrading from a kernel that doesn't have fully transaction
> > > > inode updates. Unfortunately, for v4 superblocks there is no way to
> > > > guarantee that log recovery knows about this fact.
> > > >
> > > > We cannot add a new inode format flag to say it's a "special inode
> > > > create" because it won't be understood by older kernels and so
> > > > recovery could do the wrong thing on downgrade. We cannot specially
> > > > detect the combination of zero mode/non-zero flushiter on disk to
> > > > non-zero mode, zero flushiter in the log item during recovery
> > > > because wrapping of the flushiter can result in false detection.
> > > >
> > > > Hence that makes this "don't use flushiter" optimisation limited to
> > > > a disk format that guarantees that we don't need it. And that means
> > > > the only fix here is to limit the "no read IO on create"
> > > > optimisation to version 5 superblocks....
> > >
> > > I think your patch misses the following part:
> > >
> > > @@ -1054,17 +1056,15 @@ xfs_iread(
> > >
> > > /* shortcut IO on inode allocation if possible */
> > > if ((iget_flags & XFS_IGET_CREATE) &&
> > > - !(mp->m_flags & XFS_MOUNT_IKEEP)) {
> > > + !(mp->m_flags & XFS_MOUNT_IKEEP) &&
> > > + xfs_sb_version_hascrc(&mp->m_sb)) {
> > > /* initialise the on-disk inode core */
> > > memset(&ip->i_d, 0, sizeof(ip->i_d));
> > > ip->i_d.di_magic = XFS_DINODE_MAGIC;
> > > ip->i_d.di_gen = prandom_u32();
> > > - if (xfs_sb_version_hascrc(&mp->m_sb)) {
> > > - ip->i_d.di_version = 3;
> > > - ip->i_d.di_ino = ip->i_ino;
> > > - uuid_copy(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid);
> > > - } else
> > > - ip->i_d.di_version = 2;
> > > + ip->i_d.di_version = 3;
> > > + ip->i_d.di_ino = ip->i_ino;
> > > + uuid_copy(&ip->i_d.di_uuid, &mp->m_sb.sb_uuid);
> > > return 0;
> > > }
> >
> > Sure, it's dead code so doesn't affect the behaviour of the patch.
> > I'll update it, but I need you to reproduce the problem in a simple
> > manner as Mark did with this patch in place so I can find out what
> > the real problem you are seeing is....
>
> No. It's not dead code. Please look at the patch that you've posted.
I was looking at the code in my tree. It appears that what I sent
out is an incomplete version - the patch in my tree up to date
and has a xfs_sb_version_hascrc() check around this entire set of
code. I guess I missed a 'guilt refresh'...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-07-23 4:50 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-22 10:18 [PATCH] xfs: di_flushiter considered harmful Dave Chinner
2013-07-22 11:07 ` Markus Trippelsdorf
2013-07-22 14:40 ` Mark Tinguely
2013-07-22 15:15 ` Markus Trippelsdorf
2013-07-22 16:37 ` Mark Tinguely
2013-07-22 19:48 ` Mark Tinguely
2013-07-23 10:42 ` [PATCH v2] " Markus Trippelsdorf
2013-07-23 15:07 ` Ben Myers
2013-07-23 15:56 ` Markus Trippelsdorf
2013-07-22 22:56 ` [PATCH] " Dave Chinner
2013-07-23 1:28 ` Markus Trippelsdorf
2013-07-23 4:49 ` Dave Chinner [this message]
-- strict thread matches above, loose matches on Subject: below --
2013-07-24 5:47 Dave Chinner
2013-07-24 17:16 ` Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130723044959.GH19986@dastard \
--to=david@fromorbit.com \
--cc=markus@trippelsdorf.de \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.