From: Dave Chinner <david@fromorbit.com>
To: Alex Lyakas <alex@zadarastorage.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Danny Shavit <danny@zadarastorage.com>,
	xfs@oss.sgi.com
Subject: Re: xfs resize: primary superblock is not updated immediately
Date: Tue, 23 Feb 2016 10:56:28 +1100
Message-ID: <20160222235628.GK25832@dastard>
In-Reply-To: <CAOcd+r1XY2kcp+qJ=mPOAQSmb90QUnLfmT3-FkMjQN_+Ejmt8A@mail.gmail.com>

On Tue, Feb 23, 2016 at 12:38:48AM +0200, Alex Lyakas wrote:
> Hi Dave,
> Thanks for your response.
> 
> I am not freezing the filesystem before the snapshot.

There's your problem. A mounted filesystem is not consistent on disk
without flushing the entire journal and all the dirty metadata to
disk.
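Here is a minimal sketch of freezing around the snapshot from a script on Linux. It assumes the generic FIFREEZE/FITHAW ioctls from <linux/fs.h>, which are what fsfreeze(8) uses. The ioctl numbers are computed with the asm-generic _IOWR() layout used on x86 and arm, and other architectures encode them differently. It needs CAP_SYS_ADMIN. The snapshot step itself is left to the caller.

```python
import contextlib
import fcntl
import os


def _iowr(type_char: str, nr: int, size: int) -> int:
    """Reproduce the kernel's _IOWR() encoding (asm-generic layout only)."""
    ioc_read_write = 3  # _IOC_READ | _IOC_WRITE
    return (ioc_read_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# <linux/fs.h>: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = _iowr("X", 119, 4)  # assumes sizeof(int) == 4
FITHAW = _iowr("X", 120, 4)


@contextlib.contextmanager
def frozen(mountpoint: str, ioctl=fcntl.ioctl):
    """Keep the filesystem mounted at `mountpoint` frozen inside the block.

    Freezing flushes dirty data and metadata and blocks new modifications,
    so the underlying block device is consistent while frozen.
    """
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        ioctl(fd, FIFREEZE, 0)
        try:
            yield
        finally:
            ioctl(fd, FITHAW, 0)  # always thaw, even if the snapshot failed
    finally:
        os.close(fd)
```

You would call it as `with frozen("/mnt/data"): take_snapshot()`, where the mount point and take_snapshot() are placeholders for your own setup. Keep the frozen window short, because writers block until the thaw.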

> However, let's assume that somebody resized the XFS, and it completed
> and got back to user-space. At this moment the primary superblock
> on-disk is not updated yet with the new agcount. And at this same
> moment there is a power-out. After the power comes back and the
> machine boots, if we mount the XFS, the same problem would happen, I
> believe.

Log recovery will run and update the superblock buffer with the correct
values. But the in-memory superblock that log recovery is working
with does not change, so if recovery needed to access anything beyond
the old (in-memory) AG/block count, you'd see messages like this:

XFS (sda1): _xfs_buf_find: Block out of range: block 0xnnnnn EOFS 0xmmmmm

and log recovery should fail at that point because it can't pull in
a buffer it needs to make further progress. At that point, you have
an unmountable filesystem.
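The following toy model makes that failure concrete. It is illustrative Python, not XFS code, and ToyMount, find_buf and replay are made-up names. Buffer lookups are bounded by the in-memory block count, so replaying a logged change to a block in the newly grown region fails even though the log itself is intact.

```python
class BlockOutOfRange(Exception):
    """Raised when a buffer lookup falls past the in-memory end of the fs."""


class ToyMount:
    """Toy stand-in for a mount: tracks only the in-memory block count."""

    def __init__(self, dblocks):
        self.dblocks = dblocks  # block count from the in-memory superblock

    def find_buf(self, block):
        # Same idea as the check behind "Block out of range ... EOFS ...":
        # lookups are bounded by the in-memory size, not the on-disk one.
        if block >= self.dblocks:
            raise BlockOutOfRange(f"block {block:#x} EOFS {self.dblocks:#x}")
        return block


def replay(mount, logged_blocks):
    """Replay logged buffer updates in order; False means recovery aborts."""
    for block in logged_blocks:
        try:
            mount.find_buf(block)
        except BlockOutOfRange:
            return False  # a buffer needed for recovery can't be read
    return True
```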

If log recovery succeeds, then yes, I can see that there is a
problem here, because the per-AG tree is not reinitialised after the
superblock is re-read. That's a pretty easy fix, though: 3-4 lines
of code in xlog_do_recover() to detect a change in filesystem block
count and call xfs_initialize_perag() again.
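The following sketch shows roughly what that detection could look like. It is modelled in Python rather than written as the actual kernel change. Superblock, ToyFs, initialize_perag and finish_recovery are hypothetical stand-ins for the superblock geometry fields, struct xfs_mount, xfs_initialize_perag() and the superblock re-read in xlog_do_recover(). The toy also assumes that per-AG initialisation only adds AGs missing from the table.

```python
from dataclasses import dataclass, field


@dataclass
class Superblock:
    dblocks: int   # total filesystem blocks
    agblocks: int  # blocks per allocation group

    @property
    def agcount(self) -> int:
        # ceil(dblocks / agblocks): the last AG may be a short "runt"
        return -(-self.dblocks // self.agblocks)


@dataclass
class ToyFs:
    sb: Superblock
    perag: dict = field(default_factory=dict)  # agno -> per-AG state

    def initialize_perag(self) -> None:
        # In this model, only AGs missing from the table are created;
        # existing per-AG state is left untouched.
        for agno in range(self.sb.agcount):
            self.perag.setdefault(agno, {"agno": agno})

    def finish_recovery(self, recovered_sb: Superblock) -> None:
        """Adopt the superblock as replayed by log recovery."""
        old_dblocks = self.sb.dblocks
        self.sb = recovered_sb
        if self.sb.dblocks != old_dblocks:
            # A grow was replayed from the log: extend the per-AG table
            # so it matches the new AG count.
            self.initialize_perag()
```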

> Taking a block-level snapshot is exactly like a power-out from XFS
> perspective.

It's similar, but it's not the same. For example, there are no
issues like volatile storage cache loss that have to be handled.

> And XFS should, in principle, be able to recover from
> that.

For some definition of "recover". There is no guarantee that any of
the async transactions still in memory will make it to disk, so the
point to which XFS can recover is undefined.
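The following toy model shows why that recovery point is undefined. It is deliberately simplified and is not how the XFS log is implemented. Transactions commit to memory and return to the caller immediately, and their log writes complete later, in order. A power loss or an unfrozen snapshot captures only the prefix whose log I/O had completed. Where that prefix ends depends on I/O timing, not on which syscalls had already returned.

```python
class ToyLog:
    """Async commits: commit() returns before the record is durable."""

    def __init__(self):
        self.in_memory = []  # committed, but the log write hasn't completed
        self.on_disk = []    # log records that reached stable storage

    def commit(self, txn: str) -> None:
        # Returns to the caller immediately; nothing is on disk yet.
        self.in_memory.append(txn)

    def complete_io(self, n: int = 1) -> None:
        # The oldest n pending log writes complete, in commit order.
        done, self.in_memory = self.in_memory[:n], self.in_memory[n:]
        self.on_disk.extend(done)

    def force(self) -> None:
        # Like a synchronous log force: every committed record is durable.
        self.complete_io(len(self.in_memory))

    def crash_and_recover(self) -> list:
        # In-memory state is lost; recovery replays only what is on disk.
        self.in_memory = []
        return list(self.on_disk)
```

In this model, the same sequence of commit() calls can recover to any prefix, depending only on how many complete_io() calls happened before the crash.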

> The snapshot will come up as a new block device, which exhibits
> identical content as the original block device had at the moment when
> the snapshot was taken (like a boot after power-out).

The block device might be identical, but it's not identical to what
the filesystem is presenting to the user. Any dirty user data cached in
memory, or metadata changes staged in the CIL (Committed Item List),
will not be in the snapshot. Hence the snapshot block device is not
identical to the original user-visible state and data. You only get
that if you freeze the filesystem before taking the snapshot.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs