From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 05/11] xfs: track CoW blocks separately in the inode
Date: Fri, 26 Jan 2018 11:08:41 -0800
Message-ID: <20180126190841.GA9068@magnolia>
In-Reply-To: <20180126130429.GB47923@bfoster.bfoster>

On Fri, Jan 26, 2018 at 08:04:29AM -0500, Brian Foster wrote:
> On Thu, Jan 25, 2018 at 11:21:42AM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 25, 2018 at 08:06:45AM -0500, Brian Foster wrote:
> > > On Tue, Jan 23, 2018 at 06:18:29PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > >
> > > > Track the number of blocks reserved in the CoW fork so that we can
> > > > move the quota reservations whenever we chown, and don't account for
> > > > CoW fork delalloc reservations in i_delayed_blks. This should make
> > > > chown work properly for quota reservations, enables us to fully
> > > > account for real extents in the cow fork in the file stat info, and
> > > > improves the post-eof scanning decisions because we're no longer
> > > > confusing data fork delalloc extents with cow fork delalloc extents.
> > > >
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > > fs/xfs/libxfs/xfs_bmap.c | 16 ++++++++++++----
> > > > fs/xfs/libxfs/xfs_inode_buf.c | 1 +
> > > > fs/xfs/xfs_bmap_util.c | 5 +++++
> > > > fs/xfs/xfs_icache.c | 3 ++-
> > > > fs/xfs/xfs_inode.c | 11 +++++------
> > > > fs/xfs/xfs_inode.h | 1 +
> > > > fs/xfs/xfs_iops.c | 3 ++-
> > > > fs/xfs/xfs_itable.c | 3 ++-
> > > > fs/xfs/xfs_qm.c | 2 +-
> > > > fs/xfs/xfs_reflink.c | 4 ++--
> > > > fs/xfs/xfs_super.c | 1 +
> > > > 11 files changed, 34 insertions(+), 16 deletions(-)
> > > >
> > > >
> > > ...
> > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > index 4a38cfc..a208825 100644
> > > > --- a/fs/xfs/xfs_inode.c
> > > > +++ b/fs/xfs/xfs_inode.c
> > > ...
> > > > @@ -1669,7 +1667,7 @@ xfs_release(
> > > > truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
> > > > if (truncated) {
> > > > xfs_iflags_clear(ip, XFS_IDIRTY_RELEASE);
> > > > - if (ip->i_delayed_blks > 0) {
> > > > + if (ip->i_delayed_blks > 0 || ip->i_cow_blocks > 0) {
> > > > error = filemap_flush(VFS_I(ip)->i_mapping);
> > > > if (error)
> > > > return error;
> > >
> > > Is having cowblocks really relevant to this hunk? I thought this was
> > > purely a delalloc vs. file size thing, but I could be wrong.
> >
> > AFAICT, if we (1) use truncate to reduce a file's size, (2) write
> > somewhere past eof, (3) make some delalloc reservations for the post-eof
> > write, and (4) close the file, then this chunk flushes the dirty data to
> > disk so that if we crash after the close() call returns, the file will
> > still have all the data that was written out. IOWs, this provides for
> > flush-on-close after a file size reduction.
> >
>
> I think it goes back to problems where those subsequent buffered writes
> increase the file size again and the fs crashes before all data is
> written out. E.g., the problem described by commit ba87ea699e ("[XFS]
> Fix to prevent the notorious 'NULL files' problem after a crash."). It's
> not totally clear to me whether that fixed the problem and this
> particular hack is still needed.

Me neither. It looks like deferring the size update until the write
end_io would have closed this bug... but on the other hand maybe its
function is more to avoid disappointing the people who expect flush on
close behavior...

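
As a rough illustration of the deferred size update idea, here is a toy
userspace model. It is not XFS code: every name below is made up for the
example, and it assumes writeback completes in offset order starting at
zero. The point is only that if the crash-visible size moves at I/O
completion, it can never run ahead of the data that reached disk.

```c
#include <stdbool.h>

/* Toy model only; these names are illustrative, not kernel structures. */
struct toy_inode {
	long incore_size;	/* size seen by the running system */
	long ondisk_size;	/* size that would survive a crash */
	long data_written;	/* bytes of file data that reached disk */
};

/* A buffered extending write only moves the in-core size. */
static void toy_buffered_write(struct toy_inode *ip, long new_size)
{
	if (new_size > ip->incore_size)
		ip->incore_size = new_size;
}

/* I/O completion: the data is on disk, so the on-disk size may follow. */
static void toy_write_end_io(struct toy_inode *ip, long end_offset)
{
	if (end_offset > ip->data_written)
		ip->data_written = end_offset;
	if (end_offset > ip->ondisk_size)
		ip->ondisk_size = end_offset;
}

/* After a crash only the on-disk size is left; is all of it backed by data? */
static bool toy_crash_is_safe(const struct toy_inode *ip)
{
	return ip->ondisk_size <= ip->data_written;
}
```

In this model a crash between toy_buffered_write() and
toy_write_end_io() leaves the old on-disk size in place, so no range of
the file is exposed without data behind it.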
> FWIW, the flush code looks like it goes back to commit 7d4fb40ad7
> ("[XFS] Start writeout earlier (on last close) ...").
>
> > So I was thinking that if a write to a lower offset causes the creation
> > of a speculative cow extent of some kind that extends past eof, we'd
> > still want to flush the dirty data to disk on close even if there are no
> > delalloc reservations in the data fork.
> >
>
> This whole stanza still depends on a truncate in the first place
> though..?
>
> I guess I'm not necessarily against doing this, I just think we should
> verify whether it's actually useful to prevent some kind of similar
> crash-recovery problem it was intended to help mitigate. If not, then
> we're subjecting ourselves to the tradeoff, which appears to be that
> we'll initiate writeback of any file with cowblocks on close that has
> been truncated.
>
> Granted the truncate operation is probably infrequent with respect to
> close() so it's probably not that big of a deal, but in the delalloc

It's probably infrequent wrt cow-and-close, but "echo foo > existingfile"
would trigger this for the regular da case. I don't really mind
dropping it either, aside from my sense of paranoia. :P

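
For reference, the shell redirect boils down to roughly the following
userspace sequence. This is a minimal sketch, not taken from the patch:
rewrite_file() is a hypothetical helper name, and whether the close()
actually reaches the flush in xfs_release depends on the kernel state
discussed in this thread (XFS_ITRUNCATED set and delalloc blocks
present).

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper that mimics "echo foo > path".  O_TRUNC cuts the
 * file to zero length, the write then extends it again through the page
 * cache, and close() drops the last reference to the open file.
 * Returns 0 on success, -1 on error.
 */
static int rewrite_file(const char *path, const char *data)
{
	size_t len = strlen(data);
	ssize_t ret;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	ret = write(fd, data, len);	/* buffered write past the new EOF */
	if (ret < 0 || (size_t)ret != len) {
		close(fd);
		return -1;
	}

	return close(fd);		/* the release path runs from here */
}
```

Note that nothing here calls fsync(); an application that needs the data
to be durable regardless of any flush-on-close behavior has to call it
before close().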
> case a flush is at least generally expected to clear the file of delayed
> allocation. It's my understanding that the same is not necessarily true
> for cowblocks.. cow prealloc means blocks can sit around in the cow fork
> for a while in anticipation of future copy-on-writes, right?

Yes.

--D
>
> Brian
>
> > Ofc now I see that xfs_file_iomap_begin_delay will create the data fork
> > da reservation for a non-shared block even if a cow fork extent already
> > exists (the write is promoted to cow), so perhaps this isn't strictly
> > necessary... but adding a data fork da extent when there's already a cow
> > fork extent seems like a (mostly harmless) bug to me.
> >
> > --D
> >
> > >
> > > Brian
> > >
> > > > @@ -1909,7 +1907,8 @@ xfs_inactive(
> > > >
> > > > if (S_ISREG(VFS_I(ip)->i_mode) &&
> > > > (ip->i_d.di_size != 0 || XFS_ISIZE(ip) != 0 ||
> > > > - ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0))
> > > > + ip->i_d.di_nextents > 0 || ip->i_delayed_blks > 0 ||
> > > > + ip->i_cow_blocks > 0))
> > > > truncate = 1;
> > > >
> > > > error = xfs_qm_dqattach(ip, 0);
> > > > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > > > index ff56486..6feee8a 100644
> > > > --- a/fs/xfs/xfs_inode.h
> > > > +++ b/fs/xfs/xfs_inode.h
> > > > @@ -62,6 +62,7 @@ typedef struct xfs_inode {
> > > > /* Miscellaneous state. */
> > > > unsigned long i_flags; /* see defined flags below */
> > > > unsigned int i_delayed_blks; /* count of delay alloc blks */
> > > > + unsigned int i_cow_blocks; /* count of cow fork blocks */
> > > >
> > > > struct xfs_icdinode i_d; /* most of ondisk inode */
> > > >
> > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > index 56475fc..6c3381c 100644
> > > > --- a/fs/xfs/xfs_iops.c
> > > > +++ b/fs/xfs/xfs_iops.c
> > > > @@ -513,7 +513,8 @@ xfs_vn_getattr(
> > > > stat->mtime = inode->i_mtime;
> > > > stat->ctime = inode->i_ctime;
> > > > stat->blocks =
> > > > - XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks);
> > > > + XFS_FSB_TO_BB(mp, ip->i_d.di_nblocks + ip->i_delayed_blks +
> > > > + ip->i_cow_blocks);
> > > >
> > > > if (ip->i_d.di_version == 3) {
> > > > if (request_mask & STATX_BTIME) {
> > > > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > > > index d583105..412d7eb 100644
> > > > --- a/fs/xfs/xfs_itable.c
> > > > +++ b/fs/xfs/xfs_itable.c
> > > > @@ -122,7 +122,8 @@ xfs_bulkstat_one_int(
> > > > case XFS_DINODE_FMT_BTREE:
> > > > buf->bs_rdev = 0;
> > > > buf->bs_blksize = mp->m_sb.sb_blocksize;
> > > > - buf->bs_blocks = dic->di_nblocks + ip->i_delayed_blks;
> > > > + buf->bs_blocks = dic->di_nblocks + ip->i_delayed_blks +
> > > > + ip->i_cow_blocks;
> > > > break;
> > > > }
> > > > xfs_iunlock(ip, XFS_ILOCK_SHARED);
> > > > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > > > index 5b848f4..28f12f8 100644
> > > > --- a/fs/xfs/xfs_qm.c
> > > > +++ b/fs/xfs/xfs_qm.c
> > > > @@ -1847,7 +1847,7 @@ xfs_qm_vop_chown_reserve(
> > > > ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
> > > > ASSERT(XFS_IS_QUOTA_RUNNING(mp));
> > > >
> > > > - delblks = ip->i_delayed_blks;
> > > > + delblks = ip->i_delayed_blks + ip->i_cow_blocks;
> > > > blkflags = XFS_IS_REALTIME_INODE(ip) ?
> > > > XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
> > > >
> > > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > > index e367351..f875ea7 100644
> > > > --- a/fs/xfs/xfs_reflink.c
> > > > +++ b/fs/xfs/xfs_reflink.c
> > > > @@ -619,7 +619,7 @@ xfs_reflink_cancel_cow_blocks(
> > > > }
> > > >
> > > > /* clear tag if cow fork is emptied */
> > > > - if (!ifp->if_bytes)
> > > > + if (ip->i_cow_blocks == 0)
> > > > xfs_inode_clear_cowblocks_tag(ip);
> > > >
> > > > return error;
> > > > @@ -704,7 +704,7 @@ xfs_reflink_end_cow(
> > > > trace_xfs_reflink_end_cow(ip, offset, count);
> > > >
> > > > /* No COW extents? That's easy! */
> > > > - if (ifp->if_bytes == 0)
> > > > + if (ip->i_cow_blocks == 0)
> > > > return 0;
> > > >
> > > > offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset);
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index f3e0001..9d04cfb 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -989,6 +989,7 @@ xfs_fs_destroy_inode(
> > > > xfs_inactive(ip);
> > > >
> > > > ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0);
> > > > + ASSERT(XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_cow_blocks == 0);
> > > > XFS_STATS_INC(ip->i_mount, vn_reclaim);
> > > >
> > > > /*
> > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html