linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/4] xfs: force writes to delalloc regions to unwritten
Date: Mon, 18 Mar 2019 09:40:11 +1100	[thread overview]
Message-ID: <20190317224011.GO23020@dastard> (raw)
In-Reply-To: <20190315122939.GB5182@bfoster>

On Fri, Mar 15, 2019 at 08:29:40AM -0400, Brian Foster wrote:
> On Fri, Mar 15, 2019 at 02:40:38PM +1100, Dave Chinner wrote:
> > > If the delalloc extent crosses EOF then I think it makes sense to map it
> > > in as unwritten, issue the write, and convert the extent to real during
> > > io completion, and not split it up just to avoid having unwritten
> > > extents past EOF.
> > 
> > Yup, makes sense. For a sequential write, they should always be
> > beyond EOF. For anything else, we use unwritten.
> > 
> 
> I'm not quite sure I follow the suggested change in behavior here tbh so
> maybe this is not an issue, but another thing to consider is whether
> selective delalloc -> real conversion for post-eof extents translates to
> conditional stale data exposure vectors in certain file extension
> scenarios.

No, it doesn't because the transaction that moves EOF is done after
the data write into the region it exposes is complete. hence the
device cache flush that occurs before the log write containing the
transaction that moves the EOF will always result in the data being
on stable storage *before* the extending szie transaction hits the
journal and exposes it.

> I think even the post-eof zeroing code may be susceptible to
> this problem if the log happens to push after a truncate transaction
> commits but before the associated dirty pages are written back..

That's what this code in xfs_setattr_size() handles:

        /*
         * We are going to log the inode size change in this transaction so
         * any previous writes that are beyond the on disk EOF and the new
         * EOF that have not been written out need to be written here.  If we
         * do not write the data out, we expose ourselves to the null files
         * problem. Note that this includes any block zeroing we did above;
         * otherwise those blocks may not be zeroed after a crash.
         */
        if (did_zeroing ||
            (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) {
                error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
                                                ip->i_d.di_size, newsize - 1);
                if (error)
                        return error;
        }

        error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);

i.e. it ensures the data and post-eof zeroing is on disk before the
truncate transaction is started. We already hold the IOLOCK at this
point and drained in-flight direct IO, so no IO that can affect the
truncate region will begin or be in progress after the flush and
wait above is complete.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2019-03-17 22:40 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-14 21:28 [PATCH 0/4] xfs: various rough fixes Darrick J. Wong
2019-03-14 21:29 ` [PATCH 1/4] xfs: only free posteof blocks on first close Darrick J. Wong
2019-03-15  3:42   ` Dave Chinner
2019-03-14 21:29 ` [PATCH 2/4] xfs: force writes to delalloc regions to unwritten Darrick J. Wong
2019-03-14 23:08   ` Dave Chinner
2019-03-15  3:13     ` Darrick J. Wong
2019-03-15  3:40       ` Dave Chinner
2019-03-15 12:29         ` Brian Foster
2019-03-17 22:40           ` Dave Chinner [this message]
2019-03-18 12:40             ` Brian Foster
2019-03-18 20:35               ` Dave Chinner
2019-03-18 21:50                 ` Brian Foster
2019-03-19 18:02                   ` Darrick J. Wong
2019-03-19 20:25                     ` Dave Chinner
2019-03-20 12:02                       ` Brian Foster
2019-03-20 21:10                         ` Dave Chinner
2019-03-21 12:27                           ` Brian Foster
2019-03-22  1:52                             ` Dave Chinner
2019-03-22 12:46                               ` Brian Foster
2019-04-03 22:38                                 ` Darrick J. Wong
2019-04-04 12:50                                   ` Brian Foster
2019-04-04 15:22                                     ` Darrick J. Wong
2019-04-04 16:16                                       ` Brian Foster
2019-04-04 22:08                                     ` Dave Chinner
2019-04-05 11:24                                       ` Brian Foster
2019-04-05 15:12                                         ` Darrick J. Wong
2019-04-08  0:17                                           ` Dave Chinner
2019-04-08 12:02                                             ` Brian Foster
2019-03-14 21:29 ` [PATCH 3/4] xfs: trace transaction reservation logcount overflow Darrick J. Wong
2019-03-15 12:30   ` Brian Foster
2019-03-14 21:29 ` [PATCH 4/4] xfs: avoid reflink end cow deadlock Darrick J. Wong
2019-03-15 12:31   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190317224011.GO23020@dastard \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).