From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: ross.zwisler@linux.intel.com, jack@suse.cz, xfs@oss.sgi.com
Subject: Re: [PATCH 3/6] xfs: Don't use unwritten extents for DAX
Date: Mon, 2 Nov 2015 12:14:33 +1100
Message-ID: <20151102011433.GW19199@dastard>
In-Reply-To: <20151030123657.GC54905@bfoster.bfoster>
On Fri, Oct 30, 2015 at 08:36:57AM -0400, Brian Foster wrote:
> On Fri, Oct 30, 2015 at 10:37:56AM +1100, Dave Chinner wrote:
> > On Thu, Oct 29, 2015 at 10:29:50AM -0400, Brian Foster wrote:
> > > On Mon, Oct 19, 2015 at 02:27:15PM +1100, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> ...
> > > > +	/*
> > > > +	 * For DAX, we do not allocate unwritten extents, but instead we zero
> > > > +	 * the block before we commit the transaction. Ideally we'd like to do
> > > > +	 * this outside the transaction context, but if we commit and then crash
> > > > +	 * we may not have zeroed the blocks and this will be exposed on
> > > > +	 * recovery of the allocation. Hence we must zero before commit.
> > > > +	 * Further, if we are mapping unwritten extents here, we need to zero
> > > > +	 * and convert them to written so that we don't need an unwritten extent
> > > > +	 * callback for DAX. This also means that we need to be able to dip into
> > > > +	 * the reserve block pool if there is no space left but we need to do
> > > > +	 * unwritten extent conversion.
> > > > +	 */
> > > > +	if (IS_DAX(VFS_I(ip))) {
> > > > +		bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
> > > > +		tp->t_flags |= XFS_TRANS_RESERVE;
> > > > +	}
> > >
> > > Am I following the commit log description correctly in that block
> > > zeroing is only required for DAX faults? Do we zero blocks for DAX DIO
> > > as well to be consistent, or is that also required (because it looks
> > > like we still have end_io completion for dio writes anyways)?
> >
> > DAX DIO will do the zeroing rather than using unwritten extents,
> > too. But we still have DIO IO completion as that needs to do file
> > size updates.
> >
>
> Right, my question is: is the DAX DIO zeroing required to avoid the
> races described as the purpose for this patch, or is this just here as a
> simplification? In other words, why not do block zeroing only for DAX
> faults and not DAX/DIO?

Because the only reason the DIO code does 'allocate unwritten;
convert unwritten on IO completion' is so that if we have:

	allocate
	trans_commit
	....	log force
		journal IO submit
	....	journal IO completion
	submit data io
	crash

we don't expose allocated blocks containing stale data to userspace
via recovery. The allocation uses unwritten extents to ensure that
if the allocation is recovered without the corresponding completion,
it reads as zeros rather than whatever was previously on disk in that
location.
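
To make the crash window above concrete, here is a minimal userspace
sketch of the unwritten-extent scheme. It is a toy model, not XFS code:
the toy_* names, the two-state enum and the 8-byte "block" are all
assumptions made up for illustration. The point it shows is that a
recovered-but-never-completed allocation reads back as zeros even though
the media still holds stale bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8

/* Hypothetical extent states; these are not the XFS on-disk definitions. */
enum toy_state { TOY_UNWRITTEN, TOY_WRITTEN };

struct toy_extent {
	enum toy_state state;                 /* state recorded by the journal */
	unsigned char media[TOY_BLOCK_SIZE];  /* bytes physically on disk */
};

/* Allocation commits the extent as unwritten; the media is not touched. */
static void toy_alloc_unwritten(struct toy_extent *e)
{
	e->state = TOY_UNWRITTEN;
}

/* Data IO followed by completion: write the data, then convert to written. */
static void toy_write_and_convert(struct toy_extent *e,
				  const unsigned char *data, size_t len)
{
	assert(len <= TOY_BLOCK_SIZE);
	memcpy(e->media, data, len);
	e->state = TOY_WRITTEN;
}

/* Reads of an unwritten extent return zeros, whatever the media holds. */
static void toy_read(const struct toy_extent *e, unsigned char *out, size_t len)
{
	assert(len <= TOY_BLOCK_SIZE);
	if (e->state == TOY_WRITTEN)
		memcpy(out, e->media, len);
	else
		memset(out, 0, len);
}
```

A crash after trans_commit but before data IO completion corresponds to
calling toy_alloc_unwritten() and never reaching toy_write_and_convert();
toy_read() then hides the stale media contents.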
For DAX, we can zero the blocks inside the allocation transaction
for direct IO, and hence even if we have the above happen, we'll
only ever expose zeros. Hence we don't need unwritten extents in the
DIO path to avoid stale data exposure, and so we can simply avoid
all that extra overhead of unwritten extent conversion on
completion...
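
As a rough illustration of why zeroing before commit is sufficient, the
sketch below models an allocation transaction that zeros first and
commits second. Again this is a toy model with invented names
(toy_block, toy_alloc_zeroed, toy_visible_nonzero); it only assumes that
a mapping becomes visible at commit and that the zeroing is stable by
then.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8

struct toy_block {
	bool mapped;                          /* allocation committed to the journal */
	unsigned char media[TOY_BLOCK_SIZE];  /* persistent memory contents */
};

/*
 * Toy allocation transaction: zero the block, then commit the mapping.
 * If crash_before_commit is set, the mapping never becomes visible.
 */
static void toy_alloc_zeroed(struct toy_block *b, bool crash_before_commit)
{
	assert(b != NULL);
	memset(b->media, 0, sizeof(b->media));  /* zero inside the transaction */
	if (crash_before_commit)
		return;
	b->mapped = true;                       /* commit */
}

/* True if a non-zero byte is reachable through a committed mapping. */
static bool toy_visible_nonzero(const struct toy_block *b)
{
	size_t i;

	if (!b->mapped)
		return false;
	for (i = 0; i < sizeof(b->media); i++)
		if (b->media[i] != 0)
			return true;
	return false;
}
```

Whichever side of the commit the crash lands on, the model never exposes
the old contents: either the block is not mapped at all, or it is mapped
and already zeroed. No completion-time conversion step is involved.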
> I ask because my understanding is the purpose of this patch is a special
> atomic zeroed allocation requirement just for mmap.
The requirement is set by DAX+mmap; the implementation is a generic
"allocate zeroed blocks" mechanism that can be applied to any
allocation that uses unwritten extents to allocate zeroed blocks if
zeroing is more efficient than using unwritten extents....
> Unless there is some
> special mixed dio/mmap case I'm missing, doing so for DAX/DIO basically
> causes a clear_pmem() over every page sized chunk of the target I/O
> range for which we already have the data.
I don't follow - this only zeros blocks when we do allocation of new
blocks or overwrite unwritten extents, not on blocks which we
already have written data extents allocated for...
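
Put as a decision rule, and purely as a sketch with made-up names (the
toy_map_state enum and both helpers are hypothetical, not taken from the
patch), the zeroing applies per block like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block mapping states seen by a DAX write. */
enum toy_map_state { TOY_HOLE, TOY_UNWRITTEN, TOY_WRITTEN };

/* Zero only on new allocation or when converting an unwritten extent. */
static bool toy_needs_zeroing(enum toy_map_state s)
{
	switch (s) {
	case TOY_HOLE:        /* new blocks get allocated: zero them */
	case TOY_UNWRITTEN:   /* zeroed and converted to written */
		return true;
	case TOY_WRITTEN:     /* existing data: left alone */
		return false;
	}
	return false;
}

/* Count how many blocks of a write range would be zeroed. */
static size_t toy_count_zeroed(const enum toy_map_state *states, size_t n)
{
	size_t i, count = 0;

	assert(states != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (toy_needs_zeroing(states[i]))
			count++;
	return count;
}
```

An overwrite of a range that is entirely TOY_WRITTEN zeros nothing in
this model, which is the case being asked about.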
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com