Re: [PATCH 3/6] xfs: Don't use unwritten extents for DAX

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Brian Foster <bfoster@redhat.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jan Kara <jack@suse.cz>,
	xfs@oss.sgi.com, linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Subject: Re: [PATCH 3/6] xfs: Don't use unwritten extents for DAX
Date: Tue, 3 Nov 2015 17:50:56 -0700	[thread overview]
Message-ID: <20151104005056.GA24710@linux.intel.com> (raw)
In-Reply-To: <20151103050413.GB19199@dastard>

On Tue, Nov 03, 2015 at 04:04:13PM +1100, Dave Chinner wrote:
> On Mon, Nov 02, 2015 at 07:53:27PM -0800, Dan Williams wrote:
> > On Mon, Nov 2, 2015 at 1:44 PM, Dave Chinner <david@fromorbit.com> wrote:
<>
> > > This comes back to the comments I made w.r.t. the pmem driver
> > > implementation doing synchronous IO by immediately forcing CPU cache
> > > flushes and barriers. it's obviously correct, but it looks like
> > > there's going to be a major performance penalty associated with it.
> > > This is why I recently suggested that a pmem driver that doesn't do
> > > CPU cache writeback during IO but does it on REQ_FLUSH is an
> > > architecture we'll likely have to support.
> > >
> > 
> > The only thing we can realistically delay is wmb_pmem() i.e. the final
> > sync waiting for data that has *left* the cpu cache.  Unless/until we
> > get a architecturally guaranteed method to write-back the entire
> > cache, or flush the cache by physical-cache-way we're stuck with
> > either non-temporal cycles or looping on potentially huge virtual
> > address ranges.
> 
> I'm missing something: why won't flushing the address range returned
> by bdev_direct_access() during a fsync operation work? i.e. we're
> working with exactly the same address as dax_clear_blocks() and
> dax_do_io() use, so why can't we look up that address and flush it
> from fsync?

I could be wrong, but I don't see a reason why DAX can't use the strategy of
writing data and marking it dirty in one step and then flushing later in
response to fsync/msync.  I think this could be used everywhere we write or
zero data - dax_clear_blocks(), dax_io() etc.  (I believe that lots of the
block zeroing code will go away once we have the XFS and ext4 patches in that
guarantee we will only get written and zeroed extents from the filesystem in
response to get_block().)  I think the PMEM driver, lacking the ability to
mark things as dirty in the radix tree, etc, will need to keep doing things
synchronously.

Hmm...if we go this path, though, is that an argument against moving the
zeroing from DAX down into the driver?  True, with BRD it makes things nice
and efficient because you can zero and never flush, and the driver knows
there's nothing else to do.

For PMEM, though, you lose the ability to zero the data and then queue the
flushing for later, as you would be able to do if you left the zeroing code in
DAX.  The benefit of this is that if you are going to immediately re-write the
newly zeroed data (which seems common), PMEM will end up doing an extra cache
flush of the zeroes, only to have them overwritten and marked as dirty by DAX.
If we leave the zeroing to DAX we can mark it dirty once, zero it once, write
it once, and flush it once.

This would make us lose the ability to do hardware-assisted flushing in the
future that requires driver specific knowledge, though I don't think that
exists yet.  Perhaps we should leave the zeroing in DAX for now to take
advantage of the single flush, and then move it down if a driver can improve
performance with hardware assisted PMEM zeroing?

next prev parent reply	other threads:[~2015-11-04  0:50 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1445225238-30413-1-git-send-email-david@fromorbit.com>
     [not found] ` <1445225238-30413-4-git-send-email-david@fromorbit.com>
     [not found]   ` <20151029142950.GE11663@bfoster.bfoster>
     [not found]     ` <20151029233756.GS19199@dastard>
     [not found]       ` <20151030123657.GC54905@bfoster.bfoster>
     [not found]         ` <20151102011433.GW19199@dastard>
     [not found]           ` <20151102141509.GA29346@bfoster.bfoster>
2015-11-02 21:44             ` [PATCH 3/6] xfs: Don't use unwritten extents for DAX Dave Chinner
2015-11-03  3:53               ` Dan Williams
2015-11-03  5:04                 ` Dave Chinner
2015-11-04  0:50                   ` Ross Zwisler [this message]
2015-11-04  1:02                     ` Dan Williams
2015-11-04  4:46                       ` Ross Zwisler
2015-11-04  9:06                         ` Jan Kara
2015-11-04 15:35                           ` Ross Zwisler
2015-11-04 17:21                             ` Jan Kara
2015-11-03  9:16               ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151104005056.GA24710@linux.intel.com \
    --to=ross.zwisler@linux.intel.com \
    --cc=bfoster@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).