linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	XFS Developers <xfs@oss.sgi.com>,
	linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: Subtle races between DAX mmap fault and write path
Date: Mon, 1 Aug 2016 11:46:45 +1000
Message-ID: <20160801014645.GI16044@dastard>
In-Reply-To: <CAPcyv4gLTkx4ne7pWuMSqfFpLoOBx=TowvcWXw9UGUxn=jd-Tg@mail.gmail.com>

On Fri, Jul 29, 2016 at 05:53:07PM -0700, Dan Williams wrote:
> On Fri, Jul 29, 2016 at 5:12 PM, Dave Chinner <david@fromorbit.com> wrote:
....
> > So what you are saying is that on an ADR machine, we have these
> > domains w.r.t. power fail:
> >
> > cpu-cache -> cpu-write-buffer -> bus -> imc -> imc-write-buffer -> media
> >
> > |-------------volatile-------------------|-----persistent--------------|
> >
> > because anything that gets to the IMC is guaranteed to be flushed to
> > stable media on power fail.
> >
> > But on a posted-write-buffer system, we have this:
> >
> > cpu-cache -> cpu-write-buffer -> bus -> imc -> imc-write-buffer -> media
> >
> > |-------------volatile-------------------------------------------|--persistent--|
> >
> > IOWs, only things already posted to the media via REQ_FLUSH are
> > considered stable on persistent media.  What happens in this case
> > when power fails during a media update? Incomplete writes?
> 
> Yes, power failure during a media update will end up with incomplete
> writes on an 8-byte boundary.

So we'd see that as the equivalent of a torn single-sector
write. OK, so we had better limit DAX to CRC-enabled filesystems to
ensure these sorts of events are always caught by the filesystem.
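A minimal sketch of why a checksum catches this kind of tear. It assumes a 512-byte block whose last 4 bytes hold a checksum over the rest (the way filesystem metadata CRCs are laid out), and a power failure that lands only a prefix of the new block image, in whole 8-byte words. Python's zlib.crc32 (plain CRC-32) stands in for the CRC32C that filesystem metadata checksums typically use; seal(), verify() and torn_write() are hypothetical names for illustration.

```python
import zlib

SECTOR = 512   # block size in this model
WORD = 8       # power-fail tear granularity described above

def seal(payload: bytes) -> bytes:
    """Append a little-endian CRC of the payload, forming one SECTOR-sized block."""
    assert len(payload) == SECTOR - 4
    return payload + zlib.crc32(payload).to_bytes(4, "little")

def verify(block: bytes) -> bool:
    """Recompute the payload CRC and compare it with the stored one."""
    payload, stored = block[:-4], int.from_bytes(block[-4:], "little")
    return zlib.crc32(payload) == stored

def torn_write(old: bytes, new: bytes, words_landed: int) -> bytes:
    """Hypothetical power failure: only the first words_landed 8-byte words of new reach media."""
    cut = words_landed * WORD
    return new[:cut] + old[cut:]

old = seal(b"\xaa" * (SECTOR - 4))
new = seal(b"\x55" * (SECTOR - 4))

print(verify(torn_write(old, new, 20)))              # tear after 160 bytes
print(verify(torn_write(old, new, SECTOR // WORD)))  # all 64 words landed
```

Because the checksum sits in the last word of the block, a tear that leaves a mix of old and new words keeps the old checksum and will, with overwhelming probability, fail verification; only a write that lands every word, checksum included, verifies.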

> >> > Or have we somehow ended up with the fucked up situation where
> >> > dax_do_io() writes are (effectively) immediately persistent and
> >> > untracked by internal infrastructure, whilst mmap() writes
> >> > require internal dirty tracking and fsync() to flush caches via
> >> > writeback?
> >>
> >> dax_do_io() writes are not immediately persistent.  They bypass the
> >> cpu-cache and cpu-write-buffer and are ready to be flushed to media
> >> by REQ_FLUSH or power-fail on an ADR system.
> >
> > IOWs, on an ADR system a write is /effectively/ immediately persistent
> > because if power fails ADR guarantees it will be flushed to stable
> > media, while on a posted write system it is volatile and will be
> > lost. Right?
> 
> Right.

Thanks for the clarification.
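The two power-fail domain diagrams quoted above can be encoded as a small lookup, as a hedged sketch. The stage names come from the diagrams; Stage, PERSISTENT_FROM and survives_power_fail() are hypothetical names. It also assumes, based on the exchange above, that a fenced, cache-bypassing store issued by dax_do_io() has got past the CPU cache and write buffer and reached the IMC.

```python
from enum import IntEnum

class Stage(IntEnum):
    """Write path stages, in order, from the diagrams above."""
    CPU_CACHE = 0
    CPU_WRITE_BUFFER = 1
    BUS = 2
    IMC = 3
    IMC_WRITE_BUFFER = 4
    MEDIA = 5

# First stage inside the persistence domain for each platform type.
PERSISTENT_FROM = {
    "adr": Stage.IMC,                    # everything from the IMC onward is flushed on power fail
    "posted-write-buffer": Stage.MEDIA,  # only data already posted to media survives
}

def survives_power_fail(platform: str, stage: Stage) -> bool:
    return stage >= PERSISTENT_FROM[platform]

# Assumption: a fenced, cache-bypassing dax_do_io() store has reached the IMC.
dax_io_store = Stage.IMC

print(survives_power_fail("adr", dax_io_store))
print(survives_power_fail("posted-write-buffer", dax_io_store))
```

The same store is effectively persistent on the ADR platform and volatile on the posted-write-buffer platform until a REQ_FLUSH pushes it to media, which is the distinction confirmed above.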

> > If we track the dirty blocks from write in the radix tree like we
> > do for mmap, then we can just use a normal memcpy() in dax_do_io(),
> > getting rid of the slow cache bypass that is currently run. Radix
> > tree updates are much less expensive than a slow memcpy of large
> > amounts of data, and fsync can then take care of persistence, just
> > like we do for mmap.
> 
> If we go this route to increase the amount of dirty-data tracking in
> the radix tree, it raises the priority of one of the items on the backlog;
> namely, determine the crossover point where wbinvd of the entire cache
> is faster than a clflush / clwb loop.
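The radix-tree proposal quoted above can be modelled roughly as follows. This is a hypothetical sketch, not the kernel's DAX code: write() does a plain cached copy and tags each touched page dirty (a Python set stands in for the radix tree's dirty tags), and fsync() writes back only the tagged pages, one cache line at a time. DirtyTrackedMapping and _flush_line() are invented names; _flush_line() only counts where a real implementation would issue clwb/clflush, and the 4 KiB page and 64-byte line sizes are assumptions.

```python
PAGE = 4096      # assumed page size
CACHELINE = 64   # assumed cache line size

class DirtyTrackedMapping:
    """Hypothetical model of a DAX mapping with per-page dirty tags."""

    def __init__(self, size: int):
        self.pmem = bytearray(size)   # stands in for the persistent memory range
        self.dirty_pages = set()      # stands in for radix tree dirty tags
        self.lines_flushed = 0

    def write(self, offset: int, data: bytes) -> None:
        """Plain (cached) copy, then tag every page the copy touched."""
        if not data:
            return
        assert 0 <= offset and offset + len(data) <= len(self.pmem)
        self.pmem[offset:offset + len(data)] = data
        first, last = offset // PAGE, (offset + len(data) - 1) // PAGE
        self.dirty_pages.update(range(first, last + 1))

    def _flush_line(self, addr: int) -> None:
        # Placeholder for a clwb/clflush of the line at addr; here it only counts.
        self.lines_flushed += 1

    def fsync(self) -> None:
        """Write back each cache line of each tagged page, then clear the tags."""
        for page in sorted(self.dirty_pages):
            for addr in range(page * PAGE, (page + 1) * PAGE, CACHELINE):
                self._flush_line(addr)
        # A real x86 implementation would follow clwb/clflushopt with an sfence.
        self.dirty_pages.clear()

m = DirtyTrackedMapping(16 * PAGE)
m.write(100, b"x" * 10)               # page 0
m.write(3 * PAGE + 4090, b"y" * 10)   # straddles pages 3 and 4
m.fsync()
print(m.lines_flushed)                # 3 pages * 64 lines each = 192
```

In this model the fsync cost scales with the number of tagged pages rather than with the size of the mapping, which is why the wbinvd crossover question matters: once enough pages are tagged, one whole-cache flush may be cheaper than the per-line loop.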

Actually, I'd look at it from the other perspective - at what point
does fine-grained dirty tracking run faster than the brute force
flush? If the gains are only marginal, then we need to question
whether fine grained tracking is worth the complexity at all...
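The question can be framed with a deliberately simple linear cost model, sketched here under stated assumptions: each per-line flush costs per_line_ns, a whole-cache wbinvd costs a fixed wbinvd_ns regardless of how much is dirty, and the cost of maintaining the dirty tags is ignored. All function names and numbers are hypothetical placeholders, not measurements; real values would have to come from benchmarking.

```python
import math

CACHELINE = 64  # assumed cache line size

def loop_cost_ns(dirty_bytes: int, per_line_ns: float) -> float:
    """Cost of a clflush/clwb-style loop over every dirty cache line."""
    return math.ceil(dirty_bytes / CACHELINE) * per_line_ns

def choose_flush(dirty_bytes: int, per_line_ns: float, wbinvd_ns: float) -> str:
    """Pick the cheaper strategy under the linear cost model."""
    return "wbinvd" if wbinvd_ns < loop_cost_ns(dirty_bytes, per_line_ns) else "line-loop"

def crossover_bytes(per_line_ns: float, wbinvd_ns: float) -> int:
    """Smallest dirty size (in whole cache lines) at which wbinvd wins."""
    lines = math.floor(wbinvd_ns / per_line_ns) + 1
    return lines * CACHELINE

# Placeholder costs for illustration only; not measured values.
PER_LINE_NS = 50.0
WBINVD_NS = 1_000_000.0

print(crossover_bytes(PER_LINE_NS, WBINVD_NS))
print(choose_flush(64 * 1024, PER_LINE_NS, WBINVD_NS))
print(choose_flush(4 * 1024 * 1024, PER_LINE_NS, WBINVD_NS))
```

If typical fsync footprints sit well below the crossover, fine-grained tracking pays for itself; if they sit near or above it, the brute-force flush wins and the tracking buys little. Adding a per-page tagging cost to the line-loop side would move the crossover lower still.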

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Thread overview: 30+ messages
2016-07-27 12:07 Subtle races between DAX mmap fault and write path Jan Kara
2016-07-27 21:10 ` Ross Zwisler
2016-07-27 22:19   ` Dave Chinner
2016-07-28  8:10     ` Jan Kara
2016-07-29  2:21       ` Dave Chinner
2016-07-29 14:44         ` Dan Williams
2016-07-30  0:12           ` Dave Chinner
2016-07-30  0:53             ` Dan Williams
2016-08-01  1:46               ` Dave Chinner [this message]
2016-08-01  3:13                 ` Keith Packard
2016-08-01  4:07                   ` Dave Chinner
2016-08-01  4:39                     ` Dan Williams
2016-08-01  7:39                       ` Dave Chinner
2016-08-01 10:13             ` Boaz Harrosh
2016-08-02  0:21               ` Dave Chinner
2016-08-04 18:40                 ` Kani, Toshimitsu
2016-08-05 11:27                   ` Dave Chinner
2016-08-05 15:18                     ` Kani, Toshimitsu
2016-08-05 19:58                     ` Boylston, Brian
2016-08-08  9:26                       ` Jan Kara
2016-08-08 12:30                         ` Boylston, Brian
2016-08-08 13:11                           ` Christoph Hellwig
2016-08-08 18:28                           ` Jan Kara
2016-08-08 19:32                             ` Kani, Toshimitsu
2016-08-08 23:12                       ` Dave Chinner
2016-08-09  1:00                         ` Kani, Toshimitsu
2016-08-09  5:58                           ` Dave Chinner
2016-08-01 17:47             ` Dan Williams
2016-07-28  8:47   ` Jan Kara
2016-07-27 21:38 ` Dan Williams
