From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Keith Packard <keithp@keithp.com>, Jan Kara <jack@suse.cz>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	XFS Developers <xfs@oss.sgi.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-ext4 <linux-ext4@vger.kernel.org>
Subject: Re: Subtle races between DAX mmap fault and write path
Date: Mon, 1 Aug 2016 17:39:06 +1000
Message-ID: <20160801073906.GK16044@dastard>
In-Reply-To: <CAPcyv4jN5OvbPMA07w0vjBCBrGH-j2YsCjfvB3b6S6JJ_zUA=Q@mail.gmail.com>

On Sun, Jul 31, 2016 at 09:39:38PM -0700, Dan Williams wrote:
> On Sun, Jul 31, 2016 at 9:07 PM, Dave Chinner <david@fromorbit.com> wrote:
> > OTOH, DAX directly exposes the physical layout to the filesystem.
> > And because it's DAX-based pmem and not cached struct pages, we
> > can't run vm_map_ram() to virtually map the range we need to see as
> > a contiguous range, as we do in XFS for large objects such as directory
> > blocks and log buffers. For other large objects such as inode
> > clusters, we can directly map each page as the objects within the
> > clusters are page aligned and never overlap page boundaries, but
> > that only works for inode and dquot buffers. Hence DAX as it stands
> > makes it extremely difficult to "retrofit" DAX into all aspects of
> > existing filesystems because exposing physical discontiguities breaks
> > code that assumes they don't exist.
> 
> On this specific point about page remapping, the administrator can
> configure struct pages for pmem and you can detect whether they are
> present in the filesystem with pfn_t_has_page().  I.e. you could
> require pages be present for XFS, if that helps...

It's kinda silly to require struct pages for the entire pmem device
if they are only needed for accessing a (comparatively) small amount
of metadata.

Besides, now that I look at it more deeply, we can't use virtually
mapped pmem for the log buffers.  We can't allocate memory at the
point in time where we work out what LBA in the log we need to map
to physical pmem for the current log write.  Hence calls to
vm_map_ram() can't be used, and so that rules out using mapped page
based pmem for log buffers.

I'll probably have to rewrite the xlog_write() engine completely to
be able to handle discontiguous pages in the iclog buffers before we
can consider mapping them via DAX now, and I'm really not sure it's
worth the effort. I'd much prefer to spend time designing a native
pmem filesystem....
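
To illustrate what "handling discontiguous pages" means for a buffer
writer, here is a minimal userspace sketch in C. It is not XFS code and
does not reflect how xlog_write() is structured; copy_to_pages() and
SKETCH_PAGE_SIZE are hypothetical names invented for this example. It
only shows the general shape of the problem: without a virtually
contiguous mapping, every copy has to be split at page boundaries and
each piece directed to a separately addressed page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* hypothetical; stands in for PAGE_SIZE */

/*
 * Copy len bytes from src into a logical buffer backed by nr_pages
 * separately addressed pages, starting at logical byte offset off.
 * The pages need not be adjacent in memory. Returns the number of
 * bytes copied, which is short if the buffer ends before len bytes.
 */
static size_t copy_to_pages(void **pages, size_t nr_pages, size_t off,
                            const void *src, size_t len)
{
    const unsigned char *p = src;
    size_t copied = 0;

    assert(pages != NULL || nr_pages == 0);

    while (copied < len) {
        size_t idx = off / SKETCH_PAGE_SIZE;        /* which page */
        size_t in_page = off % SKETCH_PAGE_SIZE;    /* offset inside it */
        size_t chunk = SKETCH_PAGE_SIZE - in_page;  /* room left in page */

        if (idx >= nr_pages)
            break;                                  /* ran off the buffer */
        if (chunk > len - copied)
            chunk = len - copied;

        memcpy((unsigned char *)pages[idx] + in_page, p + copied, chunk);
        copied += chunk;
        off += chunk;
    }
    return copied;
}
```

With a contiguous mapping the loop collapses to a single memcpy(); the
per-page split above is the extra work that any code assuming
contiguity would have to take on.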

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
