linux-api.vger.kernel.org archive mirror
From: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
To: Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	linux-nvdimm
	<linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
	"linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"
	<linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
	Linux FS Devel
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem
Date: Mon, 19 Jun 2017 23:21:07 +1000
Message-ID: <20170619132107.GG11993@dastard>
In-Reply-To: <CALCETrVY38h2ajpod2U_2pdHSp8zO4mG2p19h=OnnHmhGTairw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Sat, Jun 17, 2017 at 10:05:45PM -0700, Andy Lutomirski wrote:
> On Sat, Jun 17, 2017 at 8:15 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> > On Sat, Jun 17, 2017 at 4:50 PM, Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> >> My other objection is that the syscall intentionally leaks a reference
> >> to the file.  This means it needs overflow protection and it probably
> >> shouldn't ever be usable without privilege.
> >
> > We only hold the one reference while S_DAXFILE is set, so I think the
> > protection is there, and per Dave's original proposal this requires
> > CAP_LINUX_IMMUTABLE.
> >
> >> Why can't the underlying issue be easily fixed, though?  Could
> >> .page_mkwrite just make sure that metadata is synced when the FS uses
> >> DAX?
> >
> > Yes, it most definitely could and that idea has been floated.
> >
> >> On a DAX fs, syncing metadata should be extremely fast.

<sigh>

This again....

Persistent memory means the *I/O* is fast. It does not mean that
*complex filesystem operations* are fast.

Don't forget that there's a shitload of CPU that gets burnt to make
sure that the metadata is synced correctly. Do that /synchronously/
on *every* write page fault (which, BTW, modifies mtime, so there will
always be dirty metadata to sync) and now you have a serious
performance problem with your "fast" DAX access method.

And that's before we even consider all the problems with running
sync operations in page fault context....
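
A rough user-space analogy of the cost argument above, assuming an
ordinary filesystem and the POSIX mmap()/msync()/fsync() calls. It
contrasts forcing data and metadata to stable storage after every page
store (a stand-in for syncing in every write fault) with letting dirty
page tracking batch the work into a single sync at the end. It does not
exercise the kernel's .page_mkwrite path at all; the function name
run() and the NPAGES constant are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NPAGES 256  /* illustrative file size, in pages */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Create an NPAGES-page file at path, map it MAP_SHARED and store one
 * byte into each page. With per_write_sync set, every store is followed
 * by msync(MS_SYNC) on that page plus fsync(), which also asks for the
 * inode metadata (such as the mtime update) to be made durable;
 * otherwise one msync() + fsync() is done at the end.
 * Stores the time spent in *elapsed. Returns 0 on success, -1 on error.
 */
static int run(const char *path, int per_write_sync, double *elapsed)
{
    long pg = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)pg * NPAGES;
    char *p = MAP_FAILED;
    double t0;
    int rc = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0)
        goto out;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    t0 = now_sec();
    rc = 0;
    for (int i = 0; i < NPAGES && rc == 0; i++) {
        char *page = p + (size_t)i * (size_t)pg;
        page[0] = (char)i;  /* first store to the page takes a write fault */
        if (per_write_sync &&
            (msync(page, (size_t)pg, MS_SYNC) != 0 || fsync(fd) != 0))
            rc = -1;
    }
    if (rc == 0 && !per_write_sync &&
        (msync(p, len, MS_SYNC) != 0 || fsync(fd) != 0))
        rc = -1;
    *elapsed = now_sec() - t0;

out:
    if (p != MAP_FAILED)
        munmap(p, len);
    close(fd);
    unlink(path);
    return rc;
}
```

Calling run() once with per_write_sync set and once without, then
comparing the two elapsed times, gives a feel for the per-write
overhead; the absolute numbers depend entirely on the filesystem and
the storage underneath it.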

> >> This
> >> could be conditioned on an madvise or mmap flag if performance might
> >> be an issue.  As far as I know, this change alone should be
> >> sufficient.
> >
> > The hang up is that it requires per-fs enabling as it needs to be
> > careful to manage mmap_sem vs fs journal locks for example. I know the
> > in-development NOVA [1] filesystem is planning to support this out of
> > the gate. ext4 would be open to implementing it, but I think xfs is
> > cold on the idea. Christoph originally proposed it here [2], before
> > Dave went on to propose immutable semantics.
> 
> Hmm.  Given a choice between a very clean API that works without
> privilege but is awkward to implement on XFS and an awkward-to-use
> API, I'd personally choose the former.

Yup, so the choice is between a clean kernel API that will be
substantially slower than the existing "dirty page" tracking plus
having the app run fsync() when necessary, and having to do a little
more work in a library routine that preallocates a file and sets a
flag on it.

The apps will use the library API, not the kernel API, so who really
cares if there's a few steps to setting up the file state
appropriately?
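
A minimal sketch of the kind of library routine described above,
assuming the POSIX posix_fallocate(), pwrite() and fsync() calls. The
names prepare_pmem_file() and set_pmem_flag_hypothetical() are
hypothetical; the latter is only a placeholder for whatever call would
actually set the per-file flag (such as the S_DAXFILE state discussed
in this thread) and does nothing here. The zeroing pass is an added
assumption, meant to avoid leaving blocks in an allocated-but-unwritten
state on filesystems that track one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL placeholder for setting the proposed per-file flag.
 * No real interface is assumed here; it always succeeds. */
static int set_pmem_flag_hypothetical(int fd)
{
    (void)fd;
    return 0;
}

/* Returns an open fd on success, or -1 with errno set. */
int prepare_pmem_file(const char *path, off_t size)
{
    static const char zeros[4096];  /* static, so zero-initialised */
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* 1. Allocate every block of the file up front. */
    int err = posix_fallocate(fd, 0, size);  /* returns an errno value */
    if (err != 0)
        goto fail_errnum;

    /* 2. Overwrite the whole range with zeros (assumption, see above). */
    for (off_t off = 0; off < size; ) {
        size_t n = sizeof(zeros);
        if ((off_t)n > size - off)
            n = (size_t)(size - off);
        ssize_t w = pwrite(fd, zeros, n, off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        if (w == 0) {
            errno = EIO;
            goto fail;
        }
        off += w;
    }

    /* 3. Make the allocation durable, then (hypothetically) set the flag. */
    if (fsync(fd) != 0 || set_pmem_flag_hypothetical(fd) != 0)
        goto fail;
    return fd;

fail_errnum:
    errno = err;
fail:
    err = errno;  /* close() may clobber errno */
    close(fd);
    errno = err;
    return -1;
}
```

The application would call this once at setup time, then mmap() the
returned fd; all of the allocation work happens here rather than in
the page fault path.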

> Dave, even with the lock ordering issue, couldn't XFS implement
> MAP_PMEM_AWARE by having .page_mkwrite work roughly like this:
> 
> if (metadata is dirty) {
>   up_write(&mmap_sem);
>   sync the metadata;
>   down_write(&mmap_sem);
>   return 0;  /* retry the fault */
> } else {
>   return whatever success code;
> }

How do you know that there is dependent filesystem metadata that
needs syncing at a level where you can safely manipulate the
mmap_sem? And how, exactly, do you do this without races? It'd be
trivial to DOS such retryable DAX faults simply by touching the file
in a tight loop in a separate process...
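
A toy, single-threaded model of the retry loop in the quoted
pseudocode, assuming nothing about real kernel locking or fault
handling: struct toy_inode, fault_with_retry() and the two interferer
callbacks are hypothetical names. The callback runs in the window
between syncing the metadata and retrying the fault, standing in for a
separate process touching the file.

```c
#include <stdbool.h>

/* Toy inode: whether its metadata is dirty, and how many syncs ran. */
struct toy_inode {
    bool metadata_dirty;
    unsigned syncs;
};

/* Runs between "sync the metadata" and "retry the fault". */
typedef void (*interferer_fn)(struct toy_inode *);

/* Another process touching the file, e.g. bumping its mtime. */
static void touch_file(struct toy_inode *ino) { ino->metadata_dirty = true; }

/* Nobody else is using the file. */
static void leave_alone(struct toy_inode *ino) { (void)ino; }

/*
 * Model of one write fault under the quoted scheme: if metadata is
 * dirty, sync it and retry. Returns the attempt on which the fault
 * completed, or 0 if it gave up after max_attempts.
 */
static unsigned fault_with_retry(struct toy_inode *ino, interferer_fn other,
                                 unsigned max_attempts)
{
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (!ino->metadata_dirty)
            return attempt;           /* nothing to sync: fault completes */
        ino->metadata_dirty = false;  /* "sync the metadata" */
        ino->syncs++;
        other(ino);                   /* window before the retry */
    }
    return 0;
}
```

With leave_alone(), a fault on dirty metadata completes on its second
attempt after one sync; with touch_file(), every attempt finds the
metadata dirty again, so the fault only ends when max_attempts is hit.
The model also ignores that completing a write fault would itself
dirty mtime again.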

Cheers,

Dave.
-- 
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org


Thread overview: 39+ messages
2017-06-17  1:15 [RFC PATCH 0/2] daxfile: enable byte-addressable updates to pmem Dan Williams
2017-06-17  1:15   ` [RFC PATCH 1/2] mm: introduce bmap_walk() Dan Williams
2017-06-17  5:22     ` Christoph Hellwig
2017-06-17 12:29         ` Dan Williams
2017-06-18  7:51           ` Christoph Hellwig
2017-06-19 16:18             ` Darrick J. Wong
2017-06-19 18:19               ` Al Viro
2017-06-20  7:34                 ` Christoph Hellwig
2017-06-17  1:15 ` [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem Dan Williams
2017-06-17 16:25     ` Andy Lutomirski
2017-06-17 21:52       ` Dan Williams
2017-06-17 23:50           ` Andy Lutomirski
2017-06-18  3:15             ` Dan Williams
2017-06-18  5:05               ` Andy Lutomirski
2017-06-19 13:21                   ` Dave Chinner [this message]
2017-06-19 15:22                     ` Andy Lutomirski
2017-06-20  0:46                         ` Dave Chinner
2017-06-20  5:53                           ` Andy Lutomirski
2017-06-20  8:49                             ` Christoph Hellwig
2017-06-20 16:17                                 ` Dan Williams
2017-06-20 16:26                                     ` Andy Lutomirski
2017-06-20 23:53                                   ` Dave Chinner
2017-06-21  1:24                                     ` Darrick J. Wong
2017-06-21  2:19                                       ` Dave Chinner
2017-06-20 10:11                               ` Dave Chinner
2017-06-20 16:14                                 ` Andy Lutomirski
2017-06-21  1:40                                   ` Dave Chinner
2017-06-21  5:18                                     ` Andy Lutomirski
2017-06-22  0:02                                         ` Dave Chinner
2017-06-22  4:07                                           ` Andy Lutomirski
2017-06-23  0:52                                             ` Dave Chinner
2017-06-23  3:07                                               ` Andy Lutomirski
2017-06-18  8:18               ` Christoph Hellwig
2017-06-19  1:51                   ` Dan Williams
2017-06-20  5:22   ` Darrick J. Wong
2017-06-20 15:42     ` Ross Zwisler
2017-06-22  7:09       ` Darrick J. Wong
2017-06-21 23:37       ` Dave Chinner
2017-06-22  7:23         ` Darrick J. Wong
