From: Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>
To: Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
linux-nvdimm
<linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org>,
Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
"linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"
<linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
Linux FS Devel
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem
Date: Thu, 22 Jun 2017 10:02:35 +1000 [thread overview]
Message-ID: <20170622000235.GN17542@dastard> (raw)
In-Reply-To: <CALCETrVYmbyNS-btvsN_M-QyWPZA_Y_4JXOM893g7nhZA+WviQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, Jun 20, 2017 at 10:18:24PM -0700, Andy Lutomirski wrote:
> On Tue, Jun 20, 2017 at 6:40 PM, Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org> wrote:
> >> A per-inode
> >> count of the number of live DAX mappings or of the number of struct
> >> file instances that have requested DAX would work here.
> >
> > For what purpose does this serve? The reflink invalidates all the
> > existing mappings, so the next write access causes a fault and then
> > page_mkwrite is called and the shared extent will get COWed....
>
> The same purpose as XFS's FS_XFLAG_DAX (assuming I'm understanding it
> right), except that IMO an API that doesn't involve making a change to
> an inode that sticks around would be nice. The inode flag has the
> unfortunate property that, if two different programs each try to set
> the flag, mmap, write, and clear the flag, they'll stomp on each other
> and risk data corruption.
>
> I admit I'm now thoroughly confused as to exactly what XFS does here
> -- does FS_XFLAG_DAX persist across unmount/mount?
Yes, it is.
i.e. DAX on XFS does not rely on a naive fs-wide mount option. You
can have applications on pmem filesystems use either DAX or normal
IO based on directory/inode flags. Something doesn't work with DAX,
so just remove the DAX flags from the directories/inodes, and it
will safely and transparently switch to page-cache based IO.
<snip>
> Here's the overall point I'm trying to make: unprivileged programs
> that want to write to DAX files with userspace commit mechanisms
> (CLFLUSHOPT;SFENCE, etc) should be able to do so reliably, without
> privilege, and with reasonably clean APIs. Ideally they could do this
> to any file they have write access to.
The privilege argument is irrelevant now - it was /suggested/
initially as a way of preventing people from shooting themselves in
the foot based on the immutable file model. It's clear that's not
desired, and it's not a show stopper.
> Programs that want to write to
> mmapped files, DAX or otherwise, without latency spikes due to
> .page_mkwrite should be able to opt in to a heavier weight mechanism.
> But these two issues are someone independent, and I think they should
> be solved separately.
You seem to be calling the "fdatasync on every page fault" the
"lightweight" option. That's the brute-force-with-big-hammer
solution - it's most definitely not lightweight as every page fault
has extra overhead to call ->fsync(). Sure, the API is simple, but
the runtime overhead is significant.
The lightweight *runtime* option is to set up the file in such a
way that there is never any extra overhead at page fault time. This
is what immutable extent maps provide. Indeed, because the mappings
never change, you could use hardware dirty tracking if you wanted,
as there's no need to look up the filesystem to do writeback as
everything needed for writeback was mapped at page fault time. This
"map first and then just write when you need to" is *exactly how
swap files work*.
Even if you are considering the complexity of the APIs, it's hardly
a "heavyweight" when it only requires a single call to fallocate()
before mmap() to set up the immutable extents on the file...
Cheers,
Dave.
--
Dave Chinner
david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org
next prev parent reply other threads:[~2017-06-22 0:02 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-17 1:15 [RFC PATCH 0/2] daxfile: enable byte-addressable updates to pmem Dan Williams
[not found] ` <149766212410.22552.15957843500156182524.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-06-17 1:15 ` [RFC PATCH 1/2] mm: introduce bmap_walk() Dan Williams
2017-06-17 5:22 ` Christoph Hellwig
[not found] ` <20170617052212.GA8246-jcswGhMUV9g@public.gmane.org>
2017-06-17 12:29 ` Dan Williams
2017-06-18 7:51 ` Christoph Hellwig
2017-06-19 16:18 ` Darrick J. Wong
[not found] ` <20170618075152.GA25871-jcswGhMUV9g@public.gmane.org>
2017-06-19 18:19 ` Al Viro
2017-06-20 7:34 ` Christoph Hellwig
2017-06-17 1:15 ` [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem Dan Williams
[not found] ` <149766213493.22552.4057048843646200083.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-06-17 16:25 ` Andy Lutomirski
2017-06-17 21:52 ` Dan Williams
[not found] ` <CAPcyv4j4UEegViDJcLZjVv5AFGC18-DcvHFnhZatB0hH3BY85g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-17 23:50 ` Andy Lutomirski
2017-06-18 3:15 ` Dan Williams
2017-06-18 5:05 ` Andy Lutomirski
[not found] ` <CALCETrVY38h2ajpod2U_2pdHSp8zO4mG2p19h=OnnHmhGTairw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-19 13:21 ` Dave Chinner
2017-06-19 15:22 ` Andy Lutomirski
[not found] ` <CALCETrUe0igzK0RZTSSondkCY3ApYQti89tOh00f0j_APrf_dQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-20 0:46 ` Dave Chinner
2017-06-20 5:53 ` Andy Lutomirski
2017-06-20 8:49 ` Christoph Hellwig
[not found] ` <20170620084924.GA9752-jcswGhMUV9g@public.gmane.org>
2017-06-20 16:17 ` Dan Williams
[not found] ` <CAPcyv4jkH6iwDoG4NnCaTNXozwYgVXiJDe2iFSONcE63KvGQoA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-20 16:26 ` Andy Lutomirski
2017-06-20 23:53 ` Dave Chinner
2017-06-21 1:24 ` Darrick J. Wong
2017-06-21 2:19 ` Dave Chinner
[not found] ` <CALCETrVuoPDRuuhc9X8eVCYiFUzWLSTRkcjbD6jas_2J2GixNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-20 10:11 ` Dave Chinner
2017-06-20 16:14 ` Andy Lutomirski
2017-06-21 1:40 ` Dave Chinner
2017-06-21 5:18 ` Andy Lutomirski
[not found] ` <CALCETrVYmbyNS-btvsN_M-QyWPZA_Y_4JXOM893g7nhZA+WviQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-06-22 0:02 ` Dave Chinner [this message]
2017-06-22 4:07 ` Andy Lutomirski
2017-06-23 0:52 ` Dave Chinner
2017-06-23 3:07 ` Andy Lutomirski
2017-06-18 8:18 ` Christoph Hellwig
[not found] ` <20170618081850.GA26332-jcswGhMUV9g@public.gmane.org>
2017-06-19 1:51 ` Dan Williams
2017-06-20 5:22 ` Darrick J. Wong
2017-06-20 15:42 ` Ross Zwisler
2017-06-22 7:09 ` Darrick J. Wong
[not found] ` <20170620052214.GA3787-PTl6brltDGh4DFYR7WNSRA@public.gmane.org>
2017-06-21 23:37 ` Dave Chinner
2017-06-22 7:23 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170622000235.GN17542@dastard \
--to=david-fqsqvqoi3ljby3ivrkzq2a@public.gmane.org \
--cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=hch-jcswGhMUV9g@public.gmane.org \
--cc=jack-AlSwsSmVLrQ@public.gmane.org \
--cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
--cc=linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org \
--cc=luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).