From: Christoph Hellwig <hch@lst.de>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Jan Kara <jack@suse.cz>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
Dave Chinner <david@fromorbit.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
linux-xfs@vger.kernel.org, Jeff Moyer <jmoyer@redhat.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andy Lutomirski <luto@kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH v2 0/5] fs, xfs: block map immutable files for dax, dma-to-storage, and swap
Date: Sun, 13 Aug 2017 11:24:36 +0200
Message-ID: <20170813092436.GB32112@lst.de>
In-Reply-To: <CAPcyv4iKG=Zoo1VXu23YBc6uWDFV4QSS+eAm9MSOynx2FjTHbw@mail.gmail.com>
On Sat, Aug 12, 2017 at 12:19:50PM -0700, Dan Williams wrote:
> The application does not need to know the storage address, it needs to
> know that the storage address to file offset is fixed. With this
> information it can make assumptions about the permanence of results it
> gets from the kernel.
Only if we clearly document that fact - and documenting the permanence
is different from saying the block map won't change.
> For example get_user_pages() today makes no guarantees outside of
> "page will not be freed",
It also makes the extremely important guarantee that the page won't
_move_ - e.g. that we won't do a memory migration for compaction or
other reasons. That's why for example RDMA can use it to register
memory and then we can later set up memory windows that point to this
registration from userspace and implement userspace RDMA.
> but with immutable files and dax you now
> have a mechanism for userspace to coordinate direct access to storage
> addresses. Those raw storage addresses need not be exposed to the
> application, as you say it doesn't need to know that detail. MAP_SYNC
> does not fully satisfy this case because it requires agents that can
> generate MMU faults to coordinate with the filesystem.
The file system is always in the fault path, can you explain what other
agents you are talking about?
> All I know is that SMB Direct for persistent memory seems like a
> potential consumer. I know they're not going to use a userspace
> filesystem or put an SMB server in the kernel.
Last I talked to the Samba folks they didn't expect a userspace
SMB direct implementation to work anyway due to the fact that
libibverbs memory registrations interact badly with their fork()ing
daemon model. That being said during the recent submission of the
RDMA client code some comments were made about userspace versions of
it, so I'm not sure if that opinion has changed in one way or another.
That being said I think we absolutely should support RDMA memory
registrations for DAX mappings. I'm just not sure how S_IOMAP_IMMUTABLE
helps with that. First, we'll want a MAP_SYNC | MAP_POPULATE to make sure
all the blocks are populated and all ptes are set up. Second, we need
to make sure get_user_pages works, which for now means we'll need a
struct page mapping for the region (which will be really annoying
for PCIe mappings, like the upcoming NVMe persistent memory region),
and we need to guarantee that the extent mapping won't change while
get_user_pages holds the pages inside it. I think that is true
due to side effects even with the current DAX code, but we'll need to
make it explicit. And maybe that's where we need to converge -
"sealing" the extent map makes sense as such a temporary measure
that is not persisted on disk, which automatically gets released
when the holding process exits, because we sort of already do this
implicitly. It might also make sense to have explicitly breakable
seals similar to what I do for the pNFS blocks kernel server, as
any userspace RDMA file server would also need those semantics.
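
To make the MAP_SYNC | MAP_POPULATE step concrete, here is a minimal
userspace sketch. It rests on several assumptions: MAP_SYNC was still
only a proposal when this mail was written, so the flag combination
below follows what was later merged in Linux 4.15, where MAP_SYNC is
only honoured together with MAP_SHARED_VALIDATE. The helper name
map_file_populated() and the fallback to a plain shared mapping are
hypothetical and purely illustrative. The sketch covers only the
"blocks populated, ptes set up" part; it does nothing about struct
page coverage or about keeping the extent map stable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers. Assumption: Linux on x86 or another
 * architecture using the asm-generic values. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x080000
#endif

/*
 * Hypothetical helper: map the first len bytes of the file at path as a
 * shared, writable mapping with all pages pre-faulted.
 *
 * It first asks for MAP_SYNC. If the kernel or filesystem refuses (not a
 * DAX file, or a kernel without MAP_SHARED_VALIDATE), it falls back to an
 * ordinary MAP_SHARED mapping. *synced tells the caller which one it got.
 * Returns MAP_FAILED if the file cannot be opened or both attempts fail.
 */
static void *map_file_populated(const char *path, size_t len, int *synced)
{
    assert(path != NULL && len > 0 && synced != NULL);
    *synced = 0;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC | MAP_POPULATE, fd, 0);
    if (p != MAP_FAILED) {
        *synced = 1;
    } else {
        /* No MAP_SYNC semantics here; the caller must not assume them. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_POPULATE, fd, 0);
    }

    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p;
}
```

A file server that really depends on the synchronous-fault behaviour
would presumably treat *synced == 0 as an error rather than continue
with the fallback mapping.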
Last but not least we have an interesting additional case for modern
Mellanox hardware - On Demand Paging where we don't actually do a
get_user_pages but the hardware implements SVM and thus gets fed
virtual addresses directly. My head spins when talking about the
implications for DAX mappings on that, so I'm just throwing that in
for now instead of trying to come up with a solution.