From: Adam Manzanares <Adam.Manzanares@wdc.com>
To: "jglisse@redhat.com" <jglisse@redhat.com>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [LSF/MM TOPIC] Direct block mapping through fs for device
Date: Fri, 26 Apr 2019 20:28:32 +0000 [thread overview]
Message-ID: <b24c6f711d2e23792d6577a4ca508d75b0af4d9e.camel@wdc.com> (raw)
In-Reply-To: <20190426013814.GB3350@redhat.com>
On Thu, 2019-04-25 at 21:38 -0400, Jerome Glisse wrote:
> I see that there are still empty spots in the LSF/MM schedule, so I
> would like to have a discussion on allowing direct block mapping of
> files for devices (NIC, GPU, FPGA, ...). This is an mm, fs, and block
> discussion, though the mm side is pretty light, i.e. only adding two
> callbacks to vm_operations_struct:
>
> int (*device_map)(struct vm_area_struct *vma,
>                   struct device *importer,
>                   struct dma_buf **bufp,
>                   unsigned long start,
>                   unsigned long end,
>                   unsigned flags,
>                   dma_addr_t *pa);
>
> // Some flags I can think of:
> DEVICE_MAP_FLAG_PIN               // ie return a dma_buf object
> DEVICE_MAP_FLAG_WRITE             // importer wants to be able to write
> DEVICE_MAP_FLAG_SUPPORT_ATOMIC_OP // importer wants to do atomic
>                                   // operations on the mapping
>
> void (*device_unmap)(struct vm_area_struct *vma,
>                      struct device *importer,
>                      unsigned long start,
>                      unsigned long end,
>                      dma_addr_t *pa);
>
> Each filesystem could add this callback and decide whether or not to
> allow the importer to directly map blocks. Filesystems can use
> whatever logic they want to make that decision. For instance, if
> there are pages in the page cache for the range, the filesystem can
> say no and the device falls back to main memory. The filesystem can
> also update its internal data structures to keep track of direct
> block mappings.
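The decision logic above can be sketched in a small userspace model (the `toy_*` names and the fixed 4096-byte page size are illustrative stand-ins, not kernel or filesystem API):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Illustrative stand-in for a filesystem's per-file state. */
struct toy_file {
    bool page_cached[16];  /* one slot per page in the file */
    dma_addr_t block_base; /* where the blocks sit in device address space */
};

/*
 * Model of the proposed ->device_map() decision: say no (-EBUSY) if any
 * page in [start_pg, end_pg) is present in the page cache, otherwise
 * hand back a direct DMA address for the first block of the range.
 */
static int toy_device_map(struct toy_file *f, unsigned long start_pg,
                          unsigned long end_pg, dma_addr_t *pa)
{
    for (unsigned long i = start_pg; i < end_pg; i++)
        if (f->page_cached[i])
            return -EBUSY; /* importer must fall back to the page cache */

    *pa = f->block_base + start_pg * 4096;
    return 0;
}
```

A real implementation would also have to record the mapping so it can be revoked later; the sketch only shows the accept/refuse decision.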
>
> If the filesystem decides to allow the direct block mapping, it
> forwards the request to the block device, which itself can decide to
> forbid the direct mapping, again for any reason: for instance,
> running out of BAR space, peer to peer between the block device and
> the importer device not being supported, or the block device not
> wanting to allow writable peer mappings ...
>
>
> So the event flow is:
>   1  the program mmaps a file (and never intends to access it with
>      the CPU)
>   2  the program tries to access the mmap from a device A
>   3  device A's driver sees the device_map callback on the vma and
>      calls it
>   4a on success, device A's driver programs the device with the
>      mapped dma address
>   4b on failure, device A's driver falls back to faulting so that it
>      can use pages from the page cache
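The 3/4a/4b branch of that flow, seen from the importer driver, amounts to a try-direct-then-fallback pattern. A hedged userspace sketch (all names here are illustrative, not a real driver API):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* The two ways the importer can end up mapping the range. */
enum map_kind { MAPPED_DIRECT, MAPPED_VIA_PAGE_CACHE };

/* Stand-in for the vma's ->device_map() callback; a real fs decides here. */
static int stub_device_map(int fs_allows, dma_addr_t *pa)
{
    if (!fs_allows)
        return -EBUSY;
    *pa = 0xd0000000; /* pretend BAR-backed block address */
    return 0;
}

/*
 * Importer driver flow: try the direct block mapping first (step 3),
 * program the device with the DMA address on success (4a), otherwise
 * fall back to faulting pages in from the page cache (4b).
 */
static enum map_kind importer_map(int fs_allows, dma_addr_t *pa)
{
    if (stub_device_map(fs_allows, pa) == 0)
        return MAPPED_DIRECT;     /* 4a */
    return MAPPED_VIA_PAGE_CACHE; /* 4b: normal fault path */
}
```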
>
> This API assumes that the importer supports mmu notifiers and thus
> that the fs can invalidate the device mapping at _any_ time by
> sending an mmu notifier to all mappings of the file (for a given
> range in the file or for the whole file). Obviously you want to
> minimize disruption and thus only invalidate when necessary.
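A minimal userspace model of that revocation, assuming the filesystem tracks which importers hold direct mappings of which ranges (the `toy_*` structure is hypothetical; a real fs would call the importer's mmu notifier / device_unmap here):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IMPORTERS 4

/* Toy bookkeeping: which importers currently hold a direct mapping. */
struct toy_mapping {
    bool active[MAX_IMPORTERS];
    unsigned long start[MAX_IMPORTERS], end[MAX_IMPORTERS];
};

/*
 * Revoke every direct mapping overlapping [start, end); mappings
 * outside the range are left alone, which is the "only invalidate
 * when necessary" part. Returns the number of mappings revoked.
 */
static int toy_invalidate_range(struct toy_mapping *m,
                                unsigned long start, unsigned long end)
{
    int revoked = 0;
    for (int i = 0; i < MAX_IMPORTERS; i++) {
        if (!m->active[i])
            continue;
        if (m->start[i] < end && start < m->end[i]) {
            m->active[i] = false; /* real code: call ->device_unmap() */
            revoked++;
        }
    }
    return revoked;
}
```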
>
> The dma_buf parameter can be used to add pinning support for
> filesystems that wish to support that case too. Here the mapping
> lifetime gets disconnected from the vma and is transferred to the
> dma_buf allocated by the filesystem. Again, the filesystem can decide
> to say no, as pinning blocks has drastic consequences for the
> filesystem and the block device.
>
>
> This has some similarities to the hmmap and caching topic (which is
> about mapping blocks directly to the CPU AFAIU), but device mapping
> can cut some corners: for instance, some devices can forgo atomic
> operations on such mappings and thus can work over PCIe, while the
> CPU cannot do atomics to a PCIe BAR.
>
> Also, this API can be used to allow peer to peer access between
> devices when the vma is an mmap of a device file and thus the
> vm_operations_struct comes from some exporter device driver. So the
> same two vm_operations_struct callbacks can be used in more cases
> than what I just described here.
>
>
> So I would like to gather people's feedback on the general approach
> and a few things like:
>   - Do block devices need to be able to invalidate such mappings
>     too?
>
>     It is easy for the fs to invalidate, as it can walk the file
>     mappings, but block devices do not know about files.
>
>   - Do we want to provide some generic implementation to share
>     across filesystems?
>
>   - Maybe some shared helpers for block devices that could track the
>     files corresponding to peer mappings?
I'm interested in being a part of this discussion.
>
>
> Cheers,
> Jérôme
Thread overview: 11+ messages
2019-04-26 1:38 [LSF/MM TOPIC] Direct block mapping through fs for device Jerome Glisse
2019-04-26 6:28 ` Dave Chinner
2019-04-26 12:45 ` Christoph Hellwig
2019-04-26 14:45 ` Darrick J. Wong
2019-04-26 14:47 ` Christoph Hellwig
2019-04-26 15:20 ` Jerome Glisse
2019-04-27 1:25 ` Dave Chinner
2019-04-29 13:26 ` Jerome Glisse
2019-05-01 23:47 ` Dave Chinner
2019-05-02 1:52 ` Matthew Wilcox
2019-04-26 20:28 ` Adam Manzanares [this message]