From: Adam Manzanares <Adam.Manzanares@wdc.com>
To: "jglisse@redhat.com" <jglisse@redhat.com>,
"lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [LSF/MM TOPIC] Direct block mapping through fs for device
Date: Fri, 26 Apr 2019 20:28:32 +0000 [thread overview]
Message-ID: <b24c6f711d2e23792d6577a4ca508d75b0af4d9e.camel@wdc.com> (raw)
In-Reply-To: <20190426013814.GB3350@redhat.com>
On Thu, 2019-04-25 at 21:38 -0400, Jerome Glisse wrote:
> I see that there are still empty spots in the LSF/MM schedule, so I
> would like to have a discussion on allowing direct block mapping of
> files for devices (NIC, GPU, FPGA, ...). This is an mm, fs, and block
> discussion, though the mm side is pretty light, i.e. only adding two
> callbacks to vm_operations_struct:
>
> int (*device_map)(struct vm_area_struct *vma,
>                   struct device *importer,
>                   struct dma_buf **bufp,
>                   unsigned long start,
>                   unsigned long end,
>                   unsigned flags,
>                   dma_addr_t *pa);
>
> // Some flags I can think of:
> DEVICE_MAP_FLAG_PIN               // ie return a dma_buf object
> DEVICE_MAP_FLAG_WRITE             // importer wants to be able to write
> DEVICE_MAP_FLAG_SUPPORT_ATOMIC_OP // importer wants to do atomic
>                                   // operations on the mapping
>
> void (*device_unmap)(struct vm_area_struct *vma,
>                      struct device *importer,
>                      unsigned long start,
>                      unsigned long end,
>                      dma_addr_t *pa);
>
> Each filesystem could add this callback and decide whether or not to
> allow the importer to directly map blocks. Filesystems can use
> whatever logic they want to make that decision. For instance, if
> there are pages in the page cache for the range, the filesystem can
> say no and the device falls back to main memory. The filesystem can
> also update its internal data structures to keep track of direct
> block mappings.
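The decision logic above can be sketched in a small userspace model (the `toy_*` names and the fixed 4096-byte page size are illustrative stand-ins, not kernel or filesystem API):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Illustrative stand-in for a filesystem's per-file state. */
struct toy_file {
    bool page_cached[16];  /* one slot per page in the file */
    dma_addr_t block_base; /* where the blocks sit in device address space */
};

/*
 * Model of the proposed ->device_map() decision: say no (-EBUSY) if any
 * page in [start_pg, end_pg) is present in the page cache, otherwise
 * hand back a direct DMA address for the first block of the range.
 */
static int toy_device_map(struct toy_file *f, unsigned long start_pg,
                          unsigned long end_pg, dma_addr_t *pa)
{
    for (unsigned long i = start_pg; i < end_pg; i++)
        if (f->page_cached[i])
            return -EBUSY; /* importer must fall back to the page cache */

    *pa = f->block_base + start_pg * 4096;
    return 0;
}
```

A real implementation would also have to record the mapping so it can be revoked later; the sketch only shows the accept/refuse decision.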
>
> If the filesystem decides to allow the direct block mapping, it
> forwards the request to the block device, which itself can decide to
> forbid the direct mapping, again for any reason: for instance,
> running out of BAR space, peer to peer between the block device and
> the importer device not being supported, or the block device not
> wanting to allow writable peer mappings ...
>
>
> So the event flow is:
>   1  the program mmaps a file (and never intends to access it with
>      the CPU)
>   2  the program tries to access the mmap from a device A
>   3  device A's driver sees the device_map callback on the vma and
>      calls it
>   4a on success, device A's driver programs the device with the
>      mapped dma address
>   4b on failure, device A's driver falls back to faulting so that it
>      can use pages from the page cache
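The 3/4a/4b branch of that flow, seen from the importer driver, amounts to a try-direct-then-fallback pattern. A hedged userspace sketch (all names here are illustrative, not a real driver API):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* The two ways the importer can end up mapping the range. */
enum map_kind { MAPPED_DIRECT, MAPPED_VIA_PAGE_CACHE };

/* Stand-in for the vma's ->device_map() callback; a real fs decides here. */
static int stub_device_map(int fs_allows, dma_addr_t *pa)
{
    if (!fs_allows)
        return -EBUSY;
    *pa = 0xd0000000; /* pretend BAR-backed block address */
    return 0;
}

/*
 * Importer driver flow: try the direct block mapping first (step 3),
 * program the device with the DMA address on success (4a), otherwise
 * fall back to faulting pages in from the page cache (4b).
 */
static enum map_kind importer_map(int fs_allows, dma_addr_t *pa)
{
    if (stub_device_map(fs_allows, pa) == 0)
        return MAPPED_DIRECT;     /* 4a */
    return MAPPED_VIA_PAGE_CACHE; /* 4b: normal fault path */
}
```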
>
> This API assumes that the importer supports mmu notifiers and thus
> that the fs can invalidate the device mapping at _any_ time by
> sending an mmu notifier to all mappings of the file (for a given
> range in the file or for the whole file). Obviously you want to
> minimize disruption and thus only invalidate when necessary.
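A minimal userspace model of that revocation, assuming the filesystem tracks which importers hold direct mappings of which ranges (the `toy_*` structure is hypothetical; a real fs would call the importer's mmu notifier / device_unmap here):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IMPORTERS 4

/* Toy bookkeeping: which importers currently hold a direct mapping. */
struct toy_mapping {
    bool active[MAX_IMPORTERS];
    unsigned long start[MAX_IMPORTERS], end[MAX_IMPORTERS];
};

/*
 * Revoke every direct mapping overlapping [start, end); mappings
 * outside the range are left alone, which is the "only invalidate
 * when necessary" part. Returns the number of mappings revoked.
 */
static int toy_invalidate_range(struct toy_mapping *m,
                                unsigned long start, unsigned long end)
{
    int revoked = 0;
    for (int i = 0; i < MAX_IMPORTERS; i++) {
        if (!m->active[i])
            continue;
        if (m->start[i] < end && start < m->end[i]) {
            m->active[i] = false; /* real code: call ->device_unmap() */
            revoked++;
        }
    }
    return revoked;
}
```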
>
> The dma_buf parameter can be used to add pinning support for
> filesystems that wish to support that case too. Here the mapping
> lifetime gets disconnected from the vma and is transferred to the
> dma_buf allocated by the filesystem. Again, the filesystem can decide
> to say no, as pinning blocks has drastic consequences for the
> filesystem and the block device.
>
>
> This has some similarities to the hmmap and caching topic (which is
> about mapping blocks directly to the CPU AFAIU), but device mapping
> can cut some corners: for instance, some devices can forgo atomic
> operations on such mappings and thus can work over PCIe, while the
> CPU cannot do atomics to a PCIe BAR.
>
> Also, this API can be used to allow peer to peer access between
> devices when the vma is an mmap of a device file and thus the
> vm_operations_struct comes from some exporter device driver. So the
> same two vm_operations_struct callbacks can be used in more cases
> than what I just described here.
>
>
> So I would like to gather people's feedback on the general approach
> and a few things like:
>   - Do block devices need to be able to invalidate such mappings
>     too?
>
>     It is easy for the fs to invalidate, as it can walk the file
>     mappings, but block devices do not know about files.
>
>   - Do we want to provide some generic implementation to share
>     across filesystems?
>
>   - Maybe some shared helpers for block devices that could track the
>     files corresponding to peer mappings?
I'm interested in being a part of this discussion.
>
>
> Cheers,
> Jérôme
Thread overview: 11+ messages
2019-04-26 1:38 [LSF/MM TOPIC] Direct block mapping through fs for device Jerome Glisse
2019-04-26 6:28 ` Dave Chinner
2019-04-26 12:45 ` Christoph Hellwig
2019-04-26 14:45 ` Darrick J. Wong
2019-04-26 14:47 ` Christoph Hellwig
2019-04-26 15:20 ` Jerome Glisse
2019-04-27 1:25 ` Dave Chinner
2019-04-29 13:26 ` Jerome Glisse
2019-05-01 23:47 ` Dave Chinner
2019-05-02 1:52 ` Matthew Wilcox
2019-04-26 20:28 ` Adam Manzanares [this message]