From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Stephen Bates <sbates@raithlin.com>,
Dan Williams <dan.j.williams@intel.com>,
Haggai Eran <haggaie@mellanox.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-nvdimm@ml01.01.org" <linux-nvdimm@ml01.01.org>,
"christian.koenig@amd.com" <christian.koenig@amd.com>,
"Suravee.Suthikulpanit@amd.com" <suravee.suthikulpanit@amd.com>,
"John.Bridgman@amd.com" <john.bridgman@amd.com>,
"Alexander.Deucher@amd.com" <alexander.deucher@amd.com>,
"Linux-media@vger.kernel.org" <linux-media@vger.kernel.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
Max Gurtovoy <maxg@mellanox.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"serguei.sagalovitch@amd.com" <serguei.sagalovitch@amd.com>,
"Paul.Blinzer@amd.com" <paul.blinzer@amd.com>,
"Felix.Kuehling@amd.com" <felix.kuehling@amd.com>,
"ben.sander@amd.com" <ben.sander@amd.com>
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Tue, 6 Dec 2016 10:28:38 -0700
Message-ID: <20161206172838.GB19318@obsidianresearch.com>
In-Reply-To: <ec136c34-417d-8a55-c176-2c1d759a5fb8@deltatee.com>

On Tue, Dec 06, 2016 at 09:51:15AM -0700, Logan Gunthorpe wrote:
> Hey,
>
> On 06/12/16 09:38 AM, Jason Gunthorpe wrote:
> >>> I'm not opposed to mapping /dev/nvmeX. However, the lookup is trivial
> >>> to accomplish in sysfs through /sys/dev/char to find the sysfs path of the
> >>> device-dax instance under the nvme device, or if you already have the nvme
> >>> sysfs path the dax instance(s) will appear under the "dax" sub-directory.
> >>
> >> Personally I think mapping the dax resource in the sysfs tree is a nice
> >> way to do this and a bit more intuitive than mapping a /dev/nvmeX.
> >
> > It is still not at all clear to me what userspace is supposed to do
> > with this on nvme.. How is the CMB usable from userspace?
>
> The flow is pretty simple. For example to write to NVMe from an RDMA device:
>
> 1) Obtain a chunk of the CMB to use as a buffer (either by mmapping
> /dev/nvmeX, by mmapping the device-dax char device, or through a block
> layer interface, which sounds like a good suggestion from Christoph,
> though I'm not really sure how it would look).

Okay, so clearly this needs a kernel-side, NVMe-specific allocator
and locking so users don't step on each other..

Or, as Christoph says, some kind of general mechanism to get these
bounce buffers..

> 2) Create an MR with the buffer and use an RDMA function to fill it with
> data from a remote host. This will cause the RDMA hardware to write
> directly to the memory in the NVMe card.
>
> 3) Using O_DIRECT, write the buffer to a file on the NVMe filesystem.
> When the address reaches hardware the NVMe will recognize it as local
> memory and copy it directly there.

Ah, I see.

As a first draft I'd stick with some kind of API built into the
/dev/nvmeX that backs the filesystem. The user app would fstat the
target file, open /dev/block/MAJOR(st_dev):MINOR(st_dev), do some
ioctl to get a CMB mmap, and then proceed from there..

When that is all working kernel-side, it would make sense to look at a
more general mechanism that could be used unprivileged.

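The userspace half of that lookup could be sketched roughly as follows
(Python for brevity). Only the stat call and the /dev/block lookup use
real interfaces; backing_block_device is a name made up for this sketch,
and the CMB ioctl does not exist yet, so it appears only as a comment.

```python
import os


def backing_block_device(path):
    """Return the /dev/block/MAJOR:MINOR path for the device holding `path`."""
    st = os.stat(path)
    # st_dev identifies the block device that contains the file.
    return "/dev/block/{}:{}".format(os.major(st.st_dev), os.minor(st.st_dev))


if __name__ == "__main__":
    node = backing_block_device(__file__)
    print(node)
    # Hypothetical next steps, not implemented anywhere today:
    #   fd = os.open(node, os.O_RDWR)
    #   ... issue a (not yet defined) ioctl on fd to obtain a CMB mapping ...
```

Note that st_dev names the block device holding the file, which is often
a partition rather than the whole namespace, and that filesystems with
anonymous device numbers (tmpfs, for example) have no node under
/dev/block, so the open would fail there.
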
> Thus we are able to transfer data to any file on an NVMe device without
> going through system memory. This has benefits on systems with lots of
> activity in system memory but step 3 is likely to be slowish due to the
> need to pin/unpin the memory for every transaction.

This is similar to the GPU issues too.. On NVMe you don't need to pin
the pages, you just need to lock that VMA so it doesn't get freed from
the NVMe CMB allocator while the IO is running...

Probably in the long run the get_user_pages call is going to have to be
pushed down into drivers.. Future MMU-coherent IO hardware also does
not need the pinning or other overheads.

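For reference, step 3 of the quoted flow is an ordinary O_DIRECT write,
and that write is where the per-IO page pinning happens today. A minimal
sketch, assuming an anonymous page-aligned mapping as a stand-in for the
CMB mapping (which has no userspace interface yet); aligned_buffer and
write_direct are illustrative names, and os.O_DIRECT is Linux-only:

```python
import mmap
import os


def aligned_buffer(nbytes):
    """Anonymous, page-aligned buffer rounded up to whole pages (nbytes > 0)."""
    pages = max(1, -(-nbytes // mmap.PAGESIZE))  # ceiling division
    return mmap.mmap(-1, pages * mmap.PAGESIZE)


def write_direct(path, buf):
    """Write `buf` to `path` with O_DIRECT, bypassing the page cache."""
    # O_DIRECT needs the buffer address and the length aligned to the
    # device's logical block size; whole pages satisfy that on common setups.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o600)
    try:
        return os.write(fd, buf)  # may be a short write; not retried here
    finally:
        os.close(fd)
```

With a real CMB mapping in place of aligned_buffer(), the write itself
would look the same; whether the filesystem accepts O_DIRECT at all
depends on the filesystem and kernel version.
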
Jason