From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Logan Gunthorpe <logang@deltatee.com>,
Serguei Sagalovitch <serguei.sagalovitch@amd.com>,
Dan Williams <dan.j.williams@intel.com>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"Kuehling, Felix" <Felix.Kuehling@amd.com>,
"Bridgman, John" <John.Bridgman@amd.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
"Sander, Ben" <ben.sander@amd.com>,
"Suthikulpanit, Suravee" <Suravee.Suthikulpanit@amd.com>,
"Blinzer, Paul" <Paul.Blinzer@amd.com>,
"Linux-media@vger.kernel.org" <Linux-media@vger.kernel.org>,
Haggai Eran <haggaie@mellanox.com>
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Fri, 25 Nov 2016 12:32:52 -0700
Message-ID: <20161125193252.GC16504@obsidianresearch.com>
In-Reply-To: <3f2d2db3-fb75-2422-2a18-a8497fd5d70e@amd.com>
On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote:
> >Like you say below we have to handle short lived in the usual way, and
> >that covers basically every device except IB MRs, including the
> >command queue on a NVMe drive.
>
> Well a problem which wasn't mentioned so far is that while GPUs do have a
> page table to mirror the CPU page table, they usually can't recover from
> page faults.
> So what we do is making sure that all memory accessed by the GPU Jobs stays
> in place while those jobs run (pretty much the same pinning you do for the
> DMA).
Yes, it is DMA, so this is a valid approach.
But you don't need page faults from the GPU to do proper coherent
page table mirroring. When the driver submits work to the GPU, it
'faults' the pages in on the CPU side and into the mirror translation
table (instead of pinning them).
As in ODP (RDMA on-demand paging), MMU notifiers/HMM are used to
monitor for translation changes. If a change comes in, the GPU driver
checks whether an executing command is touching those pages and, if
so, blocks the MMU notifier until the command flushes. It then
unfaults the page (blocking future commands) and unblocks the MMU
notifier.
The code moving the page will move it and the next GPU command that
needs it will refault it in the usual way, just like the CPU would.
This might be much more efficient since it optimizes for the common
case of unchanging translation tables.
This assumes, of course, that the commands are fairly short-lived:
MMU notifiers expect a flush to be reasonably prompt.
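To make the fault/invalidate/refault cycle described above concrete,
here is a small single-threaded userspace model. It is a sketch only:
struct mirror, submit_cmd(), complete_cmd(), invalidate() and
migrate_page() are hypothetical names, not the kernel mmu_notifier or
HMM API. Because there is only one thread, "block the notifier until
the command flushes" is modeled by flushing the in-flight commands
directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not a kernel API: NPAGES virtual pages, each
 * either present in the device's mirror table or not. */
#define NPAGES 8

struct mirror {
	bool mapped[NPAGES];        /* entry present in the GPU mirror table */
	unsigned long pfn[NPAGES];  /* frame the mirror entry points at */
	int inflight[NPAGES];       /* commands currently executing on page */
	unsigned long faults;       /* number of times a page was faulted in */
};

/* Stand-in for the CPU page table: current frame backing each page. */
static unsigned long cpu_pt[NPAGES];

/* Submit a command touching page p: fault the page into the mirror if it
 * is absent (no pinning), then account the command as in flight. */
static void submit_cmd(struct mirror *m, size_t p)
{
	if (!m->mapped[p]) {
		m->pfn[p] = cpu_pt[p];
		m->mapped[p] = true;
		m->faults++;
	}
	m->inflight[p]++;
}

/* A command using page p has flushed. */
static void complete_cmd(struct mirror *m, size_t p)
{
	assert(m->inflight[p] > 0);
	m->inflight[p]--;
}

/* Notifier-style invalidate. A real driver would sleep until executing
 * commands flush and hold off new submissions meanwhile; with a single
 * thread we simply flush the in-flight commands here. */
static void invalidate(struct mirror *m, size_t p)
{
	while (m->inflight[p] > 0)
		complete_cmd(m, p);
	m->mapped[p] = false;  /* unfault: the next submit must refault */
}

/* The code moving a page: invalidate the mirror, then update the CPU PT. */
static void migrate_page(struct mirror *m, size_t p, unsigned long new_pfn)
{
	invalidate(m, p);
	cpu_pt[p] = new_pfn;
}
```

In the common case of an unchanging translation, a second submit on
the same page costs no fault at all. Only a migration forces the next
submit to refault, and it then picks up the new frame.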
> >Serguei, what is your plan in GPU land for migration? Ie if I have a
> >CPU mapped page and the GPU moves it to VRAM, it becomes non-cachable
> >- do you still allow the CPU to access it? Or do you swap it back to
> >cachable memory if the CPU touches it?
>
> Depends on the policy in command, but currently it's the other way around
> most of the time.
>
> E.g. we allocate memory in VRAM, the CPU writes to it WC and avoids reading
> because that is slow, the GPU in turn can access it with full speed.
>
> When we run out of VRAM we move those allocations to system memory and
> update both the CPU as well as the GPU page tables.
>
> So that move is transparent for both userspace as well as shaders running on
> the GPU.
That makes sense to me, but the objection that came back for
non-cacheable CPU mappings is that they subtly break too much: e.g.
atomics, unaligned accesses and the CPU threading memory model all
change on various architectures and break when caching is disabled.
IMHO that is OK for specialty cases like the GPU, where the mmap comes
in via DRM or something similar and apps know to handle that buffer
specially.
But it is certainly not OK for DAX, where an application coded for
normal file open()/mmap() is not prepared for an mmap where (e.g.)
unaligned reads or atomics don't work depending on how the filesystem
is set up.
This is why I think iopmem is still problematic.
At the very least, I think an mmap or open flag should be required to
opt into this behavior, and by default non-cacheable DAX mmaps should
be paged into system RAM when the CPU accesses them.
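The proposed default can be stated as a small fault-time policy. This
is a sketch of the proposal only: HYPO_MAP_ALLOW_UNCACHED is a
hypothetical flag (no such flag exists in mmap(2) or open(2)), and
dax_fault_policy() is an illustrative name, not an existing kernel
function.

```c
#include <stdbool.h>

/* Hypothetical opt-in flag; not part of mmap(2) or open(2). */
#define HYPO_MAP_ALLOW_UNCACHED 0x1u

enum dax_fault_action {
	DAX_MAP_DEVICE_DIRECT,  /* map the device page into the process */
	DAX_PAGE_INTO_RAM,      /* copy into cacheable system RAM first */
};

/* Decide what a CPU fault on a DAX mapping should do. Cacheable backing
 * is always safe to map directly; non-cacheable backing is mapped
 * directly only when the application explicitly opted in. */
static enum dax_fault_action dax_fault_policy(bool backing_cacheable,
					      unsigned int map_flags)
{
	if (backing_cacheable)
		return DAX_MAP_DEVICE_DIRECT;
	if (map_flags & HYPO_MAP_ALLOW_UNCACHED)
		return DAX_MAP_DEVICE_DIRECT;
	return DAX_PAGE_INTO_RAM;
}
```

The point of the default is that an unmodified open()/mmap()
application never sees non-cacheable memory, so atomics and unaligned
reads keep working.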
I'm hearing most people say ZONE_DEVICE is the way to handle this,
which means the remaining missing piece for RDMA is some kind of DMA
core support for P2P address translation.
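As a rough illustration of what "P2P address translation" has to do:
a peer BAR can sit at a different address on the PCI bus than in the
CPU physical map (a host bridge may apply an offset), and not every
initiator can reach every peer. The sketch below assumes a
hypothetical per-window table (struct p2p_window, p2p_map()); it is
not an existing DMA API, and the reachability bit stands in for
whatever topology check the DMA core would actually perform.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one peer BAR window. */
struct p2p_window {
	uint64_t cpu_start;   /* CPU physical address of the BAR */
	uint64_t bus_start;   /* PCI bus address the BAR decodes at */
	uint64_t size;
	bool peer_reachable;  /* can this initiator reach it directly? */
};

enum p2p_result {
	P2P_MAPPED,          /* *bus_addr holds the address to program */
	P2P_UNREACHABLE,     /* device memory, but no usable P2P path */
	P2P_NOT_DEVICE_MEM,  /* ordinary memory: use the normal DMA path */
};

/* Translate a CPU physical address in a peer BAR to the bus address the
 * DMA initiator must use. */
static enum p2p_result p2p_map(const struct p2p_window *w, size_t n,
			       uint64_t cpu_addr, uint64_t *bus_addr)
{
	for (size_t i = 0; i < n; i++) {
		if (cpu_addr >= w[i].cpu_start &&
		    cpu_addr - w[i].cpu_start < w[i].size) {
			if (!w[i].peer_reachable)
				return P2P_UNREACHABLE;
			*bus_addr = w[i].bus_start + (cpu_addr - w[i].cpu_start);
			return P2P_MAPPED;
		}
	}
	return P2P_NOT_DEVICE_MEM;
}
```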
Jason