From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
To: "Christian König" <deathsimple@vodafone.de>
Cc: "Christian König" <christian.koenig@amd.com>,
"Haggai Eran" <haggaie@mellanox.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@ml01.01.org>,
"Kuehling, Felix" <Felix.Kuehling@amd.com>,
"Serguei Sagalovitch" <serguei.sagalovitch@amd.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>,
"Blinzer, Paul" <Paul.Blinzer@amd.com>,
"Suthikulpanit, Suravee" <Suravee.Suthikulpanit@amd.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Logan Gunthorpe" <logang@deltatee.com>,
"Sander, Ben" <ben.sander@amd.com>,
"Linux-media@vger.kernel.org" <Linux-media@vger.kernel.org>
Subject: Re: Enabling peer to peer device transactions for PCIe devices
Date: Fri, 25 Nov 2016 14:18:46 -0700
Message-ID: <20161125211846.GA22521@obsidianresearch.com>
In-Reply-To: <a98185d9-ffb1-6469-4272-2d1222600825@vodafone.de>
On Fri, Nov 25, 2016 at 09:40:10PM +0100, Christian König wrote:
> We call this "userptr" and it's just a combination of get_user_pages() on
> command submission and making sure the returned list of pages stays valid
> using an MMU notifier.
Doesn't that still pin the page?
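A minimal Python sketch of the userptr flow described above, assuming a toy page table held in a dict. Nothing here is a kernel API: `UserPtr`, `acquire`, `invalidate`, `submit`, and `move_page` are hypothetical names standing in for get_user_pages() and an MMU-notifier invalidate callback. The model deliberately ignores page reference counts, which is exactly where the pinning question comes in.

```python
from dataclasses import dataclass, field


@dataclass
class UserPtr:
    """Hypothetical model of a 'userptr' buffer: a snapshot of the frames
    backing a range of user pages, plus a validity flag that an
    invalidation callback (standing in for an MMU notifier) clears."""
    first: int
    count: int
    frames: list = field(default_factory=list)
    valid: bool = False
    lookups: int = 0  # page-table lookups performed so far

    def acquire(self, page_table):
        # Stand-in for get_user_pages(): look up every page in the range.
        self.frames = [page_table[self.first + i] for i in range(self.count)]
        self.lookups += self.count
        self.valid = True

    def invalidate(self, start, end):
        # Stand-in for an invalidate callback: any overlap with
        # [start, end) means the snapshot can no longer be trusted.
        if start < self.first + self.count and end > self.first:
            self.valid = False

    def submit(self, page_table):
        # Command submission re-acquires only if the snapshot went stale.
        if not self.valid:
            self.acquire(page_table)


def move_page(page_table, bufs, page, new_frame):
    """The kernel migrating one page: update the table, then notify."""
    page_table[page] = new_frame
    for buf in bufs:
        buf.invalidate(page, page + 1)


page_table = {i: 100 + i for i in range(8)}
buf = UserPtr(first=2, count=4)
buf.submit(page_table)                # first submission: 4 lookups
buf.submit(page_table)                # snapshot still valid: no lookups
move_page(page_table, [buf], 3, 999)  # overlapping change invalidates it
buf.submit(page_table)                # next submission: 4 more lookups
```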
> The "big" problem with this approach is that it is horribly slow. I mean
> seriously horribly slow, so slow that we actually can't use it for some of
> the purposes we wanted to use it for.
>
> >The code moving the page will move it and the next GPU command that
> >needs it will refault it in the usual way, just like the CPU would.
>
> And here comes the problem. CPUs do this on a page-by-page basis, so they
> fault in only what is needed and everything else gets filled in on demand.
> As a result, faulting a page is a relatively lightweight operation.
>
> But for GPU command submission we don't know beforehand which pages might
> be accessed, so what we do is walk all possible pages and make sure all of
> them are present.
I'm a little confused about why this is slow. So you fault the entire user
MM into your page tables at the start of day and keep track of it with MMU
notifiers?
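To make the cost difference concrete, here is a hypothetical counting model (Python, invented function names, page counts chosen only for illustration) of CPU-style demand faulting, a pre-walk of every reachable page before each submission, and the populate-once-then-track-invalidations approach asked about above. It counts page lookups only and says nothing about the real per-lookup cost in any driver.

```python
def demand_fault_cost(accessed):
    """CPU style: fault each page in the first time it is touched."""
    present = set()
    faults = 0
    for page in accessed:
        if page not in present:
            present.add(page)
            faults += 1
    return faults


def prewalk_cost(possible_pages, submissions):
    """Walk every page the commands *might* access, before every submission."""
    return len(possible_pages) * submissions


def mirror_cost(possible_pages, submissions, invalidated_per_submission):
    """Populate everything once, then re-walk only the pages an
    invalidation callback reported as changed before each later submission."""
    return len(possible_pages) + invalidated_per_submission * (submissions - 1)


possible = range(10_000)                  # pages reachable by the commands
touched = [p % 50 for p in range(5_000)]  # pages actually used, repeatedly
print(demand_fault_cost(touched))         # 50
print(prewalk_cost(possible, 100))        # 1000000
print(mirror_cost(possible, 100, 0))      # 10000
```

In this toy model the pre-walk scales with everything reachable rather than with what the job touches; whether mirroring wins in practice depends on how often invalidations arrive.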
> >This might be much more efficient since it optimizes for the common
> >case of unchanging translation tables.
>
> Yeah, completely agree. It works perfectly fine as long as you don't have
> two drivers trying to mess with the same page.
Well, the idea would be to not have the GPU block the other driver
beyond hinting that the page shouldn't be swapped out.
> >This assumes the commands are fairly short lived of course, the
> >expectation of the mmu notifiers is that a flush is reasonably prompt
>
> Correct, this is another problem. GFX command submissions usually don't take
> longer than a few milliseconds, but compute command submissions can easily
> take multiple hours.
So that won't work; you have the same issue as RDMA with workloads
like that.
If you can't somehow fence the hardware, then pinning is the only
solution. Felix has the right kind of suggestion for what is needed:
globally stop the GPU, fence the DMA, fix the page tables, and start
it up again. :\
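The stop/fence/fix/restart sequence can be sketched as an invalidation handler that runs four steps in order. This is a toy model with hypothetical names (`Gpu`, `GpuState`, `start_dma`, `invalidate`); step 2 is treated as instantaneous here, whereas the whole difficulty in practice is that fencing a long-running compute job may not be possible without some form of preemption.

```python
from enum import Enum


class GpuState(Enum):
    RUNNING = "running"
    STOPPED = "stopped"


class Gpu:
    """Toy model of a device that mirrors CPU mappings in its own page table."""

    def __init__(self, page_table):
        self.state = GpuState.RUNNING
        self.inflight_dma = 0
        self.gpu_pt = dict(page_table)  # device copy of the CPU mappings
        self.log = []

    def start_dma(self, n=1):
        if self.state is not GpuState.RUNNING:
            raise RuntimeError("DMA issued while the GPU is stopped")
        self.inflight_dma += n

    def invalidate(self, page, new_frame):
        # 1. Globally stop the GPU so it issues no new DMA.
        self.state = GpuState.STOPPED
        self.log.append("stop")
        # 2. Fence: wait until in-flight DMA has drained. Modeled as
        #    instantaneous; a real compute job might run for hours.
        self.inflight_dma = 0
        self.log.append("fence")
        # 3. Fix the device page table to match the new CPU mapping.
        self.gpu_pt[page] = new_frame
        self.log.append("fix")
        # 4. Start it up again.
        self.state = GpuState.RUNNING
        self.log.append("restart")


gpu = Gpu({0: 100, 1: 101})
gpu.start_dma(3)
gpu.invalidate(1, 555)
print(gpu.log)  # ['stop', 'fence', 'fix', 'restart']
```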
> I can easily imagine what would happen when kswapd is blocked by a GPU
> command submission for an hour or so while the system is under memory
> pressure :)
Right. The advantage of pinning is that it tells the rest of the system
not to touch the page without blocking anything, whereas MMU notifiers
have to be able to block and fence quickly.
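The pinning versus notifier trade-off can be illustrated with a toy reclaim loop, assuming made-up page numbers and fence times: pinned pages are simply skipped, while pages tracked only through a notifier must wait for the device to fence before they can be reclaimed. `reclaim_time` is a hypothetical helper, not a model of how kswapd is actually implemented.

```python
def reclaim_time(pages, pinned, fence_wait_s):
    """Toy model of reclaim walking pages under memory pressure.

    pages        -- candidate pages, in scan order
    pinned       -- set of pinned pages (reclaim must skip them)
    fence_wait_s -- seconds a notifier-tracked page blocks until the
                    device has fenced its DMA (absent means no wait)
    Returns (total seconds spent blocked, pages actually reclaimed).
    """
    blocked, reclaimed = 0.0, []
    for page in pages:
        if page in pinned:
            continue  # pinning: "don't touch this page", no waiting
        blocked += fence_wait_s.get(page, 0.0)  # notifier: wait for fence
        reclaimed.append(page)
    return blocked, reclaimed


# Page 1 is pinned; page 2 is mirrored by a compute job that needs an hour.
blocked, reclaimed = reclaim_time([0, 1, 2, 3], {1}, {2: 3600.0})
print(blocked, reclaimed)  # 3600.0 [0, 2, 3]
```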
> I've been thinking about this problem for about a year now and have been
> going in circles for quite a while. So if you have ideas on this, even if
> they sound totally crazy, feel free to bring them up.
Well, it isn't a software problem. From what I've seen in this thread,
the GPU application requires coherent page table mirroring, so the
only full and complete solution is going to be to actually implement
that somehow in GPU hardware.
Everything else is going to be deeply flawed somehow. Linux just
doesn't have the support for this kind of thing, and I'm honestly not
sure anything better is even possible given the hardware
constraints...
It doesn't have to be faulting; really anything that lets you pause the
GPU DMA and reload the page tables would do.
You might look at trying to use the IOMMU and/or PCI ATS in very new
hardware. IIRC, the physical IOMMU hardware can do the fault, fence,
and block parts, but I'm not sure about software support for using the
IOMMU to create coherent user page table mirrors; that is something
Linux doesn't do today. But there is demand for this kind of capability.
Jason