Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).

From: Jerome Glisse <j.glisse@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Mel Gorman <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linda Wang <lwang@redhat.com>, Kevin E Martin <kem@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
	Jeff Law <law@redhat.com>, Brendan Conoboy <blc@redhat.com>,
	Joe Donohue <jdonohue@redhat.com>,
	Duncan Poole <dpoole@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Lucien Dunning <ldunning@nvidia.com>,
Subject: Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).
Date: Tue, 6 May 2014 11:33:17 -0400
Message-ID: <20140506153315.GB6731@gmail.com>
In-Reply-To: <CA+55aFwM-g01tCZ1NknwvMeSMpwyKyTm6hysN-GmrZ_APtk7UA@mail.gmail.com>

On Tue, May 06, 2014 at 08:18:34AM -0700, Linus Torvalds wrote:
> On Tue, May 6, 2014 at 8:00 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> >
> > So question becomes how to implement process address space mirroring
> > without pinning memory and track cpu page table update knowing that
> > device page table update is unbound can not be atomic from cpu point
> > of view.
> 
> Perhaps as a fake TLB and interacting with the TLB shootdown? And
> making sure that everything is atomic?
> 
> Some of these devices are going to actually *share* the real page
> tables. Not "cache" them. Actually use the page tables directly.
> That's where all these on-die APU things are going, where the device
> really ends up being something much more like ASMP (asymmetric
> multi-processing) than a traditional external device.
> 
> So we *will* have to extend our notion of TLB shootdown to have not
> just a mask of possible active CPU's, but possible active devices. No
> question about that.

Well no. As I said and explained in my mail, APU and IOMMUv2 are a
one-sided coin: you cannot use the device memory with such a solution.
So yes, there is interest from many players (including AMD) in mirroring
the CPU page table by means other than having the IOMMU walk the CPU
page table.

> 
> But doing this with sleeping in some stupid VM notifier is completely
> out of the question, because it *CANNOT EVEN WORK* for that eventual
> real goal of sharing the physical page tables where the device can do
> things like atomic dirty/accessed bit settings etc. It can only work
> for crappy sh*t that does the half-way thing. It's completely racy wrt
> the actual page table updates. That kind of page table sharing needs
> true atomicity for exactly the same reason we need it for our current
> SMP. So it needs to have all the same page table locking rules etc.
> Not that shitty notifier callback.
> 
> As I said, the VM notifiers were misdesigned to begin with. They are
> an abomination. We're not going to extend on that and make it worse.
> We are *certainly* not going to make them blocking and screwing our
> core VM that way. And that's doubly and triply true when it cannot
> work for the generic case _anyway_.
> 
>               Linus

So how can I solve the issue at hand? The device has its own page
table and cannot mirror the CPU page table, nor can the device page
table be updated atomically from the CPU. Yes, such devices will exist,
and an IOMMUv2 walking the CPU page table cannot support GPU memory,
which is a big, big, big feature that is needed. Compare 20Gb/s with
the 300Gb/s of GPU memory.
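
To put a number on that gap, here is a rough sketch. The names are
illustrative only, the two figures are taken as written above, and the
unit cancels out in the ratio. It assumes a purely bandwidth-bound
transfer and ignores latency and everything else.

```python
# Figures quoted above; the unit is taken as written and cancels in the ratio.
IOMMU_PATH_BW = 20.0   # memory reached by walking the CPU page table (IOMMUv2)
GPU_LOCAL_BW = 300.0   # GPU local memory


def slowdown(fast_bw: float, slow_bw: float) -> float:
    """How many times longer a bandwidth-bound transfer takes on the slow path."""
    return fast_bw / slow_bw


if __name__ == "__main__":
    # Same amount of data, moved over each path.
    print(f"slow path takes {slowdown(GPU_LOCAL_BW, IOMMU_PATH_BW):.0f}x longer")
```

Under those assumptions, a bandwidth-bound job would take roughly
fifteen times longer when it cannot use the GPU memory.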

I understand that we do not want to sleep when updating a process's CPU
page table, but note that only processes that use the GPU would have to
sleep. So only processes that can actually benefit from using the GPU
will suffer the consequences.

That said, it also plays a role in page reclamation, which is why I am
proposing a separate LRU for pages involved with a GPU.

So having the hardware walking the cpu page table is out of the
question.

Cheers,
Jérôme