Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jerome Glisse <j.glisse@gmail.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Mel Gorman <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linda Wang <lwang@redhat.com>, Kevin E Martin <kem@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
	Jeff Law <law@redhat.com>, Brendan Conoboy <blc@redhat.com>,
	Joe Donohue <jdonohue@redhat.com>,
	Duncan Poole <dpoole@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvi
Subject: Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).
Date: Wed, 7 May 2014 08:39:49 -0400	[thread overview]
Message-ID: <20140507123948.GA2582@gmail.com> (raw)
In-Reply-To: <1399446892.4161.34.camel@pasglop>

On Wed, May 07, 2014 at 05:14:52PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 12:18 -0400, Jerome Glisse wrote:
> > 
> > I do understand that i was pointing out that if i move to, tlb which i
> > am fine with, i will still need to sleep there. That's all i wanted to
> > stress, i did not wanted force using mmu_notifier, i am fine with them
> > becoming atomic as long as i have a place where i can intercept cpu
> > page table update and propagate them to device mmu.
> 
> Your MMU notifier can maintain a map of "dirty" PTEs and you do the
> actual synchronization in the subsequent flush_tlb_* , you need to add
> hooks there but it's much less painful than in the notifiers.

Well getting back the dirty info from the GPU also require to sleep. Maybe
i should explain how it is suppose to work. GPU have several command buffer
and execute instructions inside those command buffer in sequential order.
To update the GPU mmu you need to schedule command into one of those command
buffer but when you do so you do not know how much command are in front of
you and how long it will take to the GPU to get to your command.

Yes GPU this patchset target have preemption but it is not as flexible as
CPU preemption there is not kernel thread running and scheduling, all the
scheduling is done in hardware. So the preemption is more limited that on
CPU.

That is why any update or information retrieval from the GPU need to go
through some command buffer and no matter how high priority the command
buffer for mmu update is, it can still long time (think flushing thousand
of GPU thread and saving there context).

> 
> *However* Linus, even then we can't sleep. We do things like
> ptep_clear_flush() that need the PTL and have the synchronous flush
> semantics.
> 
> Sure, today we wait, possibly for a long time, with IPIs, but we do not
> sleep. Jerome would have to operate within a similar context. No sleep
> for you :)
> 
> Cheers,
> Ben.

So for the ptep_clear_flush my idea is to have a special lru for page that
are in use by the GPU. This will prevent the page reclaimation try_to_unmap
and thus the ptep_clear_flush. I would block ksm so again another user that
would no do ptep_clear_flush. I would need to fix remap_file_pages either
adding some callback there or refactor the unmap and tlb flushing.

Finaly for page migration i see several solutions, forbid it (easy for me
but likely not what we want) have special code inside migrate code to handle
page in use by a device, or have special code inside try_to_unmap to handle
it.

I think this is all the current user of ptep_clear_flush and derivative that
does flush tlb while holding spinlock.

Note that for special lru or event special handling of page in use by a device
i need a new page flag. Would this be acceptable ?

For the special lru i was thinking of doing it per device as anyway each device
is unlikely to constantly address all the page it has mapped. Simple lru list
would do and probably offering some helper for device driver to mark page accessed
so page frequently use are not reclaim.

But a global list is fine as well and simplify the case diffirent device use
same pages.

Cheers,
Jérôme Glisse

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2014-05-07 12:39 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-02 13:51 [RFC] Heterogeneous memory management (mirror process address space on a device mmu) j.glisse
2014-05-02 13:52 ` [PATCH 01/11] mm: differentiate unmap for vmscan from other unmap j.glisse
2014-05-02 13:52 ` [PATCH 02/11] mmu_notifier: add action information to address invalidation j.glisse
2014-05-02 13:52 ` [PATCH 03/11] mmu_notifier: pass through vma to invalidate_range and invalidate_page j.glisse
2014-05-02 13:52 ` [PATCH 04/11] interval_tree: helper to find previous item of a node in rb interval tree j.glisse
2014-05-02 13:52 ` [PATCH 05/11] mm/memcg: support accounting null page and transfering null charge to new page j.glisse
2014-05-02 13:52 ` [PATCH 06/11] hmm: heterogeneous memory management j.glisse
2014-05-02 13:52 ` [PATCH 07/11] hmm: support moving anonymous page to remote memory j.glisse
2014-05-02 13:52 ` [PATCH 08/11] hmm: support for migrate file backed pages " j.glisse
2014-05-02 13:52 ` [PATCH 09/11] fs/ext4: add support for hmm migration to remote memory of pagecache j.glisse
2014-05-02 13:52 ` [PATCH 10/11] hmm/dummy: dummy driver to showcase the hmm api j.glisse
2014-05-02 13:52 ` [PATCH 11/11] hmm/dummy_driver: add support for fake remote memory using pages j.glisse
2014-05-06 10:29 ` [RFC] Heterogeneous memory management (mirror process address space on a device mmu) Peter Zijlstra
2014-05-06 14:57   ` Linus Torvalds
2014-05-06 15:00     ` Jerome Glisse
2014-05-06 15:18       ` Linus Torvalds
2014-05-06 15:33         ` Jerome Glisse
2014-05-06 15:42           ` Rik van Riel
2014-05-06 15:47           ` Linus Torvalds
2014-05-06 16:18             ` Jerome Glisse
2014-05-06 16:32               ` Linus Torvalds
2014-05-06 16:49                 ` Jerome Glisse
2014-05-06 17:28                 ` Jerome Glisse
2014-05-06 17:43                   ` Linus Torvalds
2014-05-06 18:13                     ` Jerome Glisse
2014-05-06 18:22                       ` Linus Torvalds
2014-05-06 18:38                         ` Jerome Glisse
2014-05-07  7:18                 ` Benjamin Herrenschmidt
2014-05-07  7:14               ` Benjamin Herrenschmidt
2014-05-07 12:39                 ` Jerome Glisse [this message]
2014-05-09  1:26                 ` Jerome Glisse
2014-05-10  4:28                   ` Benjamin Herrenschmidt
2014-05-11  0:48                     ` Jerome Glisse
2014-05-06 16:30             ` Rik van Riel
2014-05-06 16:34               ` Linus Torvalds
2014-05-06 16:47                 ` Rik van Riel
2014-05-06 16:54                   ` Jerome Glisse
2014-05-06 18:02                     ` H. Peter Anvin
2014-05-06 18:26                       ` Jerome Glisse
2014-05-06 22:44                 ` David Airlie
2014-05-07  2:33   ` Davidlohr Bueso
2014-05-07 13:00     ` Peter Zijlstra
2014-05-07 17:34       ` Davidlohr Bueso
2014-05-07 16:21     ` Linus Torvalds
2014-05-08 16:47     ` sagi grimberg
2014-05-08 17:56       ` Jerome Glisse
2014-05-09  1:42         ` Davidlohr Bueso
2014-05-09  1:45           ` Jerome Glisse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140507123948.GA2582@gmail.com \
    --to=j.glisse@gmail.com \
    --cc=SCheung@nvidia.com \
    --cc=aarcange@redhat.com \
    --cc=airlied@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=blc@redhat.com \
    --cc=dpoole@nvidia.com \
    --cc=hpa@zytor.com \
    --cc=jdonohue@redhat.com \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=jweiner@redhat.com \
    --cc=kem@redhat.com \
    --cc=law@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lwang@redhat.com \
    --cc=lwoodman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mhairgrove@nvi \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=sgutti@nvidia.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).