Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).


From: Jerome Glisse <j.glisse@gmail.com>
To: sagi grimberg <sagig@mellanox.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linda Wang <lwang@redhat.com>, Kevin E Martin <kem@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>, Dave Airlie <airlied@redhat.com>,
	Jeff Law <law@redhat.com>, Brendan Conoboy <blc@redhat.com>,
	Joe Donohue <jdonohue@redhat.com>,
	Duncan Poole <dpoole@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Lucien Dunning <ldunning@nvidia.com>,
	Cameron Buschardt <cabuschardt@nvidia.com>,
	Arvind Gopalakrishnan <arvindg@nvidia.com>,
	Haggai Eran <haggaie@mellanox.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Shachar Raindel <raindel@mellanox.com>,
	Liran Liss <liranl@mellanox.com>,
	Roland Dreier <roland@purestorage.com>,
	"Sander, Ben" <ben.sander@amd.com>,
	"Stoner, Greg" <Greg.Stoner@amd.com>,
	"Bridgman, John" <John.Bridgman@amd.com>,
	"Mantor, Michael" <Michael.Mantor@amd.com>,
	"Blinzer, Paul" <Paul.Blinzer@amd.com>,
	"Morichetti, Laurent" <Laurent.Morichetti@amd.com>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Gabbay, Oded" <Oded.Gabbay@amd.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).
Date: Thu, 8 May 2014 13:56:26 -0400
Message-ID: <20140508175624.GA3121@gmail.com>
In-Reply-To: <536BB508.2020704@mellanox.com>

On Thu, May 08, 2014 at 07:47:04PM +0300, sagi grimberg wrote:
> On 5/7/2014 5:33 AM, Davidlohr Bueso wrote:
> >On Tue, 2014-05-06 at 12:29 +0200, Peter Zijlstra wrote:
> >>So you forgot to CC Linus, Linus has expressed some dislike for
> >>preemptible mmu_notifiers in the recent past:
> >>
> >>   https://lkml.org/lkml/2013/9/30/385
> >I'm glad this came up again.
> >
> >So I've been running benchmarks (mostly aim7, which nicely exercises our
> >locks) comparing my recent v4 for rwsem optimistic spinning against
> >previous implementation ideas for the anon-vma lock, mostly:
> >
> >- rwsem (currently)
> >- rwlock_t
> >- qrwlock_t
> >- rwsem+optspin
> >
> >Of course, *any* change provides significant improvement in throughput
> >for several workloads, by avoiding to block -- there are more
> >performance numbers in the different patches. This is fairly obvious.
> >
> >What is perhaps not so obvious is that rwsem+optimistic spinning beats
> >all others, including the improved qrwlock from Waiman and Peter. This
> >is mostly because of the idea of cancelable MCS, which was mimic'ed from
> >mutexes. The delta in most cases is around +10-15%, which is non
> >trivial.
> 
> These are great news David!
> 
> >I mention this because from a performance PoV, we'll stop caring so much
> >about the type of lock we require in the notifier related code. So while
> >this is not conclusive, I'm not as opposed to keeping the locks blocking
> >as I once was. Now this might still imply things like poor design
> >choices, but that's neither here nor there.
> 
> So is the rwsem+opt strategy the way to go, given it keeps everyone happy?
> We will be more than satisfied with it as it will allow us to guarantee
> device MMU update.
> 
> >/me sees Sagi smiling ;)
> 
> :)

So I started doing things with tlb flush, but I must say things look ugly.
I need a new page flag (goodbye 32-bit platforms) and I need my own lru and
page reclamation for any page in use by a device. I need to hook up inside
try_to_unmap or migrate (but I will do the former). I am trying to be smart
by scheduling a worker on another cpu before sending the ipi, so that while
the ipi is in progress another cpu can hopefully schedule the invalidation
on the GPU and the wait for the gpu after the ipi will be quick.

So all in all this is looking ugly, and it does not change the fact that I
sleep (well, need to be able to sleep). It just moves the sleeping to another
part.

Maybe I should stress that with the mmu_notifier version it only sleeps for
processes that are using the GPU. Those processes use userspace APIs like
OpenCL, which do not play well with fork, i.e. do not use fork if you are
using such an API.

So for my case, if a process has mm->hmm set to something, that would mean
that there is a GPU using that address space and that it is unlikely to
run the kind of massive workload that people try to optimize the anon_vma
lock for.

My point is that with rwsem+optspin it could try spinning if mm->hmm
is NULL and make the massive fork workload go fast, or it could sleep
directly if mm->hmm is set.
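
To make the idea concrete, here is a minimal userspace sketch of that
decision. It is only an illustration under stated assumptions, not the actual
rwsem code: struct mm_struct is reduced to the single hmm pointer, and the
names lock_wait_policy and anon_vma_wait_policy are hypothetical, invented
for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures discussed above. */
struct hmm;                        /* opaque device mirror state */

struct mm_struct {
	struct hmm *hmm;           /* non-NULL when a device mirrors this mm */
};

/* Hypothetical policy values; the real rwsem code has no such enum. */
enum lock_wait_policy {
	LOCK_WAIT_SPIN_THEN_SLEEP, /* try optimistic spinning first */
	LOCK_WAIT_SLEEP,           /* skip spinning and block right away */
};

/*
 * Pick how a contended anon_vma lock waiter should behave.
 * An mm mirrored by a device (mm->hmm set) may have to sleep in the
 * notifier path anyway, so it blocks directly; every other mm keeps
 * the fast optimistic-spinning path.
 */
static enum lock_wait_policy anon_vma_wait_policy(const struct mm_struct *mm)
{
	assert(mm != NULL);
	return mm->hmm ? LOCK_WAIT_SLEEP : LOCK_WAIT_SPIN_THEN_SLEEP;
}
```

The point of the sketch is that the check is a single pointer test on the
slow path, so a process that never touches hmm would see no change in
behaviour.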

This way my additions do not damage anyone's workload. Only workloads that
use hmm would likely see lock contention on fork, but those workloads should
not fork in the first place, and if they do they should pay a price.

I will finish up the hackish tlb version of hmm so people can judge how
ugly it is (in my view) and send it here as soon as I can.

But I think it's clear that with rwsem+optspin we can make all workloads
happy and fast.

Cheers,
Jerome Glisse

> 
> Sagi.


