From: Jerome Glisse
Subject: Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).
Date: Thu, 8 May 2014 21:45:45 -0400
Message-ID: <20140509014544.GB2906@gmail.com>
To: Davidlohr Bueso
Cc: sagi grimberg, Peter Zijlstra, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Mel Gorman, "H. Peter Anvin", Andrew Morton, Linda Wang, Kevin E Martin, Jerome Glisse, Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel, Dave Airlie, Jeff Law, Brendan Conoboy, Joe Donohue, Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove, Lucien Dunning, Cameron Buschardt

On Thu, May 08, 2014 at 06:42:14PM -0700, Davidlohr Bueso wrote:
> On Thu, 2014-05-08 at 13:56 -0400, Jerome Glisse wrote:
> > On Thu, May 08, 2014 at 07:47:04PM +0300, sagi grimberg wrote:
> > > On 5/7/2014 5:33 AM, Davidlohr Bueso wrote:
> > > >On Tue, 2014-05-06 at 12:29 +0200, Peter Zijlstra wrote:
> > > >>So you forgot to CC Linus; Linus has expressed some dislike for
> > > >>preemptible mmu_notifiers in the recent past:
> > > >>
> > > >>  https://lkml.org/lkml/2013/9/30/385
> > > >I'm glad this came up again.
> > > >
> > > >So I've been running benchmarks (mostly aim7, which nicely exercises
> > > >our locks) comparing my recent v4 for rwsem optimistic spinning
> > > >against previous implementation ideas for the anon-vma lock, mostly:
> > > >
> > > >- rwsem (currently)
> > > >- rwlock_t
> > > >- qrwlock_t
> > > >- rwsem+optspin
> > > >
> > > >Of course, *any* change provides significant improvement in
> > > >throughput for several workloads, by avoiding blocking -- there are
> > > >more performance numbers in the different patches. This is fairly
> > > >obvious.
> > > >
> > > >What is perhaps not so obvious is that rwsem+optimistic spinning
> > > >beats all others, including the improved qrwlock from Waiman and
> > > >Peter. This is mostly because of the idea of cancelable MCS, which
> > > >was mimicked from mutexes. The delta in most cases is around
> > > >+10-15%, which is non-trivial.
> > >
> > > This is great news, David!
> > >
> > > >I mention this because from a performance PoV, we'll stop caring so
> > > >much about the type of lock we require in the notifier-related code.
> > > >So while this is not conclusive, I'm not as opposed to keeping the
> > > >locks blocking as I once was. Now this might still imply things like
> > > >poor design choices, but that's neither here nor there.
> > >
> > > So is the rwsem+optspin strategy the way to go, given it keeps
> > > everyone happy? We will be more than satisfied with it, as it will
> > > allow us to guarantee device MMU updates.
> > >
> > > >/me sees Sagi smiling ;)
> > >
> > > :)
> >
> > So I started doing things with TLB flush, but I must say it looks ugly.
> > I need a new page flag (goodbye 32-bit platforms), and I need my own
> > LRU and page reclamation for any page in use by a device; I need to
> > hook up inside try_to_unmap or migrate (but I will do the former).
> > I am trying to be smart by scheduling a worker on another CPU before
> > sending the IPI, so that while the IPI is in progress another CPU will
> > hopefully start the invalidation on the GPU, and the wait for the GPU
> > after the IPI will be quick.
> >
> > So all in all this is looking ugly, and it does not change the fact
> > that I sleep (well, need to be able to sleep). It just moves the
> > sleeping to another part.
> >
> > Maybe I should stress that with the mmu_notifier version, it only
> > sleeps for processes that are using the GPU. Those processes are using
> > userspace APIs like OpenCL, which do not play well with fork (read: do
> > not use fork if you are using such APIs).
> >
> > So in my case, if a process has mm->hmm set to something, that means
> > there is a GPU using that address space, and it is unlikely to see the
> > massive workloads that people try to optimize the anon_vma lock for.
> >
> > My point is that with rwsem+optspin it could try spinning if mm->hmm
> > was NULL and make the massive fork workload go fast, or it could sleep
> > directly if mm->hmm is set.
>
> Sorry? Unless I'm misunderstanding you, we don't do such things. Our
> locks are generic and need to work in any circumstance, no special
> cases here and there... _especially_ with these kinds of things. So no,
> rwsem will spin as long as the owner is set, just like for any other
> users.
>
> Thanks,
> Davidlohr

I do not mind spinning all the time; I was just thinking that it could be
optimized away when there is an hmm for the current mm, since that means
there is very likely going to be a schedule inside the mmu_notifier
anyway. But if you prefer to keep the code generic, I am fine with
wasting the CPU cycles.

Cheers,
Jérôme Glisse