From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jerome Glisse <j.glisse@gmail.com>
Subject: Re: [RFC] Heterogeneous memory management (mirror process address
 space on a device mmu).
Date: Thu, 8 May 2014 21:26:03 -0400
Message-ID: <20140509012601.GA2906@gmail.com>
References: <1399038730-25641-1-git-send-email-j.glisse@gmail.com>
 <20140506102925.GD11096@twins.programming.kicks-ass.net>
 <CA+55aFzt47Jpp-KK-ocLGgzYt_w-vheqFLfaGZOUSjwVrgGUtw@mail.gmail.com>
 <20140506150014.GA6731@gmail.com>
 <CA+55aFwM-g01tCZ1NknwvMeSMpwyKyTm6hysN-GmrZ_APtk7UA@mail.gmail.com>
 <20140506153315.GB6731@gmail.com>
 <CA+55aFzzPtTkC22WvHNy6srN9PFzer0-_mgRXWO03NwmCdfy4g@mail.gmail.com>
 <20140506161836.GC6731@gmail.com>
 <1399446892.4161.34.camel@pasglop>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Peter Zijlstra <peterz@infradead.org>, linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Mel Gorman <mgorman@suse.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linda Wang <lwang@redhat.com>, Kevin E Martin <kem@redhat.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Larry Woodman <lwoodman@redhat.com>, Rik van Riel <riel@redhat.com>,
	Dave Airlie <airlied@redhat.com>, Jeff Law <law@redhat.com>,
	Brendan Conoboy <blc@redhat.com>, Joe Donohue <jdonohue@redhat.com>,
	Duncan Poole <dpoole@nvidia.com>,
	Sherry Cheung <SCheung@nvidia.com>,
	Subhash Gutti <sgutti@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	Lucien Dunning <ldunning@nvidia.com>,
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Return-path: <owner-linux-mm@kvack.org>
Content-Disposition: inline
In-Reply-To: <1399446892.4161.34.camel@pasglop>
Sender: owner-linux-mm@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

On Wed, May 07, 2014 at 05:14:52PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 12:18 -0400, Jerome Glisse wrote:
> >
> > I do understand that i was pointing out that if i move to, tlb which i
> > am fine with, i will still need to sleep there. That's all i wanted to
> > stress, i did not wanted force using mmu_notifier, i am fine with them
> > becoming atomic as long as i have a place where i can intercept cpu
> > page table update and propagate them to device mmu.
>
> Your MMU notifier can maintain a map of "dirty" PTEs and you do the
> actual synchronization in the subsequent flush_tlb_* , you need to add
> hooks there but it's much less painful than in the notifiers.
>
> *However* Linus, even then we can't sleep. We do things like
> ptep_clear_flush() that need the PTL and have the synchronous flush
> semantics.
>
> Sure, today we wait, possibly for a long time, with IPIs, but we do not
> sleep. Jerome would have to operate within a similar context. No sleep
> for you :)
>
> Cheers,
> Ben.
>
>

So Linus, Benjamin is right: there were a couple of cases I did not think
about. For instance, with a COW page, one thread might trigger copy-on-write,
allocate a new page, and update the page table, and a thread on another CPU
might start using the new page before we even get a chance to update the GPU
page table. The GPU could thus be working on outdated data.
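
To make that window concrete, here is a toy user-space model. It is not
kernel code: struct toy_mm and every toy_* function are names made up for
this illustration, and a "page table" is just an array mapping a virtual
page number to a frame number. It only shows that between the CPU page
table update and the device update, the two disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4 /* hypothetical tiny address space */

/* Toy model (assumption, not a kernel structure). */
struct toy_mm {
	int cpu_pt[NPAGES];  /* CPU page table: vpn -> frame */
	int dev_pt[NPAGES];  /* device (GPU) mirror of cpu_pt */
	int next_free_frame; /* trivial frame allocator */
};

/* Start with CPU and device mapping every page to the same frame. */
static void toy_mm_init(struct toy_mm *mm)
{
	for (size_t i = 0; i < NPAGES; i++) {
		mm->cpu_pt[i] = (int)i;
		mm->dev_pt[i] = (int)i;
	}
	mm->next_free_frame = NPAGES;
}

/*
 * CPU side of a COW fault: allocate a new frame and repoint the CPU
 * PTE. The device mirror is deliberately left untouched here.
 */
static int toy_cow_break(struct toy_mm *mm, size_t vpn)
{
	assert(vpn < NPAGES);
	mm->cpu_pt[vpn] = mm->next_free_frame++;
	return mm->cpu_pt[vpn];
}

/* Propagate the CPU PTE to the device mirror. */
static void toy_dev_sync(struct toy_mm *mm, size_t vpn)
{
	assert(vpn < NPAGES);
	mm->dev_pt[vpn] = mm->cpu_pt[vpn];
}

/* True while the device would still access the old frame. */
static bool toy_dev_is_stale(const struct toy_mm *mm, size_t vpn)
{
	assert(vpn < NPAGES);
	return mm->dev_pt[vpn] != mm->cpu_pt[vpn];
}
```

Anything another CPU thread writes to the new frame between
toy_cow_break() and toy_dev_sync() is invisible to the device, which is
the race described above.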

The same kind of race exists on fork, when we write-protect a page, or when
we split a huge page.

I thought that I only needed to special-case page reclamation and migration,
and forbid things like KSM, but I was wrong.

So with that in mind, are you OK if I pursue the mmu_notifier approach,
taking into account the rwsem+optspin results, which would keep the
many-fork workload fast while still allowing mmu_notifier callbacks to
sleep?

Otherwise I have no choice but to add something like mmu_notifier in every
place where there can be a race (huge page split, COW, ...). That sounds
like a bad idea to me when mmu_notifier is perfect for the job.

Cheers,
Jérôme Glisse
