From: Jerome Glisse
Subject: Re: [RFC] Heterogeneous memory management (mirror process address space on a device mmu).
Date: Thu, 8 May 2014 21:26:03 -0400
Message-ID: <20140509012601.GA2906@gmail.com>
References: <1399038730-25641-1-git-send-email-j.glisse@gmail.com>
 <20140506102925.GD11096@twins.programming.kicks-ass.net>
 <20140506150014.GA6731@gmail.com>
 <20140506153315.GB6731@gmail.com>
 <20140506161836.GC6731@gmail.com>
 <1399446892.4161.34.camel@pasglop>
In-Reply-To: <1399446892.4161.34.camel@pasglop>
To: Linus Torvalds, Benjamin Herrenschmidt
Cc: Peter Zijlstra, linux-mm, Linux Kernel Mailing List, linux-fsdevel,
 Mel Gorman, "H. Peter Anvin", Andrew Morton, Linda Wang, Kevin E Martin,
 Jerome Glisse, Andrea Arcangeli, Johannes Weiner, Larry Woodman,
 Rik van Riel, Dave Airlie, Jeff Law, Brendan Conoboy, Joe Donohue,
 Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard, Mark Hairgrove,
 Lucien Dunning
Sender: owner-linux-mm@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

On Wed, May 07, 2014 at 05:14:52PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 12:18 -0400, Jerome Glisse wrote:
> >
> > I do understand that i was pointing out that if i move to, tlb which i
> > am fine with, i will still need to sleep there. That's all i wanted to
> > stress, i did not wanted force using mmu_notifier, i am fine with them
> > becoming atomic as long as i have a place where i can intercept cpu
> > page table update and propagate them to device mmu.
>
> Your MMU notifier can maintain a map of "dirty" PTEs and you do the
> actual synchronization in the subsequent flush_tlb_* , you need to add
> hooks there but it's much less painful than in the notifiers.
>
> *However* Linus, even then we can't sleep. We do things like
> ptep_clear_flush() that need the PTL and have the synchronous flush
> semantics.
>
> Sure, today we wait, possibly for a long time, with IPIs, but we do not
> sleep. Jerome would have to operate within a similar context. No sleep
> for you :)
>
> Cheers,
> Ben.
>
>

So Linus, Benjamin is right: there were a couple of cases I did not think
about. For instance, with a COW page, one thread might trigger copy-on-write,
allocate a new page and update the page table, and a thread on another CPU
might start using the new page before we even get a chance to update the GPU
page table. The GPU could thus be working on outdated data.

The same kind of race exists on fork, when we write-protect a page, and when
we split a huge page. I thought that I only needed to special-case page
reclamation and migration, and forbid things like KSM, but I was wrong.

So with that in mind, are you OK if I pursue the mmu_notifier approach, taking
into account the rwsem+optspin results, which would keep the many-fork
workload fast while still allowing mmu_notifier callbacks to sleep?

Otherwise I have no choice but to add something like mmu_notifier in every
place where there can be a race (huge page split, COW, ...), which sounds
like a bad idea to me when mmu_notifier is perfect for the job.

Cheers,
Jérôme Glisse
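
For readers following the thread, the COW race described above can be
modelled with a small userspace sketch. This is an illustration only, not
kernel code and not part of the RFC: struct page, struct mirror, cow_fault(),
propagate_to_device() and device_read() are hypothetical names, and
propagate_to_device() merely stands in for whatever hook (an mmu_notifier
callback or a flush_tlb_* hook) would end up updating the device page table.
The steps are run sequentially so the stale window is deterministic.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16

/* Hypothetical model: one virtual page mapped by a CPU PTE and a device PTE. */
struct page {
    char data[DEMO_PAGE_SIZE];
};

struct mirror {
    struct page *cpu_pte; /* page the CPU page table points at */
    struct page *dev_pte; /* page the device (GPU) page table points at */
};

/* CPU side of a COW fault: copy into a new page and switch the CPU PTE only. */
static void cow_fault(struct mirror *m, struct page *new_page)
{
    memcpy(new_page->data, m->cpu_pte->data, DEMO_PAGE_SIZE);
    m->cpu_pte = new_page;
}

/* Assumed stand-in for the hook that propagates the update to the device. */
static void propagate_to_device(struct mirror *m)
{
    m->dev_pte = m->cpu_pte;
}

/* What the device observes through its own page table. */
static const char *device_read(const struct mirror *m)
{
    return m->dev_pte->data;
}

#ifdef DEMO_MAIN
int main(void)
{
    struct page old_page = { "old" };
    struct page new_page = { "" };
    struct mirror m = { &old_page, &old_page };

    cow_fault(&m, &new_page);
    /* Another CPU thread starts using the new page right away. */
    strcpy(m.cpu_pte->data, "new");

    /* Window of the race: the device still maps the old page. */
    printf("device before sync: %s\n", device_read(&m));

    propagate_to_device(&m);
    printf("device after sync:  %s\n", device_read(&m));
    return 0;
}
#endif
```

Built with -DDEMO_MAIN, the program shows the device reading the old page
contents until propagate_to_device() runs. In the real kernel the question
under discussion is where that propagation may happen and whether it may
sleep; the sketch takes no position on that.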