From mboxrd@z Thu Jan 1 00:00:00 1970
From: j.glisse@gmail.com
Subject: HMM (Heterogeneous Memory Management) v7
Date: Mon, 22 Dec 2014 11:48:54 -0500
Message-ID: <1419266940-5440-1-git-send-email-j.glisse@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: Linus Torvalds, Mel Gorman, "H. Peter Anvin", Peter Zijlstra,
 Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
 Dave Airlie, Brendan Conoboy, Joe Donohue, Duncan Poole, Sherry Cheung,
 Subhash Gutti, John Hubbard, Mark Hairgrove, Lucien Dunning,
 Cameron Buschardt, Arvind Gopalakrishnan, Shachar Raindel, Liran Liss,
 Roland Dreier
Sender: owner-linux-mm@kvack.org
List-Id: linux-fsdevel.vger.kernel.org

So after PTO and before the end-of-year frenzy, here is an updated HMM
patchset. While it does not reuse Linus' page table design, I use
something that is, in my view at least, close to it. I also avoid
pretending that this will be useful to others and move it to HMM-specific
code. The commit message has a longer justification of why I implement new
page table code instead of using a radix tree or another existing kernel
structure.

Everything else is pretty much the same, i.e. this patchset is just the
foundation on which we want to build our feature set, the main feature
being migration of memory to device memory. The very first version of this
patchset already showcased a proof of concept of most of these features.

Below is the previous patchset cover letter, pretty much unchanged, as the
background and motivation for it did not change.


What is it?

In a nutshell, HMM is a subsystem that provides an easy-to-use API to
mirror a process address space on a device with minimal hardware
requirements (mainly device page faults and read-only page mappings). This
does not rely on the ATS and PASID PCIe extensions.
It intends to supersede those extensions by allowing system memory to be
moved to device memory in a fashion that is transparent to core kernel mm
code (i.e. a CPU page fault on a page residing in device memory will
trigger migration back to system memory).


Why do this?

We want to be able to mirror a process address space so that compute APIs
such as OpenCL or other similar APIs can start using the exact same
address space on the GPU as on the CPU. This will greatly simplify usage
of those APIs. Moreover, we believe that we will see more and more
specialized function units that will want to mirror a process address
space using their own MMU.

The migration side is simply because GPU memory bandwidth is far beyond
system memory bandwidth and there is no sign that this gap is closing
(quite the opposite).


Current status and future features:

None of this core code changes core kernel mm code in any major way. This
is simple groundwork with no impact on existing code paths. Features that
will be implemented on top of this are:
  1 - Transparently handle page mapping on behalf of device driver (DMA).
  2 - Improve DMA API to better match new usage pattern of HMM.
  3 - Migration of anonymous memory to device memory.
  4 - Locking memory to remote memory (CPU access triggers SIGBUS).
  5 - Access exclusion between CPU and device for atomic operations.
  6 - Migration of file-backed memory to device memory.


How future features will be implemented:
  1 - Simply use the existing DMA API to map pages on behalf of a device.
  2 - Introduce a new DMA API to match the new semantics of HMM. It is no
      longer pages that we map but address ranges, and it should be easy
      to update which page is effectively backing an address. I gave a
      presentation about that during this LPC.
  3 - Requires changes to the CPU page fault code path to handle migration
      back to system memory on CPU access. An implementation of this was
      already sent as part of v1.
      This will be low impact and only adds handling of a new special
      swap type to the existing fault code.
  4 - Requires a new syscall, as I cannot see which current syscall would
      be appropriate for this. My first feeling was to use mbind, as it
      has the right semantics (binding a range of addresses to a device),
      but mbind is too NUMA-centric.
      The second one was madvise, but the semantics do not match: madvise
      allows the kernel to ignore the advice, while we do want to block
      CPU access for as long as the range is bound to a device.
      So I do not think any existing syscall can be extended with new
      flags, but maybe I am wrong.
  5 - Allow mapping a page as read-only on the CPU while a device performs
      some atomic operation on it (this is mainly to work around system
      buses that do not support atomic memory access, and sadly there is
      a large base of hardware without that feature).
      The easiest implementation would use some page flags, but there are
      none left. So it must be some flag in the vma to know if there is a
      need to query HMM for write protection.
  6 - This is the trickiest one to implement, and while I showed a proof
      of concept with v1, I still have a lot of conflicting feelings about
      how to achieve this.


As usual, comments are more than welcome. Thanks in advance to anyone who
takes a look at this code.

Previous patchset postings:
  v1 http://lwn.net/Articles/597289/
  v2 https://lkml.org/lkml/2014/6/12/559 (cover letter did not make it to ml)
  v3 https://lkml.org/lkml/2014/6/13/633
  v4 https://lkml.org/lkml/2014/8/29/423
  v5 https://lkml.org/lkml/2014/11/3/759
  v6 http://lwn.net/Articles/619737/

Cheers,
Jérôme

To: "Andrew Morton"
Cc: linux-mm
Cc: "Linus Torvalds"
Cc: "Mel Gorman"
Cc: "H. Peter Anvin"
Cc: "Peter Zijlstra"
Cc: "Linda Wang"
Cc: "Kevin E Martin"
Cc: "Jerome Glisse"
Cc: "Andrea Arcangeli"
Cc: "Johannes Weiner"
Cc: "Larry Woodman"
Cc: "Rik van Riel"
Cc: "Dave Airlie"
Cc: "Jeff Law"
Cc: "Brendan Conoboy"
Cc: "Joe Donohue"
Cc: "Duncan Poole"
Cc: "Sherry Cheung"
Cc: "Subhash Gutti"
Cc: "John Hubbard"
Cc: "Mark Hairgrove"
Cc: "Lucien Dunning"
Cc: "Cameron Buschardt"
Cc: "Arvind Gopalakrishnan"
Cc: "Haggai Eran"
Cc: "Or Gerlitz"
Cc: "Sagi Grimberg"
Cc: "Shachar Raindel"
Cc: "Liran Liss"
Cc: "Roland Dreier"
Cc: "Sander, Ben"
Cc: "Stoner, Greg"
Cc: "Bridgman, John"
Cc: "Mantor, Michael"
Cc: "Blinzer, Paul"
Cc: "Morichetti, Laurent"
Cc: "Deucher, Alexander"
Cc: "Gabbay, Oded"