* Mirroring process address space on device
@ 2016-03-16 17:10 ` Olu Ogunbowale
  0 siblings, 0 replies; 20+ messages in thread
From: Olu Ogunbowale @ 2016-03-16 17:10 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Rik van Riel, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Olu Ogunbowale

In a nutshell:

Export the memory management functions, unmapped_area() & unmapped_area_topdown(), as GPL symbols; this allows the kernel to better support process address space mirroring on both CPU and device for out-of-tree drivers by allowing the use of vm_unmapped_area() in a driver's file operation get_unmapped_area().

This is required by drivers that want to control or limit a process VMA range into which shared-virtual-memory (SVM) buffers are mapped during an mmap() call in order to ensure that said SVM VMA does not collide with any pre-existing VMAs used by non-buffer regions on the device because SVM buffers must have identical VMAs on both CPU and device.

Exporting these functions is particularly useful for graphics devices as SVM support is required by the OpenCL & HSA specifications and also SVM support for 64-bit CPUs where the usable device SVM address range is, or may be, a subset of the full 64-bit range of the CPU. Exporting also avoids the need to duplicate the VMA search code in such drivers.

Why do this:

The OpenCL API & Heterogeneous System Architecture (HSA) specifications require mirroring a process address space on both the CPU and GPU, so-called shared-virtual-memory (SVM) support, wherein the same virtual address is used to address the same content on both the CPU and GPU.

There are different levels of support from coarse to fine-grained with slightly different semantics (1: coarse-grained buffer SVM, 2: fine-grained buffer SVM & 3: fine-grained system SVM); furthermore support for the highest level, fine-grained system SVM, is optional and this fact is central to the need for this requirement as explained below.

For hardware & drivers implementing support for SVM up to the second level only, i.e. fine-grained buffer SVM level, this mirroring is effectively at a buffer allocation level and therefore excludes the need for any heterogeneous memory management (HMM) like functionality which is required to support SVM up to the highest level, i.e. fine-grained system SVM (see http://lwn.net/Articles/597289/ for details).

In this case, drivers would benefit from being able to specify/control the SVM VMA range during an mmap() call, especially if the device SVM VMA range is a subset of the full 32-bit/64-bit CPU (process/mmap) range.

As the kernel already provides a char driver file->f_op->get_unmapped_area() entry point for this, the backend of such a call would require a constrained search for an unmapped address range using vm_unmapped_area(), which currently calls into either unmapped_area() or unmapped_area_topdown(), neither of which is currently an exported symbol.

Therefore, exporting these symbols allows the kernel to better support this type of process address space mirroring and it also avoids duplicating the VMA search code in these drivers.

As always, comments are welcome and many thanks in advance for consideration.

Olu Ogunbowale (1):
  mm: Export symbols unmapped_area() & unmapped_area_topdown()

 mm/mmap.c | 4 ++++
 1 file changed, 4 insertions(+)

-- 
2.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply [flat|nested] 20+ messages in thread
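To make the "constrained search for an unmapped address range" in the cover letter concrete, here is a minimal userspace model in C. It is only a sketch under stated assumptions: struct region, struct search_window, NO_GAP and both function names are hypothetical and are not kernel APIs, and the linear scan over a sorted array is a deliberate simplification that stands in for the kernel's own VMA search rather than reproducing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_GAP UINT64_MAX   /* hypothetical "no suitable gap" marker */

/* Hypothetical model of one existing mapping covering [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical search request: the limits a driver would want to impose. */
struct search_window {
    uint64_t length;      /* bytes wanted */
    uint64_t low_limit;   /* lowest acceptable start address */
    uint64_t high_limit;  /* one past the highest acceptable end address */
};

/*
 * Bottom-up search: lowest free start address inside the window.
 * The regions must be sorted by start address and must not overlap.
 */
static uint64_t find_gap_bottom_up(const struct region *r, size_t n,
                                   const struct search_window *w)
{
    assert(w->length > 0 && w->low_limit <= w->high_limit);
    uint64_t candidate = w->low_limit;

    for (size_t i = 0; i < n; i++) {
        if (r[i].end <= candidate)
            continue;              /* mapping lies entirely below candidate */
        if (r[i].start >= candidate && r[i].start - candidate >= w->length)
            break;                 /* the gap below this mapping is big enough */
        candidate = r[i].end;      /* otherwise retry just above this mapping */
    }
    if (candidate > w->high_limit || w->high_limit - candidate < w->length)
        return NO_GAP;
    return candidate;
}

/* Top-down search: highest free start address inside the window. */
static uint64_t find_gap_top_down(const struct region *r, size_t n,
                                  const struct search_window *w)
{
    assert(w->length > 0 && w->low_limit <= w->high_limit);
    uint64_t ceiling = w->high_limit;

    for (size_t i = n; i-- > 0; ) {
        if (r[i].start >= ceiling)
            continue;              /* mapping lies entirely above ceiling */
        if (r[i].end <= ceiling && ceiling - r[i].end >= w->length)
            break;                 /* the gap above this mapping is big enough */
        ceiling = r[i].start;      /* otherwise retry just below this mapping */
    }
    if (ceiling < w->low_limit || ceiling - w->low_limit < w->length)
        return NO_GAP;
    return ceiling - w->length;
}
```

The point of the model is that the caller, not the generic mmap path, chooses low_limit and high_limit, so a driver could keep SVM buffers inside whatever range the device can address.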
* [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
  2016-03-16 17:10 ` Olu Ogunbowale
@ 2016-03-16 17:10 ` Olu Ogunbowale
  -1 siblings, 0 replies; 20+ messages in thread
From: Olu Ogunbowale @ 2016-03-16 17:10 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Rik van Riel, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Olujide Ogunbowale

From: Olujide Ogunbowale <Olu.Ogunbowale@imgtec.com>

Export the memory management functions, unmapped_area() & unmapped_area_topdown(), as GPL symbols; this allows the kernel to better support process address space mirroring on both CPU and device for out-of-tree drivers by allowing the use of vm_unmapped_area() in a driver's file operation get_unmapped_area().

This is required by drivers that want to control or limit a process VMA range into which shared-virtual-memory (SVM) buffers are mapped during an mmap() call in order to ensure that said SVM VMA does not collide with any pre-existing VMAs used by non-buffer regions on the device because SVM buffers must have identical VMAs on both CPU and device.

Exporting these functions is particularly useful for graphics devices as SVM support is required by the OpenCL & HSA specifications and also SVM support for 64-bit CPUs where the usable device SVM address range is, or may be, a subset of the full 64-bit range of the CPU. Exporting also avoids the need to duplicate the VMA search code in such drivers.

Signed-off-by: Olu Ogunbowale <Olu.Ogunbowale@imgtec.com>
---
 mm/mmap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/mmap.c b/mm/mmap.c
index 76d1ec2..c08b518 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1804,6 +1804,8 @@ found:
 	return gap_start;
 }
 
+EXPORT_SYMBOL_GPL(unmapped_area);
+
 unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
 {
 	struct mm_struct *mm = current->mm;
@@ -1902,6 +1904,8 @@ found_highest:
 	return gap_end;
 }
 
+EXPORT_SYMBOL_GPL(unmapped_area_topdown);
+
 /* Get an address range which is currently unmapped.
  * For shmat() with addr=0.
  *
-- 
1.7.9.5

^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
  2016-03-16 17:10 ` Olu Ogunbowale
@ 2016-03-16 20:36 ` Christoph Hellwig
  -1 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2016-03-16 20:36 UTC (permalink / raw)
  To: Olu Ogunbowale
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Rik van Riel, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

On Wed, Mar 16, 2016 at 05:10:34PM +0000, Olu Ogunbowale wrote:
> From: Olujide Ogunbowale <Olu.Ogunbowale@imgtec.com>
>
> Export the memory management functions, unmapped_area() &
> unmapped_area_topdown(), as GPL symbols; this allows the kernel to
> better support process address space mirroring on both CPU and device
> for out-of-tree drivers by allowing the use of vm_unmapped_area() in a
> driver's file operation get_unmapped_area().

No new exports without in-tree drivers. How about you get started on getting your drivers into the tree first?

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
  2016-03-16 20:36 ` Christoph Hellwig
  (?)
@ 2016-03-16 21:00 ` Rik van Riel
  2016-03-17 7:24 ` Ingo Molnar
  2016-03-17 16:40 ` Olu Ogunbowale
  -1 siblings, 2 replies; 20+ messages in thread
From: Rik van Riel @ 2016-03-16 21:00 UTC (permalink / raw)
  To: Christoph Hellwig, Olu Ogunbowale
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

On Wed, 2016-03-16 at 13:36 -0700, Christoph Hellwig wrote:
> On Wed, Mar 16, 2016 at 05:10:34PM +0000, Olu Ogunbowale wrote:
> >
> > From: Olujide Ogunbowale <Olu.Ogunbowale@imgtec.com>
> >
> > Export the memory management functions, unmapped_area() &
> > unmapped_area_topdown(), as GPL symbols; this allows the kernel to
> > better support process address space mirroring on both CPU and
> > device
> > for out-of-tree drivers by allowing the use of vm_unmapped_area()
> > in a
> > driver's file operation get_unmapped_area().
> No new exports without in-tree drivers. How about you get started
> to get your drives into the tree first?

The drivers appear to require the HMM framework though, which people are also reluctant to merge without the drivers.

How do we get past this chicken & egg situation?

-- 
All Rights Reversed.

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
  2016-03-16 21:00 ` Rik van Riel
@ 2016-03-17 7:24 ` Ingo Molnar
  2016-03-17 16:40 ` Olu Ogunbowale
  1 sibling, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2016-03-17 7:24 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christoph Hellwig, Olu Ogunbowale, linux-mm, linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

* Rik van Riel <riel@redhat.com> wrote:

> On Wed, 2016-03-16 at 13:36 -0700, Christoph Hellwig wrote:
> > On Wed, Mar 16, 2016 at 05:10:34PM +0000, Olu Ogunbowale wrote:
> > >
> > > From: Olujide Ogunbowale <Olu.Ogunbowale@imgtec.com>
> > >
> > > Export the memory management functions, unmapped_area() &
> > > unmapped_area_topdown(), as GPL symbols; this allows the kernel to
> > > better support process address space mirroring on both CPU and
> > > device
> > > for out-of-tree drivers by allowing the use of vm_unmapped_area()
> > > in a
> > > driver's file operation get_unmapped_area().
> > No new exports without in-tree drivers. How about you get started
> > to get your drives into the tree first?
>
> The drivers appear to require the HMM framework though,
> which people are also reluctant to merge without the
> drivers.
>
> How do we get past this chicken & egg situation?

Submit the export together with the drivers for review and Cc: VM folks - it all looks pretty small on the VM side.

Thanks,

	Ingo

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
  2016-03-16 21:00 ` Rik van Riel
  2016-03-17 7:24 ` Ingo Molnar
@ 2016-03-17 16:40 ` Olu Ogunbowale
  1 sibling, 0 replies; 20+ messages in thread
From: Olu Ogunbowale @ 2016-03-17 16:40 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Jerome Glisse, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Jackson DSouza

On Wed, Mar 16, 2016 at 05:00:41PM -0400, Rik van Riel wrote:
>
> The drivers appear to require the HMM framework though,
> which people are also reluctant to merge without the
> drivers.
>
> How do we get past this chicken & egg situation?

I would like to point out that support for HSA varies from one vendor/design to another. For some devices/drivers (e.g. the AMD APU/HSA kernel driver), no form of address space mirroring is required (i.e. AMD IOMMU v2) AFAIK. Others require address space mirroring and so need the kernel HMM framework, but only because they provide support for the full HSA/OpenCL SVM specification. Some do not require HMM at all because they implement only a subset of the specification.

These exports enable the latter approach, which does not require the kernel HMM framework in order to support process address space mirroring.

Regards,
Olu

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
  2016-03-16 17:10 ` Olu Ogunbowale
@ 2016-03-17 14:37 ` Jerome Glisse
  -1 siblings, 0 replies; 20+ messages in thread
From: Jerome Glisse @ 2016-03-17 14:37 UTC (permalink / raw)
  To: Olu Ogunbowale
  Cc: linux-mm, linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Rik van Riel, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

On Wed, Mar 16, 2016 at 05:10:34PM +0000, Olu Ogunbowale wrote:
> From: Olujide Ogunbowale <Olu.Ogunbowale@imgtec.com>
>
> Export the memory management functions, unmapped_area() &
> unmapped_area_topdown(), as GPL symbols; this allows the kernel to
> better support process address space mirroring on both CPU and device
> for out-of-tree drivers by allowing the use of vm_unmapped_area() in a
> driver's file operation get_unmapped_area().
>
> This is required by drivers that want to control or limit a process VMA
> range into which shared-virtual-memory (SVM) buffers are mapped during
> an mmap() call in order to ensure that said SVM VMA does not collide
> with any pre-existing VMAs used by non-buffer regions on the device
> because SVM buffers must have identical VMAs on both CPU and device.
>
> Exporting these functions is particularly useful for graphics devices as
> SVM support is required by the OpenCL & HSA specifications and also SVM
> support for 64-bit CPUs where the useable device SVM address range
> is/maybe a subset of the full 64-bit range of the CPU. Exporting also
> avoids the need to duplicate the VMA search code in such drivers.

What other drivers do for non-buffer regions is have the userspace side of the device driver mmap the device driver file and use the VMA range you get from that for those non-buffer regions. On CPU access you can either choose to fault or to return a dummy page. With that trick there is no need to change the kernel.

Note that I do not see how you can solve the issue of your GPU having fewer bits than the CPU. For instance, let's assume that you have 46 bits for the GPU while the CPU has 48 bits. Now an application starts and does a bunch of allocations that end up above (1 << 46), then the same application loads your driver and starts using some API that allows it to transparently use previously allocated memory -> fails.

Unless you are in a scheme where all allocations must go through some special allocator, but I thought this was not the case for HSA. I know the lower levels of OpenCL allow that.

Cheers,
Jérôme

^ permalink raw reply [flat|nested] 20+ messages in thread
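The address-width concern in the message above can be expressed as a small check. This is only an illustrative sketch: fits_device_va() is a hypothetical helper, and the 46-bit and 48-bit widths are simply the figures used in the example in the message, not properties of any particular device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the byte range [addr, addr + len) can be
 * expressed with device_bits bits of virtual address.
 */
static bool fits_device_va(uint64_t addr, uint64_t len, unsigned device_bits)
{
    assert(device_bits > 0 && device_bits < 64);

    /* First address the device cannot reach. */
    uint64_t limit = UINT64_C(1) << device_bits;

    /* Written this way so that addr + len can never overflow. */
    return len <= limit && addr <= limit - len;
}
```

A buffer that the CPU placed above (1 << 46) before the driver was loaded fails this check for a 46-bit device even though it is perfectly valid for a 48-bit CPU, which is why a transparent "use any existing allocation" API cannot work without control over where allocations land.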
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
  2016-03-17 14:37 ` Jerome Glisse
@ 2016-03-17 15:38 ` Oded Gabbay
  -1 siblings, 0 replies; 20+ messages in thread
From: Oded Gabbay @ 2016-03-17 15:38 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Olu Ogunbowale, linux-mm, Linux-Kernel@Vger. Kernel. Org, Linus Torvalds, Michel Lespinasse, Andrew Morton, Rik van Riel, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin

On Thu, Mar 17, 2016 at 4:37 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 05:10:34PM +0000, Olu Ogunbowale wrote:
>> From: Olujide Ogunbowale <Olu.Ogunbowale@imgtec.com>
>>
>> Export the memory management functions, unmapped_area() &
>> unmapped_area_topdown(), as GPL symbols; this allows the kernel to
>> better support process address space mirroring on both CPU and device
>> for out-of-tree drivers by allowing the use of vm_unmapped_area() in a
>> driver's file operation get_unmapped_area().
>>
>> This is required by drivers that want to control or limit a process VMA
>> range into which shared-virtual-memory (SVM) buffers are mapped during
>> an mmap() call in order to ensure that said SVM VMA does not collide
>> with any pre-existing VMAs used by non-buffer regions on the device
>> because SVM buffers must have identical VMAs on both CPU and device.
>>
>> Exporting these functions is particularly useful for graphics devices as
>> SVM support is required by the OpenCL & HSA specifications and also SVM
>> support for 64-bit CPUs where the useable device SVM address range
>> is/maybe a subset of the full 64-bit range of the CPU. Exporting also
>> avoids the need to duplicate the VMA search code in such drivers.
>
> What other driver do for non-buffer region is have the userspace side
> of the device driver mmap the device driver file and use vma range you
> get from that for those non-buffer region. On cpu access you can either
> chose to fault or to return a dummy page. With that trick no need to
> change kernel.
>
> Note that i do not see how you can solve the issue of your GPU having
> less bits then the cpu. For instance, lets assume that you have 46bits
> for the GPU while the CPU have 48bits. Now an application start and do
> bunch of allocation that end up above (1 << 46), then same application
> load your driver and start using some API that allow to transparently
> use previously allocated memory -> fails.
>
> Unless you are in scheme were all allocation must go through some
> special allocator but i thought this was not the case for HSA. I know
> lower level of OpenCL allows that.
>
> Cheers,
> Jérôme

In amdkfd (the AMD HSA kernel driver), for APUs where the CPU and GPU sit on the same die, we don't need this, as the GPU cores use the AMD IOMMU (v2) to access the system memory. That is, we don't need to use VRAM (GPU memory) at all and we don't need to mirror address spaces.

For dGPU, it's a different story. On GPUs where there is only a 40-bit memory space, for example GCN 1.0 and 1.1, I would assume a pass through a special allocator is a must, while memory addresses below the 40-bit limit will need to be reserved for HSA. Note that amdkfd doesn't support dGPU at this time.

Thanks,

	Oded

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
From: Olu Ogunbowale @ 2016-03-17 15:46 UTC
To: Jerome Glisse
Cc: linux-mm, linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Rik van Riel, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Jackson DSouza

On Thu, Mar 17, 2016 at 03:37:16PM +0100, Jerome Glisse wrote:
> What other driver do for non-buffer region is have the userspace side
> of the device driver mmap the device driver file and use vma range you
> get from that for those non-buffer region. On cpu access you can either
> chose to fault or to return a dummy page. With that trick no need to
> change kernel.

Yes, this approach works for some designs; however, support for arbitrary
VMA ranges for non-buffer regions is not a feature of all mobile GPU
designs, for performance, power, and area (PPA) reasons.

> Note that i do not see how you can solve the issue of your GPU having
> less bits then the cpu. For instance, lets assume that you have 46bits
> for the GPU while the CPU have 48bits. Now an application start and do
> bunch of allocation that end up above (1 << 46), then same application
> load your driver and start using some API that allow to transparently
> use previously allocated memory -> fails.

Yes, you are correct. For mobile SoCs, though, current top-end
specifications have 4GB/8GB of installed RAM, so the usable SVM range is
upper-bounded by this, giving a fixed base; hence the need for driver
control of the VMA range.

> Unless you are in scheme were all allocation must go through some
> special allocator but i thought this was not the case for HSA. I know
> lower level of OpenCL allows that.

Subsets of both specifications allow for restricted implementations
AFAIK; these proposed changes are for HSA and OpenCL up to phase 2, where
all SVM allocations go via a special user-mode allocator.

Regards,
Olu
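To make the "special user-mode allocator" idea concrete, here is a minimal sketch in C of an allocator that only ever hands out addresses from one fixed window reserved for SVM. It is an illustration under stated assumptions, not the allocator used by any driver in this thread: the names `svm_window`, `svm_window_init` and `svm_window_alloc` are hypothetical, and a simple bump policy with no free list stands in for whatever policy a real implementation would use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a fixed SVM window [base, base + size). */
struct svm_window {
    uint64_t base; /* first address of the reserved window */
    uint64_t size; /* window length in bytes */
    uint64_t next; /* offset of the first unallocated byte */
};

static void svm_window_init(struct svm_window *w, uint64_t base, uint64_t size)
{
    w->base = base;
    w->size = size;
    w->next = 0;
}

/*
 * Bump-allocate len bytes aligned to align (a power of two).
 * Returns the address inside the window, or 0 if the request does not fit.
 * Assumes base + size does not wrap around the 64-bit address space.
 */
static uint64_t svm_window_alloc(struct svm_window *w, uint64_t len, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uint64_t start = (w->base + w->next + align - 1) & ~(align - 1);
    uint64_t off = start - w->base;

    if (len == 0 || off > w->size || len > w->size - off)
        return 0;

    w->next = off + len;
    return start;
}
```

Because every address comes from the same window, the CPU and the device can use identical virtual addresses for each buffer as long as the window itself is valid on both sides.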
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
From: Jerome Glisse @ 2016-03-17 17:03 UTC
To: Olu Ogunbowale
Cc: linux-mm, linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Rik van Riel, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Jackson DSouza

On Thu, Mar 17, 2016 at 03:46:35PM +0000, Olu Ogunbowale wrote:
> On Thu, Mar 17, 2016 at 03:37:16PM +0100, Jerome Glisse wrote:
> > What other driver do for non-buffer region is have the userspace side
> > of the device driver mmap the device driver file and use vma range you
> > get from that for those non-buffer region. On cpu access you can either
> > chose to fault or to return a dummy page. With that trick no need to
> > change kernel.
>
> Yes, this approach works for some designs however arbitrary VMA ranges
> for non-buffer regions is not a feature of all mobile gpu designs for
> performance, power, and area (PPA) reasons.

Well, the trick still works: if the driver is loaded early during
userspace program initialization, then you force mmap to a specific range
inside the driver's userspace code. If the driver is loaded later and the
program is already using those ranges, then you can register a notifier
to track when those ranges are released. If they get released by the
program, you can have the userspace driver force creation of a new
reserve VMA again.

> > Note that i do not see how you can solve the issue of your GPU having
> > less bits then the cpu. For instance, lets assume that you have 46bits
> > for the GPU while the CPU have 48bits. Now an application start and do
> > bunch of allocation that end up above (1 << 46), then same application
> > load your driver and start using some API that allow to transparently
> > use previously allocated memory -> fails.
>
> Yes, you are correct however for mobile SoC(s) though current top-end
> specifications have 4GB/8GB of installed ram so the usable SVM range is
> upper bound by this giving a fixed base hence the need for driver control
> of VMA range.

Well, controlling the range into which a VMA can be allocated is not
something that you should do lightly (things like address space
randomization would be impacted). And no, the SVM range is not
upper-bounded by the amount of memory but by the physical bus size: if it
is 48 bits, nothing forbids putting all the program memory above 8GB and
nothing below. We are talking about virtual addresses here. By the way, I
think most 64-bit ARM are 40 bits, and it seems a shame for the GPU not
to go as high as the CPU.

Cheers,
Jérôme
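The early-reservation trick described above can be sketched from userspace as follows. This is an illustration only: it assumes Linux `mmap()`, which treats a non-`MAP_FIXED` address as a hint, and it uses an anonymous `PROT_NONE` mapping as a stand-in for mapping the device driver file. The function name `reserve_in_window` and its parameters are hypothetical, and the notifier-based re-reservation Jerome mentions is not shown.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: try to reserve len bytes of CPU virtual address
 * space inside [lo, hi) without backing it with memory. The address lo is
 * only a hint, so the result is checked and discarded if the kernel placed
 * the mapping outside the window. Assumes a 64-bit process.
 * Returns the reserved address, or NULL on failure.
 */
static void *reserve_in_window(uint64_t lo, uint64_t hi, size_t len)
{
    assert(lo <= hi);

    if (len == 0 || len > hi - lo)
        return NULL;

    /* PROT_NONE: any CPU access to the reserved range faults. */
    void *p = mmap((void *)(uintptr_t)lo, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    uint64_t addr = (uint64_t)(uintptr_t)p;
    if (addr < lo || addr > hi - len) {
        munmap(p, len); /* hint was not honoured; give the range back */
        return NULL;
    }
    return p;
}
```

Whether the hint is honoured depends on what the process has already mapped, which is why doing this early in program initialization matters in the scheme described.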
* Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()
From: Olu Ogunbowale @ 2016-03-17 17:42 UTC
To: Jerome Glisse
Cc: linux-mm, linux-kernel, Linus Torvalds, Michel Lespinasse, Andrew Morton, Rik van Riel, Hugh Dickins, Russell King, Ralf Baechle, Paul Mundt, David S. Miller, Chris Metcalf, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Jackson DSouza

On Thu, Mar 17, 2016 at 06:03:50PM +0100, Jerome Glisse wrote:
> Well trick still works, if driver is loaded early during userspace program
> initialization then you force mmap to specific range inside the driver
> userspace code. If driver is loaded after and program is already using those
> range then you can register a notifier to track when those range. If they
> get release by the program you can have the userspace driver force creation
> of new reserve vma again.

I should have been clearer in my response: this applies only because we
are in a scheme where all allocations must go through a special
allocator, because the VMA base/range is reserved for SVM.

> Well controling range into which VMA can be allocated is not something that
> you should do lightly (thing like address space randomization would be
> impacted). And no the SVM range is not upper bound by the amount of memory
> but by the physical bus size if it is 48bits nothing forbid to put all the
> program memory above 8GB and nothing below. We are talking virtual address
> here. By the way i think most 64 bit ARM are 40 bits and it seems a shame
> for GPU to not go as high as the CPU.

Same as above. By the way, we support a minimum of 40 bits but can be
paired with CPUs with more bits; there is no problem if the GPU's bits
are equal to or greater than the CPU's.
end of thread, other threads:[~2016-03-17 17:42 UTC | newest]

Thread overview: 20+ messages
2016-03-16 17:10 Mirroring process address space on device Olu Ogunbowale
2016-03-16 17:10 ` [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown() Olu Ogunbowale
2016-03-16 20:36 ` Christoph Hellwig
2016-03-16 21:00 ` Rik van Riel
2016-03-17  7:24 ` Ingo Molnar
2016-03-17 16:40 ` Olu Ogunbowale
2016-03-17 14:37 ` Jerome Glisse
2016-03-17 15:38 ` Oded Gabbay
2016-03-17 15:46 ` Olu Ogunbowale
2016-03-17 17:03 ` Jerome Glisse
2016-03-17 17:42 ` Olu Ogunbowale