* [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() @ 2014-06-13 16:26 Chris Wilson 2014-06-13 16:26 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Chris Wilson @ 2014-06-13 16:26 UTC (permalink / raw) To: intel-gfx Cc: Rik van Riel, Peter Zijlstra, Cyrill Gorcunov, linux-mm, Mel Gorman, Johannes Weiner, Andrew Morton, Kirill A. Shutemov When using remap_pfn_range() from a fault handler, we are exposed to races between concurrent faults. Rather than hitting a BUG, report the error back to the caller, like vm_insert_pfn(). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org --- mm/memory.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 037b812a9531..6603a9e6a731 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, { pte_t *pte; spinlock_t *ptl; + int ret = 0; pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); if (!pte) return -ENOMEM; arch_enter_lazy_mmu_mode(); do { - BUG_ON(!pte_none(*pte)); + if (!pte_none(*pte)) { + ret = -EBUSY; + break; + } set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); pfn++; } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); - return 0; + return ret; } static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, -- 2.0.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
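For context, the fault-handler pattern that motivates this change looks roughly like the sketch below (the handler shape and the example_base_pfn() helper are assumptions for illustration, not code from any particular driver). Two threads can fault on the same VMA concurrently; whichever walk loses the race finds PTEs the winner already populated, which previously tripped the BUG_ON:

static int example_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        /* hypothetical helper returning the first PFN backing this vma */
        unsigned long pfn = example_base_pfn(vma);
        int ret;

        /* Prefault the whole mapping in one pass.  With this patch the
         * racing loser sees -EBUSY instead of hitting a BUG when it
         * finds the winner's PTEs already present.
         */
        ret = remap_pfn_range(vma, vma->vm_start, pfn,
                              vma->vm_end - vma->vm_start,
                              vma->vm_page_prot);
        if (ret == 0 || ret == -EBUSY)
                return VM_FAULT_NOPAGE;
        return ret == -ENOMEM ? VM_FAULT_OOM : VM_FAULT_SIGBUS;
}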
* [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass 2014-06-13 16:26 [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson @ 2014-06-13 16:26 ` Chris Wilson 2014-06-13 16:34 ` [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson 2014-06-16 13:41 ` Kirill A. Shutemov 2 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2014-06-13 16:26 UTC (permalink / raw) To: intel-gfx; +Cc: linux-mm On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences, Upload rate for 2 linear surfaces: 8134MiB/s -> 8154MiB/s Upload rate for 2 tiled surfaces: 8625MiB/s -> 8632MiB/s Upload rate for 4 linear surfaces: 8127MiB/s -> 8134MiB/s Upload rate for 4 tiled surfaces: 8602MiB/s -> 8629MiB/s Upload rate for 8 linear surfaces: 8124MiB/s -> 8137MiB/s Upload rate for 8 tiled surfaces: 8603MiB/s -> 8624MiB/s Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s Upload rate for 16 tiled surfaces: 8606MiB/s -> 8618MiB/s Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s Upload rate for 32 tiled surfaces: 8605MiB/s -> 8614MiB/s Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s Upload rate for 64 tiled surfaces: 3017MiB/s -> 5127MiB/s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Testcase: igt/gem_fence_upload/performance Testcase: igt/gem_mmap_gtt Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Cc: linux-mm@kvack.org --- drivers/gpu/drm/i915/i915_gem.c | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c313cb2b641b..e6246634b419 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1565,22 +1565,23 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj); pfn >>= PAGE_SHIFT; - if (!obj->fault_mappable) { - int i; + ret = remap_pfn_range(vma, vma->vm_start, + pfn, vma->vm_end - vma->vm_start, + vma->vm_page_prot); + if (ret) { + /* After passing the sanity checks on remap_pfn_range(), we may + * abort whilst updating the pagetables due to ENOMEM and leave + * the tables in an inconsistent state. Reset them all now. + * However, we do not want to undo the work of another thread + * that beat us to prefaulting the PTEs. + */ + if (ret != -EBUSY) + zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start); + goto unpin; + } - for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) { - ret = vm_insert_pfn(vma, - (unsigned long)vma->vm_start + i * PAGE_SIZE, - pfn + i); - if (ret) - break; - } + obj->fault_mappable = true; - obj->fault_mappable = true; - } else - ret = vm_insert_pfn(vma, - (unsigned long)vmf->virtual_address, - pfn + page_offset); unpin: i915_gem_object_ggtt_unpin(obj); unlock: -- 2.0.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() 2014-06-13 16:26 [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson 2014-06-13 16:26 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson @ 2014-06-13 16:34 ` Chris Wilson 2014-06-16 13:41 ` Kirill A. Shutemov 2 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2014-06-13 16:34 UTC (permalink / raw) To: intel-gfx Cc: Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm On Fri, Jun 13, 2014 at 05:26:17PM +0100, Chris Wilson wrote: > When using remap_pfn_range() from a fault handler, we are exposed to > races between concurrent faults. Rather than hitting a BUG, report the > error back to the caller, like vm_insert_pfn(). > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Rik van Riel <riel@redhat.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Cyrill Gorcunov <gorcunov@gmail.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: linux-mm@kvack.org > --- > mm/memory.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..6603a9e6a731 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > { > pte_t *pte; > spinlock_t *ptl; > + int ret = 0; > > pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > if (!pte) > return -ENOMEM; > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > + if (!pte_none(*pte)) { > + ret = -EBUSY; > + break; > + } > set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); > pfn++; > } while (pte++, addr += PAGE_SIZE, addr != end); > arch_leave_lazy_mmu_mode(); > pte_unmap_unlock(pte - 1, ptl); Oh. That will want the EBUSY path to increment pte or we will try to unmap the wrong page. -Chris -- Chris Wilson, Intel Open Source Technology Centre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
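To make the off-by-one concrete: pte_unmap_unlock(pte - 1, ptl) relies on pte pointing one entry past the last one examined, an invariant the while-clause increment normally maintains and an early break skips. A sketch of the loop with that correction folded in (reconstructed for illustration; this is the v2 fix that appears in the repost below):

do {
        if (!pte_none(*pte)) {
                ret = -EBUSY;
                pte++;  /* keep the one-past-the-end invariant */
                break;  /* without the increment, pte - 1 below names
                         * the entry before the busy one, so we would
                         * unmap/unlock the wrong page */
        }
        set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
        pfn++;
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);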
* RE: [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() 2014-06-13 16:26 [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson 2014-06-13 16:26 ` [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass Chris Wilson 2014-06-13 16:34 ` [PATCH 1/2] mm: Report attempts to overwrite PTE from remap_pfn_range() Chris Wilson @ 2014-06-16 13:41 ` Kirill A. Shutemov 2014-06-19 7:19 ` [PATCH] " Chris Wilson 2 siblings, 1 reply; 16+ messages in thread From: Kirill A. Shutemov @ 2014-06-16 13:41 UTC (permalink / raw) To: Chris Wilson Cc: intel-gfx, Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm Chris Wilson wrote: > When using remap_pfn_range() from a fault handler, we are exposed to > races between concurrent faults. Rather than hitting a BUG, report the > error back to the caller, like vm_insert_pfn(). > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Rik van Riel <riel@redhat.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Cyrill Gorcunov <gorcunov@gmail.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: linux-mm@kvack.org > --- > mm/memory.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..6603a9e6a731 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2306,19 +2306,23 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > { > pte_t *pte; > spinlock_t *ptl; > + int ret = 0; > > pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > if (!pte) > return -ENOMEM; > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > + if (!pte_none(*pte)) { > + ret = -EBUSY; > + break; I think you need to at least remove the entries you've set up if the check fails on something other than the first iteration. And nobody propagates your -EBUSY back to remap_pfn_range(): the caller will see -ENOMEM, which is not what you want, I believe. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
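The propagation gap is in the callers this version of the patch leaves untouched: remap_pmd_range() still collapses any failure from remap_pte_range() into -ENOMEM, as in the surrounding v3.15-era code quoted here for illustration:

do {
        next = pmd_addr_end(addr, end);
        if (remap_pte_range(mm, pmd, addr, next,
                            pfn + (addr >> PAGE_SHIFT), prot))
                return -ENOMEM; /* the -EBUSY from below is lost here */
} while (pmd++, addr = next, addr != end);
return 0;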
* [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() 2014-06-16 13:41 ` Kirill A. Shutemov @ 2014-06-19 7:19 ` Chris Wilson 2014-06-19 11:50 ` Kirill A. Shutemov 0 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2014-06-19 7:19 UTC (permalink / raw) To: intel-gfx Cc: Chris Wilson, Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm When using remap_pfn_range() from a fault handler, we are exposed to races between concurrent faults. Rather than hitting a BUG, report the error back to the caller, like vm_insert_pfn(). v2: Fix the pte address for unmapping along the error path. v3: Report the error back and cleanup partial remaps. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org --- Whilst this has the semantics I want to allow two concurrent, but serialised, pagefaults that try to prefault the same object to succeed, it looks fragile and fraught with subtlety. -Chris --- mm/memory.c | 54 ++++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 38 insertions(+), 16 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index d67fd9f..be51fcc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1657,32 +1657,41 @@ EXPORT_SYMBOL(vm_insert_mixed); * in null mappings (currently treated as "copy-on-access") */ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) + unsigned long addr, unsigned long end, + unsigned long pfn, pgprot_t prot, + bool first) { pte_t *pte; spinlock_t *ptl; + int err = 0; pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); if (!pte) return -ENOMEM; arch_enter_lazy_mmu_mode(); do { - BUG_ON(!pte_none(*pte)); + if (!pte_none(*pte)) { + err = first ? -EBUSY : -EINVAL; + pte++; + break; + } + first = false; set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); pfn++; } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); - return 0; + return err; } static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) + unsigned long addr, unsigned long end, + unsigned long pfn, pgprot_t prot, + bool first) { pmd_t *pmd; unsigned long next; + int err; pfn -= addr >> PAGE_SHIFT; pmd = pmd_alloc(mm, pud, addr); @@ -1691,19 +1700,23 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, VM_BUG_ON(pmd_trans_huge(*pmd)); do { next = pmd_addr_end(addr, end); - if (remap_pte_range(mm, pmd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; + err = remap_pte_range(mm, pmd, addr, next, + pfn + (addr >> PAGE_SHIFT), prot, first); + if (err) + return err; + + first = false; } while (pmd++, addr = next, addr != end); return 0; } static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) + unsigned long addr, unsigned long end, + unsigned long pfn, pgprot_t prot, bool first) { pud_t *pud; unsigned long next; + int err; pfn -= addr >> PAGE_SHIFT; pud = pud_alloc(mm, pgd, addr); @@ -1711,9 +1724,12 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, return -ENOMEM; do { next = pud_addr_end(addr, end); - if (remap_pmd_range(mm, pud, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; + err = remap_pmd_range(mm, pud, addr, next, + pfn + (addr >> PAGE_SHIFT), prot, first); + if (err) + return err; + + first = false; } while (pud++, addr = next, addr != end); return 0; } @@ -1735,6 +1751,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, unsigned long next; unsigned long end = addr + PAGE_ALIGN(size); struct mm_struct *mm = vma->vm_mm; + bool first = true; int err; /* @@ -1774,13 +1791,18 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, do { next = pgd_addr_end(addr, end); err = remap_pud_range(mm, pgd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot); + pfn + (addr >> PAGE_SHIFT), prot, first); if (err) break; + + first = false; } while (pgd++, addr = next, addr != end); - if (err) + if (err) { untrack_pfn(vma, pfn, PAGE_ALIGN(size)); + if (err != -EBUSY) + zap_page_range_single(vma, addr, size, NULL); + } return err; } -- 1.9.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 16+ messages in thread
* RE: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() 2014-06-19 7:19 ` [PATCH] " Chris Wilson @ 2014-06-19 11:50 ` Kirill A. Shutemov 2014-06-19 12:00 ` Chris Wilson 0 siblings, 1 reply; 16+ messages in thread From: Kirill A. Shutemov @ 2014-06-19 11:50 UTC (permalink / raw) To: Chris Wilson Cc: intel-gfx, Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm Chris Wilson wrote: > When using remap_pfn_range() from a fault handler, we are exposed to > races between concurrent faults. Rather than hitting a BUG, report the > error back to the caller, like vm_insert_pfn(). > > v2: Fix the pte address for unmapping along the error path. > v3: Report the error back and cleanup partial remaps. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Rik van Riel <riel@redhat.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Cyrill Gorcunov <gorcunov@gmail.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: linux-mm@kvack.org > --- > > Whilst this has the semantics I want to allow two concurrent, but > serialised, pagefaults that try to prefault the same object to succeed, > it looks fragile and fraught with subtlety. > -Chris > > --- > mm/memory.c | 54 ++++++++++++++++++++++++++++++++++++++---------------- > 1 file changed, 38 insertions(+), 16 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index d67fd9f..be51fcc 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -1657,32 +1657,41 @@ EXPORT_SYMBOL(vm_insert_mixed); > * in null mappings (currently treated as "copy-on-access") > */ > static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, > + bool first) > { With this long parameter list, wouldn't it be cleaner to pass down a pointer to a structure instead? This could simplify the code, I believe. > pte_t *pte; > spinlock_t *ptl; > + int err = 0; > > pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > if (!pte) > return -ENOMEM; > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > + if (!pte_none(*pte)) { > + err = first ? -EBUSY : -EINVAL; > + pte++; > + break; > + } > + first = false; > set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); > pfn++; > } while (pte++, addr += PAGE_SIZE, addr != end); > arch_leave_lazy_mmu_mode(); > pte_unmap_unlock(pte - 1, ptl); > - return 0; > + return err; > } > > static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, > + bool first) > { > pmd_t *pmd; > unsigned long next; > + int err; > > pfn -= addr >> PAGE_SHIFT; > pmd = pmd_alloc(mm, pud, addr); > @@ -1691,19 +1700,23 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, > VM_BUG_ON(pmd_trans_huge(*pmd)); > do { > next = pmd_addr_end(addr, end); > - if (remap_pte_range(mm, pmd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > + err = remap_pte_range(mm, pmd, addr, next, > + pfn + (addr >> PAGE_SHIFT), prot, first); > + if (err) > + return err; > + > + first = false; > } while (pmd++, addr = next, addr != end); > return 0; > } > > static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > + unsigned long addr, unsigned long end, > + unsigned long pfn, pgprot_t prot, bool first) > { > pud_t *pud; > unsigned long next; > + int err; > > pfn -= addr >> PAGE_SHIFT; > pud = pud_alloc(mm, pgd, addr); > @@ -1711,9 +1724,12 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > return -ENOMEM; > do { > next = pud_addr_end(addr, end); > - if (remap_pmd_range(mm, pud, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > + err = remap_pmd_range(mm, pud, addr, next, > + pfn + (addr >> PAGE_SHIFT), prot, first); > + if (err) > + return err; > + > + first = false; > } while (pud++, addr = next, addr != end); > return 0; > } > @@ -1735,6 +1751,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > unsigned long next; > unsigned long end = addr + PAGE_ALIGN(size); > struct mm_struct *mm = vma->vm_mm; > + bool first = true; > int err; > > /* > @@ -1774,13 +1791,18 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > do { > next = pgd_addr_end(addr, end); > err = remap_pud_range(mm, pgd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot); > + pfn + (addr >> PAGE_SHIFT), prot, first); > if (err) > break; > + > + first = false; > } while (pgd++, addr = next, addr != end); > > - if (err) > + if (err) { > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > + if (err != -EBUSY) > + zap_page_range_single(vma, addr, size, NULL); Hm. If I read it correctly, you zap whole range, not only what you've set up. Looks wrong. And for after zap, you probably whant to return -EBUSY to caller of remap_pfn_range(), not -EINVAL. > + } > > return err; > } > -- > 1.9.1 > -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() 2014-06-19 11:50 ` Kirill A. Shutemov @ 2014-06-19 12:00 ` Chris Wilson 2014-06-19 12:57 ` Kirill A. Shutemov 0 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2014-06-19 12:00 UTC (permalink / raw) To: Kirill A. Shutemov Cc: intel-gfx, Andrew Morton, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote: > > + if (err) { > > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > > + if (err != -EBUSY) > > + zap_page_range_single(vma, addr, size, NULL); > > Hm. If I read it correctly, you zap whole range, not only what you've > set up. Looks wrong. Yes. I didn't fancy threading the last touched pte back, but that should be easier if moving to a struct. > And for after zap, you probably whant to return -EBUSY to caller of > remap_pfn_range(), not -EINVAL. No, it has to be EINVAL for my purpose. If we return EBUSY, the caller will just report VM_NOPAGE back to the fault handler, and the fault will be retriggered - but the overlapping object will still be present. So the EINVAL is there to report that the range conflicts with another and that the caller should abort. It's a nasty semantic that works only when the concurrent pagefaults are serialised around the call to remap_pfn_range(). The alternative would be to always report EINVAL and clean up, and export pte_exists() so that the caller can detect when the PTEs have already been populated by the concurrent fault. -Chris -- Chris Wilson, Intel Open Source Technology Centre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
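Spelled out, the caller pattern these semantics assume looks something like the following sketch, under the assumption that the driver serialises concurrent faults around the remap:

ret = remap_pfn_range(vma, vma->vm_start, pfn,
                      vma->vm_end - vma->vm_start, vma->vm_page_prot);
switch (ret) {
case 0:
case -EBUSY:
        /* Either we prefaulted the range, or a concurrent (but
         * serialised) fault already did; the faulting PTE now exists.
         */
        return VM_FAULT_NOPAGE;
case -EINVAL:
        /* Conflict with a foreign mapping partway through: retrying
         * the fault would spin forever, so fail it instead.
         */
        return VM_FAULT_SIGBUS;
case -ENOMEM:
        return VM_FAULT_OOM;
default:
        return VM_FAULT_SIGBUS;
}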
* Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() 2014-06-19 12:00 ` Chris Wilson @ 2014-06-19 12:57 ` Kirill A. Shutemov 2014-06-19 13:22 ` Chris Wilson 0 siblings, 1 reply; 16+ messages in thread From: Kirill A. Shutemov @ 2014-06-19 12:57 UTC (permalink / raw) To: Chris Wilson Cc: Kirill A. Shutemov, intel-gfx, Andrew Morton, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm Chris Wilson wrote: > On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote: > > > + if (err) { > > > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > > > + if (err != -EBUSY) > > > + zap_page_range_single(vma, addr, size, NULL); > > > > Hm. If I read it correctly, you zap whole range, not only what you've > > set up. Looks wrong. > > Yes. I didn't fancy threading the last touched pte back, but that should > be easier if moving to a struct. > > > And for after zap, you probably whant to return -EBUSY to caller of > > remap_pfn_range(), not -EINVAL. > > No, it has to be EINVAL for my purpose. If we return EBUSY, the caller > will just report VM_NOPAGE back to the fault handler, and the fault will > be retriggered - but the overlapping object will still be present. IIUC, what you're saying makes sense only if the range starts from PTE you've got fault to. I failed to see why this assumption should be part of remap_pfn_range() interface. One possible option is to create a variant of remap_pfn_range() which will return how many PTEs it was able to setup, before hitting the !pte_none(). Caller will decide what to do with partially filled range. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
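One possible shape for such a variant, sketched as a hypothetical signature for illustration only (nothing like it was part of this series):

/* Like remap_pfn_range(), but instead of BUG()ing on a populated PTE,
 * stop there and report progress: returns the number of PTEs set up
 * (>= 0), or a negative errno on page-table allocation failure.  The
 * caller decides whether a short count means a benign lost race or an
 * overlap that needs cleaning up.
 */
long remap_pfn_range_partial(struct vm_area_struct *vma,
                             unsigned long addr, unsigned long pfn,
                             unsigned long size, pgprot_t prot);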
* Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() 2014-06-19 12:57 ` Kirill A. Shutemov @ 2014-06-19 13:22 ` Chris Wilson 2014-06-19 13:59 ` Kirill A. Shutemov 0 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2014-06-19 13:22 UTC (permalink / raw) To: Kirill A. Shutemov Cc: intel-gfx, Andrew Morton, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm On Thu, Jun 19, 2014 at 03:57:46PM +0300, Kirill A. Shutemov wrote: > Chris Wilson wrote: > > On Thu, Jun 19, 2014 at 02:50:18PM +0300, Kirill A. Shutemov wrote: > > > > + if (err) { > > > > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > > > > + if (err != -EBUSY) > > > > + zap_page_range_single(vma, addr, size, NULL); > > > > > > Hm. If I read it correctly, you zap whole range, not only what you've > > > set up. Looks wrong. > > > > Yes. I didn't fancy threading the last touched pte back, but that should > > be easier if moving to a struct. > > > > > And for after zap, you probably whant to return -EBUSY to caller of > > > remap_pfn_range(), not -EINVAL. > > > > No, it has to be EINVAL for my purpose. If we return EBUSY, the caller > > will just report VM_NOPAGE back to the fault handler, and the fault will > > be retriggered - but the overlapping object will still be present. > > IIUC, what you're saying makes sense only if the range starts from PTE > you've got fault to. I failed to see why this assumption should be part of > remap_pfn_range() interface. That I agree with. > One possible option is to create a variant of remap_pfn_range() which will > return how many PTEs it was able to setup, before hitting the !pte_none(). > Caller will decide what to do with partially filled range. Looked at just returning the address remap_pfn_range() got up to, which is easy enough, but I think given that remap_pfn_range() will clean up correctly after a failed remap, any EBUSY from partway through would be a pathological driver error. -Chris -- Chris Wilson, Intel Open Source Technology Centre -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] mm: Report attempts to overwrite PTE from remap_pfn_range() 2014-06-19 13:22 ` Chris Wilson @ 2014-06-19 13:59 ` Kirill A. Shutemov 2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson 0 siblings, 1 reply; 16+ messages in thread From: Kirill A. Shutemov @ 2014-06-19 13:59 UTC (permalink / raw) To: Chris Wilson Cc: Kirill A. Shutemov, intel-gfx, Andrew Morton, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner, linux-mm Chris Wilson wrote: > On Thu, Jun 19, 2014 at 03:57:46PM +0300, Kirill A. Shutemov wrote: > > One possible option is to create a variant of remap_pfn_range() which will > > return how many PTEs it was able to setup, before hitting the !pte_none(). > > Caller will decide what to do with partially filled range. > > Looked at just returning the address remap_pfn_range() got up to, which is > easy enough, but I think given that remap_pfn_range() will clean up > correctly after a failed remap, any EBUSY from partway through would be > a pathological driver error. I would prefer to keep the remap_pfn_range() interface intact, with the BUG_ON() on an unexpected !pte_none(), and introduce a new function with more flexible behaviour (sharing the underlying infrastructure). This way we can avoid changing every remap_pfn_range() caller. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/4] mm: Refactor remap_pfn_range() 2014-06-19 13:59 ` Kirill A. Shutemov @ 2014-06-21 15:53 ` Chris Wilson 2014-06-21 15:53 ` [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson ` (3 more replies) 0 siblings, 4 replies; 16+ messages in thread From: Chris Wilson @ 2014-06-21 15:53 UTC (permalink / raw) To: intel-gfx Cc: Rik van Riel, Peter Zijlstra, Cyrill Gorcunov, linux-mm, Mel Gorman, Johannes Weiner, Andrew Morton, Kirill A. Shutemov In preparation for exporting very similar functionality through another interface, gut the current remap_pfn_range(). The motivating factor here is to reuse the PGD/PUD/PMD/PTE walker, but allow back propagation of errors rather than BUG_ON. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org --- mm/memory.c | 102 +++++++++++++++++++++++++++++++++--------------------------- 1 file changed, 57 insertions(+), 45 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 037b812a9531..d2c7fe88a289 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2295,71 +2295,81 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, } EXPORT_SYMBOL(vm_insert_mixed); +struct remap_pfn { + struct mm_struct *mm; + unsigned long addr; + unsigned long pfn; + pgprot_t prot; +}; + /* * maps a range of physical memory into the requested pages. the old * mappings are removed. any references to nonexistent pages results * in null mappings (currently treated as "copy-on-access") */ -static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) +static inline int remap_pfn(struct remap_pfn *r, pte_t *pte) +{ + if (!pte_none(*pte)) + return -EBUSY; + + set_pte_at(r->mm, r->addr, pte, + pte_mkspecial(pfn_pte(r->pfn, r->prot))); + r->pfn++; + r->addr += PAGE_SIZE; + return 0; +} + +static int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end) { pte_t *pte; spinlock_t *ptl; + int err; - pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); + pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl); if (!pte) return -ENOMEM; + arch_enter_lazy_mmu_mode(); do { - BUG_ON(!pte_none(*pte)); - set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); - pfn++; - } while (pte++, addr += PAGE_SIZE, addr != end); + err = remap_pfn(r, pte++); + } while (err == 0 && r->addr < end); arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(pte - 1, ptl); - return 0; + return err; } -static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) +static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end) { pmd_t *pmd; - unsigned long next; + int err; - pfn -= addr >> PAGE_SHIFT; - pmd = pmd_alloc(mm, pud, addr); + pmd = pmd_alloc(r->mm, pud, r->addr); if (!pmd) return -ENOMEM; VM_BUG_ON(pmd_trans_huge(*pmd)); + do { - next = pmd_addr_end(addr, end); - if (remap_pte_range(mm, pmd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; - } while (pmd++, addr = next, addr != end); - return 0; + err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end)); + } while (err == 0 && r->addr < end); + + return err; } -static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, - unsigned long addr, unsigned long end, - unsigned long pfn, pgprot_t prot) +static inline int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end) { pud_t *pud; - unsigned long next; + int err; - pfn -= addr >> PAGE_SHIFT; - pud = pud_alloc(mm, pgd, addr); + pud = pud_alloc(r->mm, pgd, r->addr); if (!pud) return -ENOMEM; + do { - next = pud_addr_end(addr, end); - if (remap_pmd_range(mm, pud, addr, next, - pfn + (addr >> PAGE_SHIFT), prot)) - return -ENOMEM; - } while (pud++, addr = next, addr != end); - return 0; + err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end)); + } while (err == 0 && r->addr < end); + + return err; } /** @@ -2375,10 +2385,9 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t prot) { - pgd_t *pgd; - unsigned long next; unsigned long end = addr + PAGE_ALIGN(size); - struct mm_struct *mm = vma->vm_mm; + struct remap_pfn r; + pgd_t *pgd; int err; /* @@ -2412,19 +2421,22 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP; BUG_ON(addr >= end); - pfn -= addr >> PAGE_SHIFT; - pgd = pgd_offset(mm, addr); flush_cache_range(vma, addr, end); + + r.mm = vma->vm_mm; + r.addr = addr; + r.pfn = pfn; + r.prot = prot; + + pgd = pgd_offset(r.mm, addr); do { - next = pgd_addr_end(addr, end); - err = remap_pud_range(mm, pgd, addr, next, - pfn + (addr >> PAGE_SHIFT), prot); - if (err) - break; - } while (pgd++, addr = next, addr != end); + err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end)); + } while (err == 0 && r.addr < end); - if (err) + if (err) { untrack_pfn(vma, pfn, PAGE_ALIGN(size)); + BUG_ON(err == -EBUSY); + } return err; } -- 2.0.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping 2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson @ 2014-06-21 15:53 ` Chris Wilson 2014-06-21 15:53 ` [PATCH 3/4] mm: Export remap_io_mapping() Chris Wilson ` (2 subsequent siblings) 3 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2014-06-21 15:53 UTC (permalink / raw) To: intel-gfx; +Cc: linux-mm Currently, we only allocate a structure to hold metadata if we need to allocate an ioremap for every access, such as on x86-32. However, it would be useful to store basic information about the io-mapping, such as its page protection, on all platforms. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: linux-mm@kvack.org --- include/linux/io-mapping.h | 52 ++++++++++++++++++++++++++++------------------ 1 file changed, 32 insertions(+), 20 deletions(-) diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h index 657fab4efab3..e053011f50bb 100644 --- a/include/linux/io-mapping.h +++ b/include/linux/io-mapping.h @@ -31,16 +31,17 @@ * See Documentation/io-mapping.txt */ -#ifdef CONFIG_HAVE_ATOMIC_IOMAP - -#include <asm/iomap.h> - struct io_mapping { resource_size_t base; unsigned long size; pgprot_t prot; + void __iomem *iomem; }; + +#ifdef CONFIG_HAVE_ATOMIC_IOMAP + +#include <asm/iomap.h> /* * For small address space machines, mapping large objects * into the kernel virtual space isn't practical. Where @@ -119,48 +120,59 @@ io_mapping_unmap(void __iomem *vaddr) #else #include <linux/uaccess.h> - -/* this struct isn't actually defined anywhere */ -struct io_mapping; +#include <asm/pgtable_types.h> /* Create the io_mapping object*/ static inline struct io_mapping * io_mapping_create_wc(resource_size_t base, unsigned long size) { - return (struct io_mapping __force *) ioremap_wc(base, size); + struct io_mapping *iomap; + + iomap = kmalloc(sizeof(*iomap), GFP_KERNEL); + if (!iomap) + return NULL; + + iomap->base = base; + iomap->size = size; + iomap->iomem = ioremap_wc(base, size); + iomap->prot = pgprot_writecombine(PAGE_KERNEL_IO); + + return iomap; } static inline void io_mapping_free(struct io_mapping *mapping) { - iounmap((void __force __iomem *) mapping); + iounmap(mapping->iomem); + kfree(mapping); } -/* Atomic map/unmap */ +/* Non-atomic map/unmap */ static inline void __iomem * -io_mapping_map_atomic_wc(struct io_mapping *mapping, - unsigned long offset) +io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset) { - pagefault_disable(); - return ((char __force __iomem *) mapping) + offset; + return mapping->iomem + offset; } static inline void -io_mapping_unmap_atomic(void __iomem *vaddr) +io_mapping_unmap(void __iomem *vaddr) { - pagefault_enable(); } -/* Non-atomic map/unmap */ +/* Atomic map/unmap */ static inline void __iomem * -io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset) +io_mapping_map_atomic_wc(struct io_mapping *mapping, + unsigned long offset) { - return ((char __force __iomem *) mapping) + offset; + pagefault_disable(); + return io_mapping_map_wc(mapping, offset); } static inline void -io_mapping_unmap(void __iomem *vaddr) +io_mapping_unmap_atomic(void __iomem *vaddr) { + io_mapping_unmap(vaddr); + pagefault_enable(); } #endif /* HAVE_ATOMIC_IOMAP */ -- 2.0.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
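For reference, the io_mapping API reshuffled above is used along these lines (a sketch with error handling elided; aperture_base, aperture_size, page_offset and value are placeholder names):

struct io_mapping *iomap;
void __iomem *vaddr;

iomap = io_mapping_create_wc(aperture_base, aperture_size);
if (!iomap)
        return -ENOMEM;

/* may sleep; use io_mapping_map_atomic_wc() from atomic context */
vaddr = io_mapping_map_wc(iomap, page_offset);
writel(value, vaddr);
io_mapping_unmap(vaddr);

io_mapping_free(iomap);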
* [PATCH 3/4] mm: Export remap_io_mapping() 2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson 2014-06-21 15:53 ` [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson @ 2014-06-21 15:53 ` Chris Wilson 2014-06-30 14:32 ` Kirill A. Shutemov 2014-06-21 15:53 ` [PATCH 4/4] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Chris Wilson 2014-06-30 14:26 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Kirill A. Shutemov 3 siblings, 1 reply; 16+ messages in thread From: Chris Wilson @ 2014-06-21 15:53 UTC (permalink / raw) To: intel-gfx Cc: Rik van Riel, Peter Zijlstra, Cyrill Gorcunov, linux-mm, Mel Gorman, Johannes Weiner, Andrew Morton, Kirill A. Shutemov This is similar to remap_pfn_range(), and uses the recently refactored code to do the page table walking. The key difference is that it back-propagates its error, as this is required for use from within a pagefault handler. The other difference is that it combines the page protection from the io-mapping, which is known when the io-mapping is created, with the per-vma page protection flags. This avoids having to walk the entire system description to rediscover the special page protection established for the io-mapping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: linux-mm@kvack.org --- include/linux/mm.h | 4 ++++ mm/memory.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index d6777060449f..aa766bbc6981 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1941,6 +1941,10 @@ unsigned long change_prot_numa(struct vm_area_struct *vma, struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr); int remap_pfn_range(struct vm_area_struct *, unsigned long addr, unsigned long pfn, unsigned long size, pgprot_t); +struct io_mapping; +int remap_io_mapping(struct vm_area_struct *, + unsigned long addr, unsigned long pfn, unsigned long size, + struct io_mapping *iomap); int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *); int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn); diff --git a/mm/memory.c b/mm/memory.c index d2c7fe88a289..8af2bd2de98e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -61,6 +61,7 @@ #include <linux/string.h> #include <linux/dma-debug.h> #include <linux/debugfs.h> +#include <linux/io-mapping.h> #include <asm/io.h> #include <asm/pgalloc.h> @@ -2443,6 +2444,51 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, EXPORT_SYMBOL(remap_pfn_range); /** + * remap_io_mapping - remap an IO mapping to userspace + * @vma: user vma to map to + * @addr: target user address to start at + * @pfn: physical address of kernel memory + * @size: size of map area + * @iomap: the source io_mapping + * + * Note: this is only safe if the mm semaphore is held when called. + */ +int remap_io_mapping(struct vm_area_struct *vma, + unsigned long addr, unsigned long pfn, unsigned long size, + struct io_mapping *iomap) +{ + unsigned long end = addr + PAGE_ALIGN(size); + struct remap_pfn r; + pgd_t *pgd; + int err; + + if (WARN_ON(addr >= end)) + return -EINVAL; + +#define MUST_SET (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP) + BUG_ON(is_cow_mapping(vma->vm_flags)); + BUG_ON((vma->vm_flags & MUST_SET) != MUST_SET); +#undef MUST_SET + + r.mm = vma->vm_mm; + r.addr = addr; + r.pfn = pfn; + r.prot = __pgprot((pgprot_val(iomap->prot) & _PAGE_CACHE_MASK) | + (pgprot_val(vma->vm_page_prot) & ~_PAGE_CACHE_MASK)); + + pgd = pgd_offset(r.mm, addr); + do { + err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end)); + } while (err == 0 && r.addr < end); + + if (err) + zap_page_range_single(vma, addr, r.addr - addr, NULL); + + return err; +} +EXPORT_SYMBOL(remap_io_mapping); + +/** * vm_iomap_memory - remap memory to userspace * @vma: user vma to map to * @start: start of area -- 2.0.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* RE: [PATCH 3/4] mm: Export remap_io_mapping() 2014-06-21 15:53 ` [PATCH 3/4] mm: Export remap_io_mapping() Chris Wilson @ 2014-06-30 14:32 ` Kirill A. Shutemov 0 siblings, 0 replies; 16+ messages in thread From: Kirill A. Shutemov @ 2014-06-30 14:32 UTC (permalink / raw) To: Chris Wilson Cc: intel-gfx, linux-mm, Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner Chris Wilson wrote: > This is similar to remap_pfn_range(), and uses the recently refactored > code to do the page table walking. The key difference is that it > back-propagates its error, as this is required for use from within a pagefault > handler. The other difference is that it combines the page protection > from the io-mapping, which is known when the io-mapping is created, > with the per-vma page protection flags. This avoids having to walk the > entire system description to rediscover the special page protection > established for the io-mapping. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Looks okay to me: Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 4/4] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass 2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson 2014-06-21 15:53 ` [PATCH 2/4] io-mapping: Always create a struct to hold metadata about the io-mapping Chris Wilson 2014-06-21 15:53 ` [PATCH 3/4] mm: Export remap_io_mapping() Chris Wilson @ 2014-06-21 15:53 ` Chris Wilson 2014-06-30 14:26 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Kirill A. Shutemov 3 siblings, 0 replies; 16+ messages in thread From: Chris Wilson @ 2014-06-21 15:53 UTC (permalink / raw) To: intel-gfx; +Cc: linux-mm On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences, Upload rate for 2 linear surfaces: 8134MiB/s -> 8154MiB/s Upload rate for 2 tiled surfaces: 8625MiB/s -> 8632MiB/s Upload rate for 4 linear surfaces: 8127MiB/s -> 8134MiB/s Upload rate for 4 tiled surfaces: 8602MiB/s -> 8629MiB/s Upload rate for 8 linear surfaces: 8124MiB/s -> 8137MiB/s Upload rate for 8 tiled surfaces: 8603MiB/s -> 8624MiB/s Upload rate for 16 linear surfaces: 8123MiB/s -> 8128MiB/s Upload rate for 16 tiled surfaces: 8606MiB/s -> 8618MiB/s Upload rate for 32 linear surfaces: 8121MiB/s -> 8128MiB/s Upload rate for 32 tiled surfaces: 8605MiB/s -> 8614MiB/s Upload rate for 64 linear surfaces: 8121MiB/s -> 8127MiB/s Upload rate for 64 tiled surfaces: 3017MiB/s -> 5202MiB/s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Testcase: igt/gem_fence_upload/performance Testcase: igt/gem_mmap_gtt Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Cc: linux-mm@kvack.org --- drivers/gpu/drm/i915/i915_gem.c | 23 ++++++----------------- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index f6d123828926..f0628e40cb1d 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1565,25 +1565,14 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf) pfn = dev_priv->gtt.mappable_base + i915_gem_obj_ggtt_offset(obj); pfn >>= PAGE_SHIFT; - if (!obj->fault_mappable) { - unsigned long size = min_t(unsigned long, - vma->vm_end - vma->vm_start, - obj->base.size); - int i; + ret = remap_io_mapping(vma, + vma->vm_start, pfn, vma->vm_end - vma->vm_start, + dev_priv->gtt.mappable); + if (ret) + goto unpin; - for (i = 0; i < size >> PAGE_SHIFT; i++) { - ret = vm_insert_pfn(vma, - (unsigned long)vma->vm_start + i * PAGE_SIZE, - pfn + i); - if (ret) - break; - } + obj->fault_mappable = true; - obj->fault_mappable = true; - } else - ret = vm_insert_pfn(vma, - (unsigned long)vmf->virtual_address, - pfn + page_offset); unpin: i915_gem_object_ggtt_unpin(obj); unlock: -- 2.0.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* RE: [PATCH 1/4] mm: Refactor remap_pfn_range() 2014-06-21 15:53 ` [PATCH 1/4] mm: Refactor remap_pfn_range() Chris Wilson ` (2 preceding siblings ...) 2014-06-21 15:53 ` [PATCH 4/4] drm/i915: Use remap_io_mapping() to prefault all PTE in a single pass Chris Wilson @ 2014-06-30 14:26 ` Kirill A. Shutemov 3 siblings, 0 replies; 16+ messages in thread From: Kirill A. Shutemov @ 2014-06-30 14:26 UTC (permalink / raw) To: Chris Wilson Cc: intel-gfx, linux-mm, Andrew Morton, Kirill A. Shutemov, Peter Zijlstra, Rik van Riel, Mel Gorman, Cyrill Gorcunov, Johannes Weiner Chris Wilson wrote: > In preparation for exporting very similar functionality through another > interface, gut the current remap_pfn_range(). The motivating factor here > is to reuse the PGD/PUD/PMD/PTE walker, but allow back propagation of > errors rather than BUG_ON. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Rik van Riel <riel@redhat.com> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Cyrill Gorcunov <gorcunov@gmail.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Cc: linux-mm@kvack.org > --- > mm/memory.c | 102 +++++++++++++++++++++++++++++++++--------------------------- > 1 file changed, 57 insertions(+), 45 deletions(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 037b812a9531..d2c7fe88a289 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -2295,71 +2295,81 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, > } > EXPORT_SYMBOL(vm_insert_mixed); > > +struct remap_pfn { > + struct mm_struct *mm; > + unsigned long addr; > + unsigned long pfn; > + pgprot_t prot; > +}; > + > /* > * maps a range of physical memory into the requested pages. the old > * mappings are removed. any references to nonexistent pages results > * in null mappings (currently treated as "copy-on-access") > */ > -static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > +static inline int remap_pfn(struct remap_pfn *r, pte_t *pte) > +{ > + if (!pte_none(*pte)) > + return -EBUSY; > + > + set_pte_at(r->mm, r->addr, pte, > + pte_mkspecial(pfn_pte(r->pfn, r->prot))); > + r->pfn++; > + r->addr += PAGE_SIZE; > + return 0; > +} > + > +static int remap_pte_range(struct remap_pfn *r, pmd_t *pmd, unsigned long end) > { > pte_t *pte; > spinlock_t *ptl; > + int err; > > - pte = pte_alloc_map_lock(mm, pmd, addr, &ptl); > + pte = pte_alloc_map_lock(r->mm, pmd, r->addr, &ptl); > if (!pte) > return -ENOMEM; > + > arch_enter_lazy_mmu_mode(); > do { > - BUG_ON(!pte_none(*pte)); > - set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); > - pfn++; > - } while (pte++, addr += PAGE_SIZE, addr != end); > + err = remap_pfn(r, pte++); > + } while (err == 0 && r->addr < end); > arch_leave_lazy_mmu_mode(); > + > pte_unmap_unlock(pte - 1, ptl); > - return 0; > + return err; > } > > -static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > +static inline int remap_pmd_range(struct remap_pfn *r, pud_t *pud, unsigned long end) > { > pmd_t *pmd; > - unsigned long next; > + int err; > > - pfn -= addr >> PAGE_SHIFT; > - pmd = pmd_alloc(mm, pud, addr); > + pmd = pmd_alloc(r->mm, pud, r->addr); > if (!pmd) > return -ENOMEM; > VM_BUG_ON(pmd_trans_huge(*pmd)); > + > do { > - next = pmd_addr_end(addr, end); > - if (remap_pte_range(mm, pmd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > - } while (pmd++, addr = next, addr != end); > - return 0; > + err = remap_pte_range(r, pmd++, pmd_addr_end(r->addr, end)); > + } while (err == 0 && r->addr < end); > + > + return err; > } > > -static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > - unsigned long addr, unsigned long end, > - unsigned long pfn, pgprot_t prot) > +static inline int remap_pud_range(struct remap_pfn *r, pgd_t *pgd, unsigned long end) > { > pud_t *pud; > - unsigned long next; > + int err; > > - pfn -= addr >> PAGE_SHIFT; > - pud = pud_alloc(mm, pgd, addr); > + pud = pud_alloc(r->mm, pgd, r->addr); > if (!pud) > return -ENOMEM; > + > do { > - next = pud_addr_end(addr, end); > - if (remap_pmd_range(mm, pud, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot)) > - return -ENOMEM; > - } while (pud++, addr = next, addr != end); > - return 0; > + err = remap_pmd_range(r, pud++, pud_addr_end(r->addr, end)); > + } while (err == 0 && r->addr < end); > + > + return err; > } > > /** > @@ -2375,10 +2385,9 @@ static inline int remap_pud_range(struct mm_struct *mm, pgd_t *pgd, > int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > unsigned long pfn, unsigned long size, pgprot_t prot) > { > - pgd_t *pgd; > - unsigned long next; > unsigned long end = addr + PAGE_ALIGN(size); > - struct mm_struct *mm = vma->vm_mm; > + struct remap_pfn r; > + pgd_t *pgd; > int err; > > /* > @@ -2412,19 +2421,22 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr, > vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP; > > BUG_ON(addr >= end); > - pfn -= addr >> PAGE_SHIFT; > - pgd = pgd_offset(mm, addr); > flush_cache_range(vma, addr, end); > + > + r.mm = vma->vm_mm; > + r.addr = addr; > + r.pfn = pfn; > + r.prot = prot; > + > + pgd = pgd_offset(r.mm, addr); > do { > - next = pgd_addr_end(addr, end); > - err = remap_pud_range(mm, pgd, addr, next, > - pfn + (addr >> PAGE_SHIFT), prot); > - if (err) > - break; > - } while (pgd++, addr = next, addr != end); > + err = remap_pud_range(&r, pgd++, pgd_addr_end(r.addr, end)); > + } while (err == 0 && r.addr < end); > > - if (err) > + if (err) { > untrack_pfn(vma, pfn, PAGE_ALIGN(size)); > + BUG_ON(err == -EBUSY); We probably need a comment for the BUG_ON(). Otherwise, Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > + } > > return err; > } -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 16+ messages in thread