* [PATCH v4 0/3] Reorganise calls to vmap() GEM objects
@ 2016-02-22 15:18 Dave Gordon
2016-02-22 15:18 ` [PATCH v4 1/3] drm/i915: add i915_gem_object_vmap_range() to map GEM object to virtual space Dave Gordon
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Dave Gordon @ 2016-02-22 15:18 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter
Alex Dai and Chris Wilson have both recently posted patches to
rationalise the use of vmap() for mapping GEM objects into kernel
virtual space. However, they addressed different areas, with Alex's
patch being derived from the copy_batch() code, whereas Chris' patch
refactored the dma-buf and ringbuffer code.
So this patchset unifies the two, copying Chris' interfaces which
unite pin-and-vmap for convenient lifecycle management, but using
Alex's code underneath to permit partial mappings. And finally
there's a little optimisation I've added for "small" objects, e.g.
ringbuffers and contexts, which are expected to be the objects most
commonly handled by this code.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Alex Dai (1):
drm/i915: add i915_gem_object_vmap_range() to map GEM object to
virtual space
Dave Gordon (2):
drm/i915: refactor duplicate object vmap functions (reworked)
drm/i915: optimise i915_gem_object_vmap_range() for small objects
drivers/gpu/drm/i915/i915_cmd_parser.c | 28 +---------
drivers/gpu/drm/i915/i915_drv.h | 15 ++++--
drivers/gpu/drm/i915/i915_gem.c | 94 +++++++++++++++++++++++++++++++++
drivers/gpu/drm/i915/i915_gem_dmabuf.c | 53 +++----------------
drivers/gpu/drm/i915/intel_ringbuffer.c | 52 ++++++------------
5 files changed, 128 insertions(+), 114 deletions(-)
--
1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v4 1/3] drm/i915: add i915_gem_object_vmap_range() to map GEM object to virtual space
2016-02-22 15:18 [PATCH v4 0/3] Reorganise calls to vmap() GEM objects Dave Gordon
@ 2016-02-22 15:18 ` Dave Gordon
2016-02-22 15:18 ` [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked) Dave Gordon
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Dave Gordon @ 2016-02-22 15:18 UTC (permalink / raw)
To: intel-gfx; +Cc: Daniel Vetter
From: Alex Dai <yu.dai@intel.com>
There are several places inside the driver where a GEM object is mapped
to kernel virtual space. The mapping may be done either for the whole
object or only a subset of it.
This patch introduces a function i915_gem_object_vmap_range() to
implement the common functionality.
v2: Use obj->pages->nents for iteration within i915_gem_object_vmap;
break when it finishes all desired pages. The caller must pass the
actual page count required. (Tvrtko Ursulin)
v4: renamed to i915_gem_object_vmap_range() to make its function
clearer. (Dave Gordon)
Signed-off-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_cmd_parser.c | 28 +-------------------
drivers/gpu/drm/i915/i915_drv.h | 4 +++
drivers/gpu/drm/i915/i915_gem.c | 47 +++++++++++++++++++++++++++++++++
drivers/gpu/drm/i915/i915_gem_dmabuf.c | 16 +++--------
drivers/gpu/drm/i915/intel_ringbuffer.c | 24 ++---------------
5 files changed, 57 insertions(+), 62 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 814d894..114b55f 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -863,37 +863,11 @@ void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring)
static u32 *vmap_batch(struct drm_i915_gem_object *obj,
unsigned start, unsigned len)
{
- int i;
- void *addr = NULL;
- struct sg_page_iter sg_iter;
int first_page = start >> PAGE_SHIFT;
int last_page = (len + start + 4095) >> PAGE_SHIFT;
int npages = last_page - first_page;
- struct page **pages;
-
- pages = drm_malloc_ab(npages, sizeof(*pages));
- if (pages == NULL) {
- DRM_DEBUG_DRIVER("Failed to get space for pages\n");
- goto finish;
- }
-
- i = 0;
- for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first_page) {
- pages[i++] = sg_page_iter_page(&sg_iter);
- if (i == npages)
- break;
- }
-
- addr = vmap(pages, i, 0, PAGE_KERNEL);
- if (addr == NULL) {
- DRM_DEBUG_DRIVER("Failed to vmap pages\n");
- goto finish;
- }
-finish:
- if (pages)
- drm_free_large(pages);
- return (u32*)addr;
+ return i915_gem_object_vmap_range(obj, first_page, npages);
}
/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6644c2e..8cd47c6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2979,6 +2979,10 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
obj->pages_pin_count--;
}
+void *__must_check i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
+ unsigned int first,
+ unsigned int npages);
+
int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
int i915_gem_object_sync(struct drm_i915_gem_object *obj,
struct intel_engine_cs *to,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f68f346..0d5709a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2400,6 +2400,53 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
return 0;
}
+/**
+ * i915_gem_object_vmap_range - map some or all of a GEM object into kernel space
+ * @obj: the GEM object to be mapped
+ * @first: index of the first page of the object to be mapped
+ * @npages: length of the mapping to be created, in pages
+ *
+ * Map a given page range of a GEM object into kernel virtual space. The caller
+ * must make sure the associated pages are gathered and pinned before calling
+ * this function, and is responsible for vunmapping the returned address.
+ *
+ * Returns NULL on failure.
+ */
+void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
+ unsigned int first,
+ unsigned int npages)
+{
+ struct sg_page_iter sg_iter;
+ struct page **pages;
+ void *addr;
+ int i;
+
+ if (first + npages > obj->pages->nents) {
+ DRM_DEBUG_DRIVER("Invalid page count\n");
+ return NULL;
+ }
+
+ pages = drm_malloc_ab(npages, sizeof(*pages));
+ if (pages == NULL) {
+ DRM_DEBUG_DRIVER("Failed to get space for pages\n");
+ return NULL;
+ }
+
+ i = 0;
+ for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, first) {
+ pages[i++] = sg_page_iter_page(&sg_iter);
+ if (i == npages)
+ break;
+ }
+
+ addr = vmap(pages, npages, 0, PAGE_KERNEL);
+ if (addr == NULL)
+ DRM_DEBUG_DRIVER("Failed to vmap pages\n");
+ drm_free_large(pages);
+
+ return addr;
+}
+
void i915_vma_move_to_active(struct i915_vma *vma,
struct drm_i915_gem_request *req)
{
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 1f3eef6..aee4149 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -110,9 +110,7 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf)
{
struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
struct drm_device *dev = obj->base.dev;
- struct sg_page_iter sg_iter;
- struct page **pages;
- int ret, i;
+ int ret;
ret = i915_mutex_lock_interruptible(dev);
if (ret)
@@ -131,16 +129,8 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf)
ret = -ENOMEM;
- pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
- if (pages == NULL)
- goto err_unpin;
-
- i = 0;
- for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
- pages[i++] = sg_page_iter_page(&sg_iter);
-
- obj->dma_buf_vmapping = vmap(pages, i, 0, PAGE_KERNEL);
- drm_free_large(pages);
+ obj->dma_buf_vmapping = i915_gem_object_vmap_range(obj, 0,
+ dma_buf->size >> PAGE_SHIFT);
if (!obj->dma_buf_vmapping)
goto err_unpin;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 45ce45a..434a452 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2064,27 +2064,6 @@ void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
i915_gem_object_ggtt_unpin(ringbuf->obj);
}
-static u32 *vmap_obj(struct drm_i915_gem_object *obj)
-{
- struct sg_page_iter sg_iter;
- struct page **pages;
- void *addr;
- int i;
-
- pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
- if (pages == NULL)
- return NULL;
-
- i = 0;
- for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
- pages[i++] = sg_page_iter_page(&sg_iter);
-
- addr = vmap(pages, i, 0, PAGE_KERNEL);
- drm_free_large(pages);
-
- return addr;
-}
-
int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
struct intel_ringbuffer *ringbuf)
{
@@ -2103,7 +2082,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
return ret;
}
- ringbuf->virtual_start = vmap_obj(obj);
+ ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0,
+ ringbuf->size >> PAGE_SHIFT);
if (ringbuf->virtual_start == NULL) {
i915_gem_object_ggtt_unpin(obj);
return -ENOMEM;
--
1.9.1
* [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
2016-02-22 15:18 [PATCH v4 0/3] Reorganise calls to vmap() GEM objects Dave Gordon
2016-02-22 15:18 ` [PATCH v4 1/3] drm/i915: add i915_gem_object_vmap_range() to map GEM object to virtual space Dave Gordon
@ 2016-02-22 15:18 ` Dave Gordon
2016-02-22 16:06 ` Tvrtko Ursulin
2016-02-22 15:18 ` [PATCH v4 3/3] drm/i915: optimise i915_gem_object_vmap_range() for small objects Dave Gordon
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Dave Gordon @ 2016-02-22 15:18 UTC (permalink / raw)
To: intel-gfx
This is essentially Chris Wilson's patch of a similar name, reworked on
top of Alex Dai's recent patch:
drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
Chris' original commentary said:
We now have two implementations for vmapping a whole object, one for
dma-buf and one for the ringbuffer. If we couple the vmapping into
the obj->pages lifetime, then we can reuse an obj->vmapping for both
and at the same time couple it into the shrinker.
v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala)
v3: Call unpin_vmap from the right dmabuf unmapper
This patch [v4] reimplements the same functionality, but now as wrappers
round the recently-introduced i915_gem_object_vmap_range() from Alex's
patch mentioned above.
With this change, we have only one vmap() in the whole driver :)
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 11 +++++----
drivers/gpu/drm/i915/i915_gem.c | 39 +++++++++++++++++++++++++++++++
drivers/gpu/drm/i915/i915_gem_dmabuf.c | 41 ++++-----------------------------
drivers/gpu/drm/i915/intel_ringbuffer.c | 32 ++++++++++++-------------
4 files changed, 67 insertions(+), 56 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8cd47c6..b237ebd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2168,10 +2168,7 @@ struct drm_i915_gem_object {
struct scatterlist *sg;
int last;
} get_page;
-
- /* prime dma-buf support */
- void *dma_buf_vmapping;
- int vmapping_count;
+ void *vmapping;
/** Breadcrumb of last rendering to the buffer.
* There can only be one writer, but we allow for multiple readers.
@@ -2979,6 +2976,12 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
obj->pages_pin_count--;
}
+void *__must_check i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj);
+static inline void i915_gem_object_unpin_vmap(struct drm_i915_gem_object *obj)
+{
+ i915_gem_object_unpin_pages(obj);
+}
+
void *__must_check i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
unsigned int first,
unsigned int npages);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0d5709a..14942cf 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2235,6 +2235,11 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
ops->put_pages(obj);
obj->pages = NULL;
+ if (obj->vmapping) {
+ vunmap(obj->vmapping);
+ obj->vmapping = NULL;
+ }
+
i915_gem_object_invalidate(obj);
return 0;
@@ -2447,6 +2452,40 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
return addr;
}
+/**
+ * i915_gem_object_pin_vmap - pin a GEM object and map it into kernel space
+ * @obj: the GEM object to be mapped
+ *
+ * Conbines the functions of get_pages(), pin_pages() and vmap_range() on
+ * the whole object. The caller should release the mappoing by calling
+ * i915_gem_object_unpin_vmap() when the object is no longer required.
+ * function.
+ *
+ * Returns an ERR_PTR on failure.
+ */
+void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj)
+{
+ int ret;
+
+ ret = i915_gem_object_get_pages(obj);
+ if (ret)
+ return ERR_PTR(ret);
+
+ i915_gem_object_pin_pages(obj);
+
+ if (obj->vmapping == NULL) {
+ obj->vmapping = i915_gem_object_vmap_range(obj, 0,
+ obj->base.size >> PAGE_SHIFT);
+
+ if (obj->vmapping == NULL) {
+ i915_gem_object_unpin_pages(obj);
+ return ERR_PTR(-ENOMEM);
+ }
+ }
+
+ return obj->vmapping;
+}
+
void i915_vma_move_to_active(struct i915_vma *vma,
struct drm_i915_gem_request *req)
{
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index aee4149..adc7b5e 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -95,14 +95,12 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
{
struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
- mutex_lock(&obj->base.dev->struct_mutex);
-
dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir);
sg_free_table(sg);
kfree(sg);
+ mutex_lock(&obj->base.dev->struct_mutex);
i915_gem_object_unpin_pages(obj);
-
mutex_unlock(&obj->base.dev->struct_mutex);
}
@@ -110,41 +108,17 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf)
{
struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
struct drm_device *dev = obj->base.dev;
+ void *addr;
int ret;
ret = i915_mutex_lock_interruptible(dev);
if (ret)
return ERR_PTR(ret);
- if (obj->dma_buf_vmapping) {
- obj->vmapping_count++;
- goto out_unlock;
- }
-
- ret = i915_gem_object_get_pages(obj);
- if (ret)
- goto err;
-
- i915_gem_object_pin_pages(obj);
-
- ret = -ENOMEM;
-
- obj->dma_buf_vmapping = i915_gem_object_vmap_range(obj, 0,
- dma_buf->size >> PAGE_SHIFT);
-
- if (!obj->dma_buf_vmapping)
- goto err_unpin;
-
- obj->vmapping_count = 1;
-out_unlock:
+ addr = i915_gem_object_pin_vmap(obj);
mutex_unlock(&dev->struct_mutex);
- return obj->dma_buf_vmapping;
-err_unpin:
- i915_gem_object_unpin_pages(obj);
-err:
- mutex_unlock(&dev->struct_mutex);
- return ERR_PTR(ret);
+ return addr;
}
static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
@@ -153,12 +127,7 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
struct drm_device *dev = obj->base.dev;
mutex_lock(&dev->struct_mutex);
- if (--obj->vmapping_count == 0) {
- vunmap(obj->dma_buf_vmapping);
- obj->dma_buf_vmapping = NULL;
-
- i915_gem_object_unpin_pages(obj);
- }
+ i915_gem_object_unpin_vmap(obj);
mutex_unlock(&dev->struct_mutex);
}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 434a452..47f186e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2056,7 +2056,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
{
if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
- vunmap(ringbuf->virtual_start);
+ i915_gem_object_unpin_vmap(ringbuf->obj);
else
iounmap(ringbuf->virtual_start);
ringbuf->virtual_start = NULL;
@@ -2077,16 +2077,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
return ret;
ret = i915_gem_object_set_to_cpu_domain(obj, true);
- if (ret) {
- i915_gem_object_ggtt_unpin(obj);
- return ret;
- }
+ if (ret)
+ goto unpin;
- ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0,
- ringbuf->size >> PAGE_SHIFT);
- if (ringbuf->virtual_start == NULL) {
- i915_gem_object_ggtt_unpin(obj);
- return -ENOMEM;
+ ringbuf->virtual_start = i915_gem_object_pin_vmap(obj);
+ if (IS_ERR(ringbuf->virtual_start)) {
+ ret = PTR_ERR(ringbuf->virtual_start);
+ ringbuf->virtual_start = NULL;
+ goto unpin;
}
} else {
ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
@@ -2094,10 +2092,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
return ret;
ret = i915_gem_object_set_to_gtt_domain(obj, true);
- if (ret) {
- i915_gem_object_ggtt_unpin(obj);
- return ret;
- }
+ if (ret)
+ goto unpin;
/* Access through the GTT requires the device to be awake. */
assert_rpm_wakelock_held(dev_priv);
@@ -2105,14 +2101,18 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
i915_gem_obj_ggtt_offset(obj), ringbuf->size);
if (ringbuf->virtual_start == NULL) {
- i915_gem_object_ggtt_unpin(obj);
- return -EINVAL;
+ ret = -ENOMEM;
+ goto unpin;
}
}
ringbuf->vma = i915_gem_obj_to_ggtt(obj);
return 0;
+
+unpin:
+ i915_gem_object_ggtt_unpin(obj);
+ return ret;
}
static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
--
1.9.1
* [PATCH v4 3/3] drm/i915: optimise i915_gem_object_vmap_range() for small objects
2016-02-22 15:18 [PATCH v4 0/3] Reorganise calls to vmap() GEM objects Dave Gordon
2016-02-22 15:18 ` [PATCH v4 1/3] drm/i915: add i915_gem_object_vmap_range() to map GEM object to virtual space Dave Gordon
2016-02-22 15:18 ` [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked) Dave Gordon
@ 2016-02-22 15:18 ` Dave Gordon
2016-02-23 10:16 ` Chris Wilson
2016-02-23 8:37 ` ✗ Fi.CI.BAT: failure for Reorganise calls to vmap() GEM objects Patchwork
2016-02-23 10:59 ` Patchwork
4 siblings, 1 reply; 14+ messages in thread
From: Dave Gordon @ 2016-02-22 15:18 UTC (permalink / raw)
To: intel-gfx
Now that we use this function for ringbuffers and other "small" objects,
it's worth avoiding an extra kmalloc()/kfree() cycle if the page array
is small enough to fit on the stack. Here we've chosen an arbitrary
cutoff of 32 (4 KiB) pages, which is big enough for a ringbuffer (4 pages)
or a context image (currently up to 22 pages).
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
drivers/gpu/drm/i915/i915_gem.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 14942cf..effb69b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2422,6 +2422,7 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
unsigned int npages)
{
struct sg_page_iter sg_iter;
+ struct page *mypages[32];
struct page **pages;
void *addr;
int i;
@@ -2431,10 +2432,16 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
return NULL;
}
- pages = drm_malloc_ab(npages, sizeof(*pages));
- if (pages == NULL) {
- DRM_DEBUG_DRIVER("Failed to get space for pages\n");
- return NULL;
+ if (npages <= ARRAY_SIZE(mypages))
+ pages = mypages;
+ else {
+ pages = kmalloc(npages*sizeof(*pages), GFP_TEMPORARY | __GFP_NOWARN);
+ if (pages == NULL)
+ pages = drm_malloc_ab(npages, sizeof(*pages));
+ if (pages == NULL) {
+ DRM_DEBUG_DRIVER("Failed to get space for pages\n");
+ return NULL;
+ }
}
i = 0;
@@ -2447,7 +2454,8 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
addr = vmap(pages, npages, 0, PAGE_KERNEL);
if (addr == NULL)
DRM_DEBUG_DRIVER("Failed to vmap pages\n");
- drm_free_large(pages);
+ if (pages != mypages)
+ drm_free_large(pages);
return addr;
}
--
1.9.1
* Re: [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
2016-02-22 15:18 ` [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked) Dave Gordon
@ 2016-02-22 16:06 ` Tvrtko Ursulin
2016-02-23 10:06 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2016-02-22 16:06 UTC (permalink / raw)
To: Dave Gordon, intel-gfx
[Cc Chris as the author of the idea.]
Hi,
On 22/02/16 15:18, Dave Gordon wrote:
> This is essentially Chris Wilson's patch of a similar name, reworked on
> top of Alex Dai's recent patch:
> drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
> Chris' original commentary said:
>
> We now have two implementations for vmapping a whole object, one for
> dma-buf and one for the ringbuffer. If we couple the vmapping into
> the obj->pages lifetime, then we can reuse an obj->vmapping for both
> and at the same time couple it into the shrinker.
As a general concept my worry is that by implementing this approach we
tie two unrelated concepts together.
Shrinker is about backing storage (used/free pages in a machine), while
vmap is about kernel address space. And then on 32-bit with its limited
vmap space (128MiB, right?), it can become exhausted much sooner than
the shrinker would be triggered. And we would rely on the shrinker
running to free up address space as well now, right?
So unless I am missing something, that doesn't fit well.
>
> v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala)
> v3: Call unpin_vmap from the right dmabuf unmapper
>
> This patch [v4] reimplements the same functionality, but now as wrappers
> round the recently-introduced i915_gem_object_vmap_range() from Alex's
> patch mentioned above.
>
> With this change, we have only one vmap() in the whole driver :)
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.h | 11 +++++----
> drivers/gpu/drm/i915/i915_gem.c | 39 +++++++++++++++++++++++++++++++
> drivers/gpu/drm/i915/i915_gem_dmabuf.c | 41 ++++-----------------------------
> drivers/gpu/drm/i915/intel_ringbuffer.c | 32 ++++++++++++-------------
> 4 files changed, 67 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8cd47c6..b237ebd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2168,10 +2168,7 @@ struct drm_i915_gem_object {
> struct scatterlist *sg;
> int last;
> } get_page;
> -
> - /* prime dma-buf support */
> - void *dma_buf_vmapping;
> - int vmapping_count;
> + void *vmapping;
>
> /** Breadcrumb of last rendering to the buffer.
> * There can only be one writer, but we allow for multiple readers.
> @@ -2979,6 +2976,12 @@ static inline void i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj)
> obj->pages_pin_count--;
> }
>
> +void *__must_check i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj);
> +static inline void i915_gem_object_unpin_vmap(struct drm_i915_gem_object *obj)
> +{
> + i915_gem_object_unpin_pages(obj);
> +}
> +
> void *__must_check i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
> unsigned int first,
> unsigned int npages);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0d5709a..14942cf 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2235,6 +2235,11 @@ static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
> ops->put_pages(obj);
> obj->pages = NULL;
>
> + if (obj->vmapping) {
> + vunmap(obj->vmapping);
> + obj->vmapping = NULL;
> + }
> +
> i915_gem_object_invalidate(obj);
>
> return 0;
> @@ -2447,6 +2452,40 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
> return addr;
> }
>
> +/**
> + * i915_gem_object_pin_vmap - pin a GEM object and map it into kernel space
> + * @obj: the GEM object to be mapped
> + *
> + * Conbines the functions of get_pages(), pin_pages() and vmap_range() on
> + * the whole object. The caller should release the mappoing by calling
> + * i915_gem_object_unpin_vmap() when the object is no longer required.
> + * function.
"Conbines", "mappoing" and a dangling "function." sentence :)
"when the mapping is no longer required." probably would be better.
Return value also needs to be documented.
> + *
> + * Returns an ERR_PTR on failure.
> + */
> +void *i915_gem_object_pin_vmap(struct drm_i915_gem_object *obj)
> +{
> + int ret;
> +
> + ret = i915_gem_object_get_pages(obj);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + i915_gem_object_pin_pages(obj);
> +
> + if (obj->vmapping == NULL) {
> + obj->vmapping = i915_gem_object_vmap_range(obj, 0,
> + obj->base.size >> PAGE_SHIFT);
> +
> + if (obj->vmapping == NULL) {
> + i915_gem_object_unpin_pages(obj);
> + return ERR_PTR(-ENOMEM);
> + }
> + }
> +
> + return obj->vmapping;
> +}
> +
> void i915_vma_move_to_active(struct i915_vma *vma,
> struct drm_i915_gem_request *req)
> {
> diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
> index aee4149..adc7b5e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
> @@ -95,14 +95,12 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
> {
> struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
>
> - mutex_lock(&obj->base.dev->struct_mutex);
> -
> dma_unmap_sg(attachment->dev, sg->sgl, sg->nents, dir);
> sg_free_table(sg);
> kfree(sg);
>
> + mutex_lock(&obj->base.dev->struct_mutex);
> i915_gem_object_unpin_pages(obj);
> -
> mutex_unlock(&obj->base.dev->struct_mutex);
> }
An unrelated hunk like this would better fit a separate patch so it can be
explained why it is OK, or better.
I would suggest dropping it from this patch.
> @@ -110,41 +108,17 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf *dma_buf)
> {
> struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
> struct drm_device *dev = obj->base.dev;
> + void *addr;
> int ret;
>
> ret = i915_mutex_lock_interruptible(dev);
> if (ret)
> return ERR_PTR(ret);
>
> - if (obj->dma_buf_vmapping) {
> - obj->vmapping_count++;
> - goto out_unlock;
> - }
> -
> - ret = i915_gem_object_get_pages(obj);
> - if (ret)
> - goto err;
> -
> - i915_gem_object_pin_pages(obj);
> -
> - ret = -ENOMEM;
> -
> - obj->dma_buf_vmapping = i915_gem_object_vmap_range(obj, 0,
> - dma_buf->size >> PAGE_SHIFT);
> -
> - if (!obj->dma_buf_vmapping)
> - goto err_unpin;
> -
> - obj->vmapping_count = 1;
> -out_unlock:
> + addr = i915_gem_object_pin_vmap(obj);
> mutex_unlock(&dev->struct_mutex);
> - return obj->dma_buf_vmapping;
>
> -err_unpin:
> - i915_gem_object_unpin_pages(obj);
> -err:
> - mutex_unlock(&dev->struct_mutex);
> - return ERR_PTR(ret);
> + return addr;
> }
>
> static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
> @@ -153,12 +127,7 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
> struct drm_device *dev = obj->base.dev;
>
> mutex_lock(&dev->struct_mutex);
> - if (--obj->vmapping_count == 0) {
> - vunmap(obj->dma_buf_vmapping);
> - obj->dma_buf_vmapping = NULL;
> -
> - i915_gem_object_unpin_pages(obj);
> - }
> + i915_gem_object_unpin_vmap(obj);
> mutex_unlock(&dev->struct_mutex);
> }
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 434a452..47f186e 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2056,7 +2056,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
> void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> {
> if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
> - vunmap(ringbuf->virtual_start);
> + i915_gem_object_unpin_vmap(ringbuf->obj);
> else
> iounmap(ringbuf->virtual_start);
> ringbuf->virtual_start = NULL;
> @@ -2077,16 +2077,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> return ret;
>
> ret = i915_gem_object_set_to_cpu_domain(obj, true);
> - if (ret) {
> - i915_gem_object_ggtt_unpin(obj);
> - return ret;
> - }
> + if (ret)
> + goto unpin;
>
> - ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0,
> - ringbuf->size >> PAGE_SHIFT);
> - if (ringbuf->virtual_start == NULL) {
> - i915_gem_object_ggtt_unpin(obj);
> - return -ENOMEM;
> + ringbuf->virtual_start = i915_gem_object_pin_vmap(obj);
> + if (IS_ERR(ringbuf->virtual_start)) {
> + ret = PTR_ERR(ringbuf->virtual_start);
> + ringbuf->virtual_start = NULL;
> + goto unpin;
> }
> } else {
> ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
> @@ -2094,10 +2092,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> return ret;
>
> ret = i915_gem_object_set_to_gtt_domain(obj, true);
> - if (ret) {
> - i915_gem_object_ggtt_unpin(obj);
> - return ret;
> - }
> + if (ret)
> + goto unpin;
>
> /* Access through the GTT requires the device to be awake. */
> assert_rpm_wakelock_held(dev_priv);
> @@ -2105,14 +2101,18 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
> i915_gem_obj_ggtt_offset(obj), ringbuf->size);
> if (ringbuf->virtual_start == NULL) {
> - i915_gem_object_ggtt_unpin(obj);
> - return -EINVAL;
> + ret = -ENOMEM;
> + goto unpin;
Another refactoring not really belonging to this patch. I am not sure it
is that good to share the cleanup path from the two logically split
branches. May be fragile in the future. But it is short enough so OK.
But as a related question, I wonder why doesn't the LRC require the same
!HAS_LLC approach when mapping as ring buffer does here?
> }
> }
>
> ringbuf->vma = i915_gem_obj_to_ggtt(obj);
>
> return 0;
> +
> +unpin:
> + i915_gem_object_ggtt_unpin(obj);
> + return ret;
> }
>
> static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
>
Regards,
Tvrtko
* ✗ Fi.CI.BAT: failure for Reorganise calls to vmap() GEM objects
2016-02-22 15:18 [PATCH v4 0/3] Reorganise calls to vmap() GEM objects Dave Gordon
` (2 preceding siblings ...)
2016-02-22 15:18 ` [PATCH v4 3/3] drm/i915: optimise i915_gem_object_vmap_range() for small objects Dave Gordon
@ 2016-02-23 8:37 ` Patchwork
2016-02-23 10:59 ` Patchwork
4 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2016-02-23 8:37 UTC (permalink / raw)
To: Dave Gordon; +Cc: intel-gfx
== Series Details ==
Series: Reorganise calls to vmap() GEM objects
URL : https://patchwork.freedesktop.org/series/3688/
State : failure
== Summary ==
Series 3688v1 Reorganise calls to vmap() GEM objects
http://patchwork.freedesktop.org/api/1.0/series/3688/revisions/1/mbox/
Test core_prop_blob:
Subgroup basic:
incomplete -> PASS (snb-x220t)
Test drv_getparams_basic:
Subgroup basic-eu-total:
incomplete -> PASS (snb-x220t)
Test gem_cs_prefetch:
Subgroup basic-default:
incomplete -> PASS (ilk-hp8440p)
Test gem_ctx_param_basic:
Subgroup basic:
incomplete -> PASS (snb-x220t)
Subgroup basic-default:
incomplete -> PASS (snb-x220t)
Subgroup non-root-set:
incomplete -> PASS (snb-x220t)
Subgroup root-set:
incomplete -> PASS (snb-x220t)
Subgroup root-set-no-zeromap-disabled:
incomplete -> PASS (snb-x220t)
Test gem_exec_basic:
Subgroup basic-bsd2:
incomplete -> SKIP (snb-x220t)
Subgroup basic-default:
incomplete -> PASS (snb-x220t)
Test gem_exec_parse:
Subgroup basic-rejected:
incomplete -> SKIP (snb-x220t)
Test gem_flink_basic:
Subgroup bad-flink:
incomplete -> PASS (snb-x220t)
Subgroup basic:
incomplete -> PASS (snb-x220t)
Subgroup double-flink:
incomplete -> PASS (snb-x220t)
Test gem_mmap:
Subgroup basic:
incomplete -> PASS (snb-x220t)
Test gem_mmap_gtt:
Subgroup basic:
incomplete -> PASS (snb-x220t)
Subgroup basic-read:
incomplete -> PASS (snb-x220t)
Subgroup basic-read-write:
incomplete -> PASS (snb-x220t)
Subgroup basic-small-copy-xy:
incomplete -> PASS (snb-x220t)
Subgroup basic-write-gtt:
incomplete -> PASS (snb-x220t)
Subgroup basic-write-read-distinct:
incomplete -> PASS (snb-x220t)
Test gem_pread:
Subgroup basic:
incomplete -> PASS (snb-x220t)
Test gem_pwrite:
Subgroup basic:
incomplete -> PASS (snb-x220t)
Test gem_render_linear_blits:
Subgroup basic:
incomplete -> PASS (snb-x220t)
Test gem_render_tiled_blits:
Subgroup basic:
incomplete -> PASS (snb-x220t)
Test gem_ringfill:
Subgroup basic-default-child:
incomplete -> PASS (snb-x220t)
Subgroup basic-default-interruptible:
incomplete -> PASS (snb-x220t)
Test gem_storedw_loop:
Subgroup basic-vebox:
incomplete -> SKIP (snb-x220t)
Test gem_sync:
Subgroup basic-bsd:
incomplete -> PASS (snb-x220t)
Test kms_addfb_basic:
Subgroup addfb25-bad-modifier:
incomplete -> PASS (snb-x220t)
Subgroup addfb25-modifier-no-flag:
incomplete -> PASS (snb-x220t)
Subgroup addfb25-y-tiled-small:
incomplete -> SKIP (snb-x220t)
Subgroup addfb25-yf-tiled:
incomplete -> PASS (snb-x220t)
Subgroup bad-pitch-0:
incomplete -> PASS (snb-x220t)
Subgroup basic:
incomplete -> PASS (snb-x220t)
Subgroup bo-too-small-due-to-tiling:
incomplete -> PASS (snb-x220t)
Subgroup no-handle:
incomplete -> PASS (snb-x220t)
Subgroup size-max:
incomplete -> PASS (snb-x220t)
Subgroup too-high:
incomplete -> PASS (snb-x220t)
Subgroup unused-handle:
incomplete -> PASS (snb-x220t)
Subgroup unused-modifier:
incomplete -> PASS (snb-x220t)
Test kms_flip:
Subgroup basic-flip-vs-dpms:
pass -> DMESG-WARN (ilk-hp8440p) UNSTABLE
Subgroup basic-flip-vs-wf_vblank:
fail -> PASS (snb-x220t)
Test kms_force_connector_basic:
Subgroup force-load-detect:
dmesg-fail -> FAIL (snb-x220t)
Test kms_pipe_crc_basic:
Subgroup bad-nb-words-1:
incomplete -> PASS (snb-x220t)
Subgroup bad-source:
incomplete -> PASS (snb-x220t)
Subgroup hang-read-crc-pipe-b:
incomplete -> PASS (snb-x220t)
Subgroup hang-read-crc-pipe-c:
incomplete -> SKIP (snb-x220t)
Subgroup nonblocking-crc-pipe-a-frame-sequence:
incomplete -> PASS (snb-x220t)
Subgroup nonblocking-crc-pipe-b-frame-sequence:
incomplete -> PASS (snb-x220t)
Subgroup nonblocking-crc-pipe-c:
incomplete -> SKIP (snb-x220t)
Subgroup read-crc-pipe-a-frame-sequence:
incomplete -> PASS (snb-x220t)
Subgroup read-crc-pipe-b-frame-sequence:
incomplete -> PASS (snb-x220t)
Subgroup suspend-read-crc-pipe-b:
dmesg-warn -> PASS (skl-i5k-2) UNSTABLE
Subgroup suspend-read-crc-pipe-c:
pass -> DMESG-WARN (skl-i5k-2) UNSTABLE
Test kms_psr_sink_crc:
Subgroup psr_basic:
incomplete -> SKIP (snb-x220t)
Test kms_setmode:
Subgroup basic-clone-single-crtc:
incomplete -> PASS (snb-x220t)
Test pm_rpm:
Subgroup basic-pci-d3-state:
pass -> DMESG-WARN (byt-nuc)
Subgroup basic-rte:
pass -> DMESG-WARN (bsw-nuc-2)
pass -> FAIL (hsw-gt2)
dmesg-warn -> PASS (byt-nuc) UNSTABLE
Test pm_rps:
Subgroup basic-api:
incomplete -> PASS (snb-x220t)
Test prime_self_import:
Subgroup basic-llseek-size:
incomplete -> PASS (snb-x220t)
Subgroup basic-with_fd_dup:
incomplete -> PASS (snb-x220t)
bsw-nuc-2 total:168 pass:136 dwarn:1 dfail:0 fail:1 skip:30
byt-nuc total:168 pass:142 dwarn:1 dfail:0 fail:0 skip:25
hsw-gt2 total:168 pass:156 dwarn:0 dfail:1 fail:1 skip:10
ilk-hp8440p total:168 pass:117 dwarn:1 dfail:0 fail:1 skip:49
ivb-t430s total:168 pass:153 dwarn:0 dfail:0 fail:1 skip:14
skl-i5k-2 total:168 pass:151 dwarn:1 dfail:0 fail:0 skip:16
snb-dellxps total:168 pass:145 dwarn:0 dfail:1 fail:0 skip:22
snb-x220t total:168 pass:145 dwarn:0 dfail:0 fail:2 skip:21
Results at /archive/results/CI_IGT_test/Patchwork_1457/
08fc1b101049694778bff7559e1d05250d2e7072 drm-intel-nightly: 2016y-02m-22d-17h-30m-27s UTC integration manifest
2cd6b67de5e461e31cac718dc8cd4320162ab95c drm/i915: optimise i915_gem_object_vmap_range() for small objects
2c70e45577cbd3f1a8646219674bb1e0953c2d93 drm/i915: refactor duplicate object vmap functions (reworked)
4061b03c5ebdf6db460b11b4f4afe7fcfe4619eb drm/i915: add i915_gem_object_vmap_range() to map GEM object to virtual space
* Re: [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
2016-02-22 16:06 ` Tvrtko Ursulin
@ 2016-02-23 10:06 ` Chris Wilson
2016-02-23 10:31 ` Tvrtko Ursulin
0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2016-02-23 10:06 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
On Mon, Feb 22, 2016 at 04:06:39PM +0000, Tvrtko Ursulin wrote:
>
> [Cc Chris as the author of the idea.]
>
> Hi,
>
> On 22/02/16 15:18, Dave Gordon wrote:
> >This is essentially Chris Wilson's patch of a similar name, reworked on
> >top of Alex Dai's recent patch:
> > drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
> >Chris' original commentary said:
> >
> > We now have two implementations for vmapping a whole object, one for
> > dma-buf and one for the ringbuffer. If we couple the vmapping into
> > the obj->pages lifetime, then we can reuse an obj->vmapping for both
> > and at the same time couple it into the shrinker.
>
> As a general concept my worry is that by implementing this approach
> we tie two unrelated concepts together.
>
> Shrinker is about backing storage (used/free pages in a machine),
> while vmap is about kernel address space. And then on 32-bit with
> its limited vmap space (128MiB, right?), it can become exhausted
> much sooner than the shrinker would be triggered. And we would rely
> on the shrinker running to free up address space as well now, right?
Yes, we use the shrinker to free address space.
> So unless I am missing something that doesn't fit well.
The opposite. Even more reason for the shrinker to be able to recover
vmap space on 32bit systems (for external users, not just ourselves).
> >diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >index 434a452..47f186e 100644
> >--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> >+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> >@@ -2056,7 +2056,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
> > void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> > {
> > if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
> >- vunmap(ringbuf->virtual_start);
> >+ i915_gem_object_unpin_vmap(ringbuf->obj);
> > else
> > iounmap(ringbuf->virtual_start);
> > ringbuf->virtual_start = NULL;
> >@@ -2077,16 +2077,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> > return ret;
> >
> > ret = i915_gem_object_set_to_cpu_domain(obj, true);
> >- if (ret) {
> >- i915_gem_object_ggtt_unpin(obj);
> >- return ret;
> >- }
> >+ if (ret)
> >+ goto unpin;
> >
> >- ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0,
> >- ringbuf->size >> PAGE_SHIFT);
> >- if (ringbuf->virtual_start == NULL) {
> >- i915_gem_object_ggtt_unpin(obj);
> >- return -ENOMEM;
> >+ ringbuf->virtual_start = i915_gem_object_pin_vmap(obj);
> >+ if (IS_ERR(ringbuf->virtual_start)) {
> >+ ret = PTR_ERR(ringbuf->virtual_start);
> >+ ringbuf->virtual_start = NULL;
> >+ goto unpin;
> > }
> > } else {
> > ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
> >@@ -2094,10 +2092,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> > return ret;
> >
> > ret = i915_gem_object_set_to_gtt_domain(obj, true);
> >- if (ret) {
> >- i915_gem_object_ggtt_unpin(obj);
> >- return ret;
> >- }
> >+ if (ret)
> >+ goto unpin;
> >
> > /* Access through the GTT requires the device to be awake. */
> > assert_rpm_wakelock_held(dev_priv);
> >@@ -2105,14 +2101,18 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> > ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
> > i915_gem_obj_ggtt_offset(obj), ringbuf->size);
> > if (ringbuf->virtual_start == NULL) {
> >- i915_gem_object_ggtt_unpin(obj);
> >- return -EINVAL;
> >+ ret = -ENOMEM;
> >+ goto unpin;
>
> Another refactoring not really belonging to this patch. I am not
> sure it is that good to share the cleanup path from the two
> logically split branches. May be fragile in the future. But it is
> short enough so OK.
>
> But as a related question, I wonder why doesn't the LRC require the
> same !HAS_LLC approach when mapping as ring buffer does here?
We don't try to use stolen for LRC. The main difficulty lies in
deciding how to map the state object, stolen forces us to use an
ioremapping through the GTT and so only suitable for write-only
mappings. However, we could be using the per-context HWS, for which we
want a CPU accessible, direct pointer.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH v4 3/3] drm/i915: optimise i915_gem_object_vmap_range() for small objects
2016-02-22 15:18 ` [PATCH v4 3/3] drm/i915: optimise i915_gem_object_vmap_range() for small objects Dave Gordon
@ 2016-02-23 10:16 ` Chris Wilson
0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2016-02-23 10:16 UTC (permalink / raw)
To: Dave Gordon; +Cc: intel-gfx
On Mon, Feb 22, 2016 at 03:18:28PM +0000, Dave Gordon wrote:
> Now that we use this function for ringbuffers and other "small" objects,
> it's worth avoiding an extra kmalloc()/kfree() cycle if the page array
> is small enough to put on the stack. Here we've chosen an arbitrary
> cutoff of 32 (4k) pages, which is big enough for a ringbuffer (4 pages)
> or a context image (currently up to 22 pages).
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Alex Dai <yu.dai@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 14942cf..effb69b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2422,6 +2422,7 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
> unsigned int npages)
> {
> struct sg_page_iter sg_iter;
> + struct page *mypages[32];
Use stack or stack_pages, that's the pattern we've used elsewhere.
Though pages_on_stack would be more in line with the rest of the kernel.
> struct page **pages;
> void *addr;
> int i;
> @@ -2431,10 +2432,16 @@ void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
> return NULL;
> }
>
> - pages = drm_malloc_ab(npages, sizeof(*pages));
> - if (pages == NULL) {
> - DRM_DEBUG_DRIVER("Failed to get space for pages\n");
> - return NULL;
> + if (npages <= ARRAY_SIZE(mypages))
> + pages = mypages;
If one branch requires braces, add them to all.
> + else {
> + pages = kmalloc(npages*sizeof(*pages), GFP_TEMPORARY | __GFP_NOWARN);
drm_malloc_gfp
Otherwise, seems an ok idea. The deepest vmap is the
ioctl/execbuffer/cmdparser to which using a further 256 bytes of stack is
acceptable.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
2016-02-23 10:06 ` Chris Wilson
@ 2016-02-23 10:31 ` Tvrtko Ursulin
2016-02-23 11:38 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2016-02-23 10:31 UTC (permalink / raw)
To: Chris Wilson, Dave Gordon, intel-gfx
On 23/02/16 10:06, Chris Wilson wrote:
> On Mon, Feb 22, 2016 at 04:06:39PM +0000, Tvrtko Ursulin wrote:
>>
>> [Cc Chris as the author of the idea.]
>>
>> Hi,
>>
>> On 22/02/16 15:18, Dave Gordon wrote:
>>> This is essentially Chris Wilson's patch of a similar name, reworked on
>>> top of Alex Dai's recent patch:
>>> drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
>>> Chris' original commentary said:
>>>
>>> We now have two implementations for vmapping a whole object, one for
>>> dma-buf and one for the ringbuffer. If we couple the vmapping into
>>> the obj->pages lifetime, then we can reuse an obj->vmapping for both
>>> and at the same time couple it into the shrinker.
>>
>> As a general concept my worry is that by implementing this approach
>> we tie two unrelated concepts together.
>>
>> Shrinker is about backing storage (used/free pages in a machine),
>> while vmap is about kernel address space. And then on 32-bit with
>> its limited vmap space (128MiB, right?), it can become exhausted
>> much sooner than the shrinker would be triggered. And we would rely
>> on the shrinker running to free up address space as well now, right?
>
> Yes, we use the shrinker to free address space.
>
>> So unless I am missing something that doesn't fit well.
>
> The opposite. Even more reason for the shrinker to be able to recover
> vmap space on 32bit systems (for external users, not just ourselves).
How? I don't see that failed vmapping will trigger shrinking. What will
prevent i915 from accumulating objects with vmaps sticking around for
too long potentially?
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> index 434a452..47f186e 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -2056,7 +2056,7 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
>>> void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
>>> {
>>> if (HAS_LLC(ringbuf->obj->base.dev) && !ringbuf->obj->stolen)
>>> - vunmap(ringbuf->virtual_start);
>>> + i915_gem_object_unpin_vmap(ringbuf->obj);
>>> else
>>> iounmap(ringbuf->virtual_start);
>>> ringbuf->virtual_start = NULL;
>>> @@ -2077,16 +2077,14 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>>> return ret;
>>>
>>> ret = i915_gem_object_set_to_cpu_domain(obj, true);
>>> - if (ret) {
>>> - i915_gem_object_ggtt_unpin(obj);
>>> - return ret;
>>> - }
>>> + if (ret)
>>> + goto unpin;
>>>
>>> - ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0,
>>> - ringbuf->size >> PAGE_SHIFT);
>>> - if (ringbuf->virtual_start == NULL) {
>>> - i915_gem_object_ggtt_unpin(obj);
>>> - return -ENOMEM;
>>> + ringbuf->virtual_start = i915_gem_object_pin_vmap(obj);
>>> + if (IS_ERR(ringbuf->virtual_start)) {
>>> + ret = PTR_ERR(ringbuf->virtual_start);
>>> + ringbuf->virtual_start = NULL;
>>> + goto unpin;
>>> }
>>> } else {
>>> ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
>>> @@ -2094,10 +2092,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>>> return ret;
>>>
>>> ret = i915_gem_object_set_to_gtt_domain(obj, true);
>>> - if (ret) {
>>> - i915_gem_object_ggtt_unpin(obj);
>>> - return ret;
>>> - }
>>> + if (ret)
>>> + goto unpin;
>>>
>>> /* Access through the GTT requires the device to be awake. */
>>> assert_rpm_wakelock_held(dev_priv);
>>> @@ -2105,14 +2101,18 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>>> ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
>>> i915_gem_obj_ggtt_offset(obj), ringbuf->size);
>>> if (ringbuf->virtual_start == NULL) {
>>> - i915_gem_object_ggtt_unpin(obj);
>>> - return -EINVAL;
>>> + ret = -ENOMEM;
>>> + goto unpin;
>>
>> Another refactoring not really belonging to this patch. I am not
>> sure it is that good to share the cleanup path from the two
>> logically split branches. May be fragile in the future. But it is
>> short enough so OK.
>>
>> But as a related question, I wonder why doesn't the LRC require the
>> same !HAS_LLC approach when mapping as ring buffer does here?
>
> We don't try to use stolen for LRC. The main difficulty lies in
> deciding how to map the state object, stolen forces us to use an
> ioremapping through the GTT and so only suitable for write-only
> mappings. However, we could be using the per-context HWS, for which we
> want a CPU accessible, direct pointer.
I wasn't asking about stolen but the !HAS_LLC path. Even non-stolen ring
buffers will be mapped via the aperture on !HAS_LLC platforms. That
implies it is about cache coherency and we don't have the same treatment
for the LRC page.
Until your vma->iomap prototype which added the same uncached access to
the LRC as well.
So my question was, do we need this for cache considerations today,
irrespective of caching the pointer in the VMA.
Regards,
Tvrtko
* ✗ Fi.CI.BAT: failure for Reorganise calls to vmap() GEM objects
2016-02-22 15:18 [PATCH v4 0/3] Reorganise calls to vmap() GEM objects Dave Gordon
` (3 preceding siblings ...)
2016-02-23 8:37 ` ✗ Fi.CI.BAT: failure for Reorganise calls to vmap() GEM objects Patchwork
@ 2016-02-23 10:59 ` Patchwork
4 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2016-02-23 10:59 UTC (permalink / raw)
To: Dave Gordon; +Cc: intel-gfx
== Series Details ==
Series: Reorganise calls to vmap() GEM objects
URL : https://patchwork.freedesktop.org/series/3688/
State : failure
== Summary ==
Series 3688v1 Reorganise calls to vmap() GEM objects
http://patchwork.freedesktop.org/api/1.0/series/3688/revisions/1/mbox/
Test gem_cs_prefetch:
Subgroup basic-default:
incomplete -> PASS (ilk-hp8440p)
Test kms_flip:
Subgroup basic-flip-vs-dpms:
pass -> DMESG-WARN (ilk-hp8440p) UNSTABLE
Test kms_force_connector_basic:
Subgroup force-edid:
skip -> PASS (snb-x220t)
Subgroup force-load-detect:
dmesg-fail -> FAIL (snb-x220t)
Test kms_pipe_crc_basic:
Subgroup suspend-read-crc-pipe-b:
dmesg-warn -> PASS (skl-i5k-2) UNSTABLE
Subgroup suspend-read-crc-pipe-c:
pass -> DMESG-WARN (skl-i5k-2) UNSTABLE
Test pm_rpm:
Subgroup basic-pci-d3-state:
pass -> DMESG-WARN (byt-nuc)
Subgroup basic-rte:
pass -> DMESG-WARN (bsw-nuc-2)
pass -> FAIL (hsw-gt2)
dmesg-warn -> PASS (byt-nuc) UNSTABLE
bsw-nuc-2 total:168 pass:136 dwarn:1 dfail:0 fail:1 skip:30
byt-nuc total:168 pass:142 dwarn:1 dfail:0 fail:0 skip:25
hsw-gt2 total:168 pass:156 dwarn:0 dfail:1 fail:1 skip:10
ilk-hp8440p total:168 pass:117 dwarn:1 dfail:0 fail:1 skip:49
ivb-t430s total:168 pass:153 dwarn:0 dfail:0 fail:1 skip:14
skl-i5k-2 total:168 pass:151 dwarn:1 dfail:0 fail:0 skip:16
snb-dellxps total:168 pass:145 dwarn:0 dfail:1 fail:0 skip:22
snb-x220t total:168 pass:145 dwarn:0 dfail:0 fail:2 skip:21
Results at /archive/results/CI_IGT_test/Patchwork_1457/
08fc1b101049694778bff7559e1d05250d2e7072 drm-intel-nightly: 2016y-02m-22d-17h-30m-27s UTC integration manifest
2cd6b67de5e461e31cac718dc8cd4320162ab95c drm/i915: optimise i915_gem_object_vmap_range() for small objects
2c70e45577cbd3f1a8646219674bb1e0953c2d93 drm/i915: refactor duplicate object vmap functions (reworked)
4061b03c5ebdf6db460b11b4f4afe7fcfe4619eb drm/i915: add i915_gem_object_vmap_range() to map GEM object to virtual space
* Re: [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
2016-02-23 10:31 ` Tvrtko Ursulin
@ 2016-02-23 11:38 ` Chris Wilson
2016-02-23 11:52 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2016-02-23 11:38 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
On Tue, Feb 23, 2016 at 10:31:08AM +0000, Tvrtko Ursulin wrote:
>
> On 23/02/16 10:06, Chris Wilson wrote:
> >On Mon, Feb 22, 2016 at 04:06:39PM +0000, Tvrtko Ursulin wrote:
> >>
> >>[Cc Chris as the author of the idea.]
> >>
> >>Hi,
> >>
> >>On 22/02/16 15:18, Dave Gordon wrote:
> >>>This is essentially Chris Wilson's patch of a similar name, reworked on
> >>>top of Alex Dai's recent patch:
> >>> drm/i915: Add i915_gem_object_vmap to map GEM object to virtual space
> >>>Chris' original commentary said:
> >>>
> >>> We now have two implementations for vmapping a whole object, one for
> >>> dma-buf and one for the ringbuffer. If we couple the vmapping into
> >>> the obj->pages lifetime, then we can reuse an obj->vmapping for both
> >>> and at the same time couple it into the shrinker.
> >>
> >>As a general concept my worry is that by implementing this approach
> >>we tie two unrelated concepts together.
> >>
> >>Shrinker is about backing storage (used/free pages in a machine),
> >>while vmap is about kernel address space. And then on 32-bit with
> >>its limited vmap space (128MiB, right?), it can become exhausted
> >>much sooner than the shrinker would be triggered. And we would rely
> >>on the shrinker running to free up address space as well now, right?
> >
> >Yes, we use the shrinker to free address space.
> >
> >>So unless I am missing something that doesn't fit well.
> >
> >The opposite. Even more reason for the shrinker to be able to recover
> >vmap space on 32bit systems (for external users, not just ourselves).
>
> How? I don't see that failed vmapping will trigger shrinking. What
> will prevent i915 from accumulating objects with vmaps sticking
> around for too long potentially?
I read what you wrote as implying the shrinker would be run (which is
what I hoped). I made the mistake of just thinking of the allocations
around that path, rather than alloc_vmap_area().
Indeed, we would need a new notifier, pretty much for the sole use of
32b. Grr, that will be a nuisance.
> >>But as a related question, I wonder why doesn't the LRC require the
> >>same !HAS_LLC approach when mapping as ring buffer does here?
> >
> >We don't try to use stolen for LRC. The main difficulty lies in
> >deciding how to map the state object, stolen forces us to use an
> >ioremapping through the GTT and so only suitable for write-only
> >mappings. However, we could be using the per-context HWS, for which we
> >want a CPU accessible, direct pointer.
>
> I wasn't asking about stolen but the !HAS_LLC path. Even non-stolen
> ring buffers will be mapped via the aperture on !HAS_LLC platforms.
> That implies it is about cache coherency and we don't have the same
> treatment for the LRC page.
Oh, completely missed the point. Yes, I also wonder how the kmap() on
the context object works for Braswell without an explicit clflush of at
least the TAIL update between requests.
> Until your vma->iomap prototype which added the same uncached access
> to the LRC as well.
>
> So my question was, do we need this for cache considerations today,
> irrespective of caching the pointer in the VMA.
Yes, I think so, but it's hard to say as Braswell breaks in so many
different ways when the littlest of stress is applied.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
2016-02-23 11:38 ` Chris Wilson
@ 2016-02-23 11:52 ` Chris Wilson
2016-02-23 12:25 ` Tvrtko Ursulin
0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2016-02-23 11:52 UTC (permalink / raw)
To: Tvrtko Ursulin, Dave Gordon, intel-gfx
On Tue, Feb 23, 2016 at 11:38:02AM +0000, Chris Wilson wrote:
> Indeed, we would need a new notifier, pretty much for the sole use of
> 32b. Grr, that will be a nuisance.
Core changes:
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index d1f1d338af20..542a91c2785f 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -187,4 +187,8 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
#define VMALLOC_TOTAL 0UL
#endif
+struct notifier_block;
+int register_vmap_notifier(struct notifier_block *nb);
+int unregister_vmap_notifier(struct notifier_block *nb);
+
#endif /* _LINUX_VMALLOC_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index fb42a5bffe47..0735d82444f7 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -21,6 +21,7 @@
#include <linux/debugobjects.h>
#include <linux/kallsyms.h>
#include <linux/list.h>
+#include <linux/notifier.h>
#include <linux/rbtree.h>
#include <linux/radix-tree.h>
#include <linux/rcupdate.h>
@@ -344,6 +345,8 @@ static void __insert_vmap_area(struct vmap_area *va)
static void purge_vmap_area_lazy(void);
+static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
+
/*
* Allocate a region of KVA of the specified size and alignment, within the
* vstart and vend.
@@ -356,6 +359,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
struct vmap_area *va;
struct rb_node *n;
unsigned long addr;
+ unsigned long freed;
int purged = 0;
struct vmap_area *first;
@@ -468,6 +472,12 @@ overflow:
purged = 1;
goto retry;
}
+ freed = 0;
+ blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
+ if (freed > 0) {
+ purged = 0;
+ goto retry;
+ }
if (printk_ratelimit())
pr_warn("vmap allocation for size %lu failed: "
"use vmalloc=<size> to increase size.\n", size);
@@ -475,6 +485,18 @@ overflow:
return ERR_PTR(-EBUSY);
}
+int register_vmap_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&vmap_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(register_vmap_notifier);
+
+int unregister_vmap_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&vmap_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(unregister_vmap_notifier);
+
static void __free_vmap_area(struct vmap_area *va)
{
BUG_ON(RB_EMPTY_NODE(&va->rb_node));
That doesn't look too horrendous. Names?
register_oovmap_notifier
register_vmap_nospace_notifier?
register_vmap_purge_notifier?
And the 64k question, how to sell it?
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
2016-02-23 11:52 ` Chris Wilson
@ 2016-02-23 12:25 ` Tvrtko Ursulin
2016-02-23 12:56 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2016-02-23 12:25 UTC (permalink / raw)
To: Chris Wilson, Dave Gordon, intel-gfx
On 23/02/16 11:52, Chris Wilson wrote:
> On Tue, Feb 23, 2016 at 11:38:02AM +0000, Chris Wilson wrote:
>> Indeed, we would need a new notifier, pretty much for the sole use of
>> 32b. Grr, that will be a nuisance.
>
> Core changes:
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index d1f1d338af20..542a91c2785f 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -187,4 +187,8 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> #define VMALLOC_TOTAL 0UL
> #endif
>
> +struct notifier_block;
> +int register_vmap_notifier(struct notifier_block *nb);
> +int unregister_vmap_notifier(struct notifier_block *nb);
> +
> #endif /* _LINUX_VMALLOC_H */
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index fb42a5bffe47..0735d82444f7 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -21,6 +21,7 @@
> #include <linux/debugobjects.h>
> #include <linux/kallsyms.h>
> #include <linux/list.h>
> +#include <linux/notifier.h>
> #include <linux/rbtree.h>
> #include <linux/radix-tree.h>
> #include <linux/rcupdate.h>
> @@ -344,6 +345,8 @@ static void __insert_vmap_area(struct vmap_area *va)
>
> static void purge_vmap_area_lazy(void);
>
> +static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
> +
> /*
> * Allocate a region of KVA of the specified size and alignment, within the
> * vstart and vend.
> @@ -356,6 +359,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> struct vmap_area *va;
> struct rb_node *n;
> unsigned long addr;
> + unsigned long freed;
> int purged = 0;
> struct vmap_area *first;
>
> @@ -468,6 +472,12 @@ overflow:
> purged = 1;
> goto retry;
> }
> + freed = 0;
> + blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
> + if (freed > 0) {
> + purged = 0;
> + goto retry;
> + }
> if (printk_ratelimit())
> pr_warn("vmap allocation for size %lu failed: "
> "use vmalloc=<size> to increase size.\n", size);
> @@ -475,6 +485,18 @@ overflow:
> return ERR_PTR(-EBUSY);
> }
>
> +int register_vmap_notifier(struct notifier_block *nb)
> +{
> + return blocking_notifier_chain_register(&vmap_notify_list, nb);
> +}
> +EXPORT_SYMBOL_GPL(register_vmap_notifier);
> +
> +int unregister_vmap_notifier(struct notifier_block *nb)
> +{
> + return blocking_notifier_chain_unregister(&vmap_notify_list, nb);
> +}
> +EXPORT_SYMBOL_GPL(unregister_vmap_notifier);
> +
> static void __free_vmap_area(struct vmap_area *va)
> {
> BUG_ON(RB_EMPTY_NODE(&va->rb_node));
>
>
> That doesn't look too horrendous. Names?
Downside is new deadlock opportunity so GEM callers to vmap would have
to release the struct mutex.
> register_oovmap_notifier
> register_vmap_nospace_notifier?
> register_vmap_purge_notifier?
register_vmap_shrinker ?
> And the 64k question, how to sell it?
Not sure, maybe with numbers showing that caching the vmapping helps us
significantly?
Regards,
Tvrtko
* Re: [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked)
2016-02-23 12:25 ` Tvrtko Ursulin
@ 2016-02-23 12:56 ` Chris Wilson
0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2016-02-23 12:56 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
On Tue, Feb 23, 2016 at 12:25:29PM +0000, Tvrtko Ursulin wrote:
>
> On 23/02/16 11:52, Chris Wilson wrote:
> >On Tue, Feb 23, 2016 at 11:38:02AM +0000, Chris Wilson wrote:
> >>Indeed, we would need a new notifier, pretty much for the sole use of
> >>32b. Grr, that will be a nuisance.
> >
> >Core changes:
> >
> >diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> >index d1f1d338af20..542a91c2785f 100644
> >--- a/include/linux/vmalloc.h
> >+++ b/include/linux/vmalloc.h
> >@@ -187,4 +187,8 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > #define VMALLOC_TOTAL 0UL
> > #endif
> >
> >+struct notifier_block;
> >+int register_vmap_notifier(struct notifier_block *nb);
> >+int unregister_vmap_notifier(struct notifier_block *nb);
> >+
> > #endif /* _LINUX_VMALLOC_H */
> >diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >index fb42a5bffe47..0735d82444f7 100644
> >--- a/mm/vmalloc.c
> >+++ b/mm/vmalloc.c
> >@@ -21,6 +21,7 @@
> > #include <linux/debugobjects.h>
> > #include <linux/kallsyms.h>
> > #include <linux/list.h>
> >+#include <linux/notifier.h>
> > #include <linux/rbtree.h>
> > #include <linux/radix-tree.h>
> > #include <linux/rcupdate.h>
> >@@ -344,6 +345,8 @@ static void __insert_vmap_area(struct vmap_area *va)
> >
> > static void purge_vmap_area_lazy(void);
> >
> >+static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
> >+
> > /*
> > * Allocate a region of KVA of the specified size and alignment, within the
> > * vstart and vend.
> >@@ -356,6 +359,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> > struct vmap_area *va;
> > struct rb_node *n;
> > unsigned long addr;
> >+ unsigned long freed;
> > int purged = 0;
> > struct vmap_area *first;
> >
> >@@ -468,6 +472,12 @@ overflow:
> > purged = 1;
> > goto retry;
> > }
> >+ freed = 0;
> >+ blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
> >+ if (freed > 0) {
> >+ purged = 0;
> >+ goto retry;
> >+ }
> > if (printk_ratelimit())
> > pr_warn("vmap allocation for size %lu failed: "
> > "use vmalloc=<size> to increase size.\n", size);
> >@@ -475,6 +485,18 @@ overflow:
> > return ERR_PTR(-EBUSY);
> > }
> >
> >+int register_vmap_notifier(struct notifier_block *nb)
> >+{
> >+ return blocking_notifier_chain_register(&vmap_notify_list, nb);
> >+}
> >+EXPORT_SYMBOL_GPL(register_vmap_notifier);
> >+
> >+int unregister_vmap_notifier(struct notifier_block *nb)
> >+{
> >+ return blocking_notifier_chain_unregister(&vmap_notify_list, nb);
> >+}
> >+EXPORT_SYMBOL_GPL(unregister_vmap_notifier);
> >+
> > static void __free_vmap_area(struct vmap_area *va)
> > {
> > BUG_ON(RB_EMPTY_NODE(&va->rb_node));
> >
> >
> >That doesn't look too horrendous. Names?
>
> Downside is a new deadlock opportunity, so GEM callers to vmap would
> have to release the struct mutex.
No, the shrinker can steal the mutex for itself (or give up). Callers
need to be sure that they pin their pages before doing any allocations
if they intend to use the pages afterwards (i.e. we want to apply the
same rules for get_pages vs malloc to get_pages vs vmalloc).
> >register_oovmap_notifier
> >register_vmap_nospace_notifier?
> >register_vmap_purge_notifier?
>
> register_vmap_shrinker ?
>
> >And the 64k question, how to sell it?
>
> Not sure, maybe with numbers showing that caching the vmapping helps
> us significantly?
Something along the lines of showing how beneficial short-term caching
can be, together with the issue that raises of how best to release the
cache, drawing parallels to existing caches.
Hmm, Documentation/vm/highmem.txt:
==========================
TEMPORARY VIRTUAL MAPPINGS
==========================
The kernel contains several ways of creating temporary mappings:
(*) vmap(). This can be used to make a long duration mapping of multiple
physical pages into a contiguous virtual space. It needs global
synchronization to unmap.
"long duration" but temporary, OK, maybe there's a little more wiggle
room here already for keeping vmappings around. I was under the
impression that vmaps were only ever meant to be short-lived (due to
the pressure on vmap space).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
end of thread, other threads: [~2016-02-23 12:58 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-22 15:18 [PATCH v4 0/3] Reorganise calls to vmap() GEM objects Dave Gordon
2016-02-22 15:18 ` [PATCH v4 1/3] drm/i915: add i915_gem_object_vmap_range() to map GEM object to virtual space Dave Gordon
2016-02-22 15:18 ` [PATCH v4 2/3] drm/i915: refactor duplicate object vmap functions (reworked) Dave Gordon
2016-02-22 16:06 ` Tvrtko Ursulin
2016-02-23 10:06 ` Chris Wilson
2016-02-23 10:31 ` Tvrtko Ursulin
2016-02-23 11:38 ` Chris Wilson
2016-02-23 11:52 ` Chris Wilson
2016-02-23 12:25 ` Tvrtko Ursulin
2016-02-23 12:56 ` Chris Wilson
2016-02-22 15:18 ` [PATCH v4 3/3] drm/i915: optimise i915_gem_object_vmap_range() for small objects Dave Gordon
2016-02-23 10:16 ` Chris Wilson
2016-02-23 8:37 ` ✗ Fi.CI.BAT: failure for Reorganise calls to vmap() GEM objects Patchwork
2016-02-23 10:59 ` Patchwork