Re: [PATCH v6 3/7] drm/i915: introduce and use i915_gem_object_vmap_range()

From: Dave Gordon <david.s.gordon@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH v6 3/7] drm/i915: introduce and use i915_gem_object_vmap_range()
Date: Tue, 1 Mar 2016 13:13:29 +0000	[thread overview]
Message-ID: <56D59579.2010004@intel.com> (raw)
In-Reply-To: <56D56B0D.2050704@linux.intel.com>

On 01/03/16 10:12, Tvrtko Ursulin wrote:
>
> On 29/02/16 21:16, Dave Gordon wrote:
>> From: Alex Dai <yu.dai@intel.com>
>>
>> There are several places inside driver where a GEM object is mapped
>> to kernel virtual space. The mapping may be done either for the whole
>> object or only a subset of it.
>>
>> This patch introduces a function i915_gem_object_vmap_range() to
>> implement the common functionality. The code itself is extracted and
>> adapted from that in vmap_batch(), but also replaces vmap_obj() and the
>> open-coded version in i915_gem_dmabuf_vmap().
>>
>> v2: use obj->pages->nents for iteration within i915_gem_object_vmap;
>>      break when it finishes all desired pages. The caller must pass the
>>      actual page count required. [Tvrtko Ursulin]
>>
>> v4: renamed to i915_gem_object_vmap_range() to make its function
>>      clearer. [Dave Gordon]
>>
>> v5: use Chris Wilson's new drm_malloc_gfp() rather than kmalloc() or
>>      drm_malloc_ab(). [Dave Gordon]
>>
>> v6: changed range checking to not use pages->nents. [Tvrtko Ursulin]
>>      Use sg_nents_for_len() for range check instead. [Dave Gordon]
>>      Pass range parameters in bytes rather than pages (both callers
>>      were converting from bytes to pages anyway, so this reduces the
>>      number of places where the conversion is done).
>>
>> With this change, we have only one vmap() in the whole driver :)
>>
>> Signed-off-by: Alex Dai <yu.dai@intel.com>
>> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> ---

[snip]

>> +/**
>> + * i915_gem_object_vmap_range - map some or all of a GEM object into
>> kernel space
>> + * @obj: the GEM object to be mapped
>> + * @start: offset in bytes of the start of the range to be mapped
>> + * @len: length in bytes of the range to be mapped
>
> kbuild spotted this kerneldoc issue.

Ah yes, that parameter got renamed

>> + *
>> + * Map a given range of a GEM object into kernel virtual space. The
>> range will
>> + * be extended at both ends, if necessary, to span a whole number of
>> pages. The
>> + * caller must make sure the associated pages are gathered and pinned
>> before
>> + * calling this function, and is responsible for unmapping the
>> returned address
>> + * when it is no longer required.
>> + *
>> + * Returns the address at which the object has been mapped, or NULL
>> on failure.
>> + */
>> +void *i915_gem_object_vmap_range(struct drm_i915_gem_object *obj,
>> +                 unsigned long start,
>> +                 unsigned long nbytes)
>> +{
>> +    struct scatterlist *sg = obj->pages->sgl;
>> +    struct sg_page_iter sg_iter;
>> +    struct page **pages;
>> +    unsigned long first, npages, i;
>> +    int nents;
>> +    void *addr;
>> +
>> +    /* Check requested range against underlying sg list */
>> +    nents = sg_nents_for_len(sg, start + nbytes);
>> +    if (nents < 0) {
>> +        DRM_DEBUG_DRIVER("Invalid page count\n");
>> +        return NULL;
>> +    }
>
> I think this is needless overhead. The helper will iterate the whole sg
> chain while we know the size in obj->base.size and finding out the real
> nents is of little (no) use to the code below.

It was more of a consistency check between the GEM object on the one 
hand, and the SGlist implementation on the other; since none of the 
other SG interfaces actually work in any useful quantities (e.g. pages 
or bytes; 'nents' is particularly useless, as it is affected by the 
details of the way the underlying pages have been mapped, in particular 
how many separate chunks they're in).

>> +
>> +    /* Work in pages from now on */
>> +    first = start >> PAGE_SHIFT;
>> +    npages = DIV_ROUND_UP(start + nbytes, PAGE_SIZE) - first;
>
> And this looks like weak API if the caller can pass non page aligned
> start and size and the function will silently vmap something else.
>
> It should assert and fail on both I think, or it may have been simpler
> to keep it working in page units.

It is exactly the API required by copy_batch() (which is where this code 
came from), and it's a perfectly well-defined API: the start offset is 
rounded down to the start of the containing page, the end address (start 
+ len) is rounded up to the next page boundary, and that defines what 
gets mapped i.e. the smallest page-aligned region containing the range 
specified.

But I could let (all) the callers do that conversion instead ...

>> +    pages = drm_malloc_gfp(npages, sizeof(*pages), GFP_TEMPORARY);
>> +    if (pages == NULL) {
>> +        DRM_DEBUG_DRIVER("Failed to get space for pages\n");
>> +        return NULL;
>> +    }
>> +
>> +    i = 0;
>> +    for_each_sg_page(sg, &sg_iter, nents, first) {
>> +        pages[i] = sg_page_iter_page(&sg_iter);
>> +        if (++i >= npages)
>> +            break;
>> +    }
>> +    WARN_ON(i != npages);
>> +
>> +    addr = vmap(pages, npages, 0, PAGE_KERNEL);
>> +    if (addr == NULL)
>> +        DRM_DEBUG_DRIVER("Failed to vmap pages\n");
>> +    drm_free_large(pages);
>> +
>> +    return addr;
>> +}
>> +
>>   void i915_vma_move_to_active(struct i915_vma *vma,
>>                    struct drm_i915_gem_request *req)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
>> b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
>> index 1f3eef6..aee4149 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
>> @@ -110,9 +110,7 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf
>> *dma_buf)
>>   {
>>       struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
>>       struct drm_device *dev = obj->base.dev;
>> -    struct sg_page_iter sg_iter;
>> -    struct page **pages;
>> -    int ret, i;
>> +    int ret;
>>
>>       ret = i915_mutex_lock_interruptible(dev);
>>       if (ret)
>> @@ -131,16 +129,8 @@ static void *i915_gem_dmabuf_vmap(struct dma_buf
>> *dma_buf)
>>
>>       ret = -ENOMEM;
>>
>> -    pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
>> -    if (pages == NULL)
>> -        goto err_unpin;
>> -
>> -    i = 0;
>> -    for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
>> -        pages[i++] = sg_page_iter_page(&sg_iter);
>> -
>> -    obj->dma_buf_vmapping = vmap(pages, i, 0, PAGE_KERNEL);
>> -    drm_free_large(pages);
>> +    obj->dma_buf_vmapping = i915_gem_object_vmap_range(obj, 0,
>> +                        dma_buf->size >> PAGE_SHIFT);
>
> This is still in pages. (Although as said below I think it should remain
> and API be reverted back.)

Ah yes, I missed this call and the one below, 'cos they get fixed up by 
the last patch of the series (which *does* adopt the new interface).

>>       if (!obj->dma_buf_vmapping)
>>           goto err_unpin;
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 45ce45a..434a452 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2064,27 +2064,6 @@ void intel_unpin_ringbuffer_obj(struct
>> intel_ringbuffer *ringbuf)
>>       i915_gem_object_ggtt_unpin(ringbuf->obj);
>>   }
>>
>> -static u32 *vmap_obj(struct drm_i915_gem_object *obj)
>> -{
>> -    struct sg_page_iter sg_iter;
>> -    struct page **pages;
>> -    void *addr;
>> -    int i;
>> -
>> -    pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
>> -    if (pages == NULL)
>> -        return NULL;
>> -
>> -    i = 0;
>> -    for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0)
>> -        pages[i++] = sg_page_iter_page(&sg_iter);
>> -
>> -    addr = vmap(pages, i, 0, PAGE_KERNEL);
>> -    drm_free_large(pages);
>> -
>> -    return addr;
>> -}
>> -
>>   int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>>                        struct intel_ringbuffer *ringbuf)
>>   {
>> @@ -2103,7 +2082,8 @@ int intel_pin_and_map_ringbuffer_obj(struct
>> drm_device *dev,
>>               return ret;
>>           }
>>
>> -        ringbuf->virtual_start = vmap_obj(obj);
>> +        ringbuf->virtual_start = i915_gem_object_vmap_range(obj, 0,
>> +                        ringbuf->size >> PAGE_SHIFT);
>
> Here also.

Actually ... if we work in pages but adopt a convention that length==0 
means "the whole object" then only copy_batch() would have to do any 
page-aligning calculations; all other callers always want the whole 
object so it's nice to make the interface convenient for them :)

I'll see how that works out ...

.Dave.

>>           if (ringbuf->virtual_start == NULL) {
>>               i915_gem_object_ggtt_unpin(obj);
>>               return -ENOMEM;
>>
>
> Regards,
> Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx