Re: [PATCH] drm/i915: tidy up request alloc

From: Dave Gordon <david.s.gordon@intel.com>
To: "Liu, Hong" <hong.liu@intel.com>,
	"chris@chris-wilson.co.uk" <chris@chris-wilson.co.uk>
Cc: "intel-gfx@lists.freedesktop.org" <intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH] drm/i915: tidy up request alloc
Date: Mon, 4 Jul 2016 11:36:03 +0100	[thread overview]
Message-ID: <577A3C13.4010005@intel.com> (raw)
In-Reply-To: <1467605321.2003.9.camel@intel.com>

On 04/07/16 05:08, Liu, Hong wrote:
> On Fri, 2016-07-01 at 19:34 +0100, Chris Wilson wrote:
>> On Fri, Jul 01, 2016 at 05:58:18PM +0100, Dave Gordon wrote:
>>> On 30/06/16 13:49, Tvrtko Ursulin wrote:
>>>>
>>>> On 30/06/16 11:22, Chris Wilson wrote:
>>>>> On Thu, Jun 30, 2016 at 09:50:20AM +0100, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 30/06/16 02:35, Hong Liu wrote:
>>>>>>> Return the allocated request pointer directly to remove
>>>>>>> the double pointer parameter.
>>>>>>>
>>>>>>> Signed-off-by: Hong Liu <hong.liu@intel.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/i915/i915_gem.c | 25 +++++++------------
>>>>>>> ------
>>>>>>>   1 file changed, 7 insertions(+), 18 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> index 1d98782..9881455 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>>>> @@ -2988,32 +2988,26 @@ void i915_gem_request_free(struct
>>>>>>> kref
>>>>>>> *req_ref)
>>>>>>>       kmem_cache_free(req->i915->requests, req);
>>>>>>>   }
>>>>>>>
>>>>>>> -static inline int
>>>>>>> +static inline struct drm_i915_gem_request *
>>>>>>>   __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>>>>> -             struct i915_gem_context *ctx,
>>>>>>> -             struct drm_i915_gem_request **req_out)
>>>>>>> +             struct i915_gem_context *ctx)
>>>>>>>   {
>>>>>>>       struct drm_i915_private *dev_priv = engine->i915;
>>>>>>>       unsigned reset_counter =
>>>>>>> i915_reset_counter(&dev_priv->gpu_error);
>>>>>>>       struct drm_i915_gem_request *req;
>>>>>>>       int ret;
>>>>>>>
>>>>>>> -    if (!req_out)
>>>>>>> -        return -EINVAL;
>>>>>>> -
>>>>>>> -    *req_out = NULL;
>>>>>>> -
>>>>>>>       /* ABI: Before userspace accesses the GPU (e.g.
>>>>>>> execbuffer),
>>>>>>> report
>>>>>>>        * EIO if the GPU is already wedged, or EAGAIN to drop
>>>>>>> the
>>>>>>> struct_mutex
>>>>>>>        * and restart.
>>>>>>>        */
>>>>>>>       ret = i915_gem_check_wedge(reset_counter,
>>>>>>> dev_priv->mm.interruptible);
>>>>>>>       if (ret)
>>>>>>> -        return ret;
>>>>>>> +        return ERR_PTR(ret);
>>>>>>>
>>>>>>>       req = kmem_cache_zalloc(dev_priv->requests,
>>>>>>> GFP_KERNEL);
>>>>>>>       if (req == NULL)
>>>>>>> -        return -ENOMEM;
>>>>>>> +        return ERR_PTR(-ENOMEM);
>>>>>>>
>>>>>>>       ret = i915_gem_get_seqno(engine->i915, &req->seqno);
>>>>>>>       if (ret)
>>>>>>> @@ -3041,14 +3035,13 @@ __i915_gem_request_alloc(struct
>>>>>>> intel_engine_cs *engine,
>>>>>>>       if (ret)
>>>>>>>           goto err_ctx;
>>>>>>>
>>>>>>> -    *req_out = req;
>>>>>>> -    return 0;
>>>>>>> +    return req;
>>>>>>>
>>>>>>>   err_ctx:
>>>>>>>       i915_gem_context_unreference(ctx);
>>>>>>>   err:
>>>>>>>       kmem_cache_free(dev_priv->requests, req);
>>>>>>> -    return ret;
>>>>>>> +    return ERR_PTR(ret);
>>>>>>>   }
>>>>>>>
>>>>>>>   /**
>>>>>>> @@ -3067,13 +3060,9 @@ struct drm_i915_gem_request *
>>>>>>>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>>>>>                  struct i915_gem_context *ctx)
>>>>>>>   {
>>>>>>> -    struct drm_i915_gem_request *req;
>>>>>>> -    int err;
>>>>>>> -
>>>>>>>       if (ctx == NULL)
>>>>>>>           ctx = engine->i915->kernel_context;
>>>>>>> -    err = __i915_gem_request_alloc(engine, ctx, &req);
>>>>>>> -    return err ? ERR_PTR(err) : req;
>>>>>>> +    return __i915_gem_request_alloc(engine, ctx);
>>>>>>>   }
>>>>>>>
>>>>>>>   struct drm_i915_gem_request *
>>>>>>
>>>>>> Looks good to me. And have this feeling I've seen this
>>>>>> somewhere before.
>>>>>
>>>>> Several times. This is not the full tidy, nor does it realise
>>>>> the ramifactions of request alloc through the stack.
>>>>
>>>> Hm I can't spot that it is doing anything wrong or making
>>>> anything worse. You don't want to let the small cleanup in?
>>>>
>>>> Regards,
>>>> Tvrtko
>>>
>>> It ought to make almost no difference, because the *only* place the
>>> inner function is called is from the outer one, which passes a
>>> pointer to a local for the returned object; and the inner one is
>>> then inlined, so the compiler doesn't actually put it on the stack
>>> and call to the inner allocator anyway.
>>>
>>> Strangely, however, with this change the code becomes ~400 bytes
>>> bigger!
>>>
>>> Disassembly reveals that while the code for the externally-callable
>>> outer function is indeed almost identical, a second copy of it has
>>> also been inlined at the one callsite in this file:
>>>
>>> __i915_gem_object_sync() ...
>>> 	req = i915_gem_request_alloc(to, NULL);
>>>
>>> I don't think that's a critical path and would rather have 400
>>> bytes
>>> smaller codespace. We can get that back by adding /noinline/ to the
>>> outer function i915_gem_request_alloc() (not, of course, to the
>>> inner one, that definitely *should* be inline).
>>
>> __i915_gem_object_sync() should not be calling
>> i915_gem_request_alloc().
>>
>> That's the issue with this patch, your patch and John's patch.
>
> So we wrote the i915_gem_request_alloc() this way is to avoid being
> inlined into callers like __i915_gem_object_sync()?

Not specifically, as the description of commit 268270883 says,

"... this patch renames the existing i915_gem_request_alloc(), and
  makes it local (static inline), and replaces it with a wrapper that
  provides a default if the context is NULL, and also has a nicer
  calling convention (doesn't require a pointer to an output parameter).
  Then we change all callers to use the new convention:
     OLD:
         err = i915_gem_request_alloc(ring, user_ctx, &req);
         if (err) ...
     NEW:
         req = i915_gem_request_alloc(ring, user_ctx);
         if (IS_ERR(req)) ...
     OLD:
         err = i915_gem_request_alloc(ring, ring->default_context, &req);
         if (err) ...
     NEW:
         req = i915_gem_request_alloc(ring, NULL);
         if (IS_ERR(req)) ..."

At the time, it was more a matter of simplifying review and minimising 
lines-of-change by adding the wrapper to implement the new convention 
rather than rewriting the real function at the same time as changing all 
the callers. So I have no objection to continuing the process of making 
this nicer by now flattening the two layers into a single function; I 
just noted the oddity that the flattening made the codespace larger.

Chris, do you have a patch for *not* calling i915_gem_request_alloc() 
from inside __i915_gem_object_sync() ? I don't really know why John's 
original conversion to passing requests everywhere needed this.

.Dave.

> I checked the file with GCC 4.8.5 on my centos environment, it is like
> what Dave found. With the patch, i915_gem_object_sync() is 368 bytes
> bigger.
>
> But when I checked it with GCC 6.1.1 on Fedora 24, it seems it inlines
> the i915_gem_request_alloc() even with the current implementation.
> With the patch, the i915_gem_object_sync() is 80 bytes smaller.
>
>> -Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx