From mboxrd@z Thu Jan 1 00:00:00 1970
From: Akhil P Oommen
Subject: Re: [PATCH v2] dma-buf/fence: Take refcount on the module that owns the fence
Date: Mon, 25 Jun 2018 21:21:15 +0530
Message-ID: <82f8e976-2a5a-56df-28bb-c75314824bf6@codeaurora.org>
References: <1529660407-6266-1-git-send-email-akhilpo@codeaurora.org> <1529661856.7034.404.camel@padovan.org> <152966212844.11773.6596589902326100250@mail.alporthouse.com> <20180625075040.GK2958@phenom.ffwll.local>
In-Reply-To: <20180625075040.GK2958@phenom.ffwll.local>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: Chris Wilson, Gustavo Padovan, sumit.semwal@linaro.org, jcrouse@codeaurora.org, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org, smasetty@codeaurora.org
List-Id: linux-arm-msm@vger.kernel.org



On 6/25/2018 1:20 PM, Daniel Vetter wrote:
> On Fri, Jun 22, 2018 at 11:08:48AM +0100, Chris Wilson wrote:
>> Quoting Gustavo Padovan (2018-06-22 11:04:16)
>>> Hi Akhil,
>>>
>>> On Fri, 2018-06-22 at 15:10 +0530, Akhil P Oommen wrote:
>>>> Each fence object holds function pointers of the module that
>>>> initialized
>>>> it. Allowing the module to unload before this fence's release is
>>>> catastrophic. So, keep a refcount on the module until the fence is
>>>> released.
>>>>
>>>> Signed-off-by: Akhil P Oommen <akhilpo@codeaurora.org>
>>>> ---
>>>> Changes in v2:
>>>> - added description for the new function parameter.
>>>>
>>>>  drivers/dma-buf/dma-fence.c | 16 +++++++++++++---
>>>>  include/linux/dma-fence.h   | 10 ++++++++--
>>>>  2 files changed, 21 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-
>>>> fence.c
>>>> index 4edb9fd..2aaa44e 100644
>>>> --- a/drivers/dma-buf/dma-fence.c
>>>> +++ b/drivers/dma-buf/dma-fence.c
>>>> @@ -18,6 +18,7 @@
>>>>   * more details.
>>>>   */
>>>>
>>>> +#include <linux/module.h>
>>>>  #include <linux/slab.h>
>>>>  #include <linux/export.h>
>>>>  #include <linux/atomic.h>
>>>> @@ -168,6 +169,7 @@ void dma_fence_release(struct kref *kref)
>>>>  {
>>>>       struct dma_fence *fence =
>>>>               container_of(kref, struct dma_fence, refcount);
>>>> +     struct module *module = fence->owner;
>>>>
>>>>       trace_dma_fence_destroy(fence);
>>>>
>>>> @@ -178,6 +180,8 @@ void dma_fence_release(struct kref *kref)
>>>>               fence->ops->release(fence);
>>>>       else
>>>>               dma_fence_free(fence);
>>>> +
>>>> +     module_put(module);
>>>>  }
>>>>  EXPORT_SYMBOL(dma_fence_release);
>>>>
>>>> @@ -541,6 +545,7 @@ struct default_wait_cb {
>>>>
>>>>  /**
>>>>   * dma_fence_init - Initialize a custom fence.
>>>> + * @module:  [in]    the module that calls this API
>>>>   * @fence:   [in]    the fence to initialize
>>>>   * @ops:     [in]    the dma_fence_ops for operations on this
>>>> fence
>>>>   * @lock:    [in]    the irqsafe spinlock to use for locking
>>>> this fence
>>>> @@ -556,8 +561,9 @@ struct default_wait_cb {
>>>>   * to check which fence is later by simply using dma_fence_later.
>>>>   */
>>>>  void
>>>> -dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops
>>>> *ops,
>>>> -            spinlock_t *lock, u64 context, unsigned seqno)
>>>> +_dma_fence_init(struct module *module, struct dma_fence *fence,
>>>> +             const struct dma_fence_ops *ops, spinlock_t *lock,
>>>> +             u64 context, unsigned seqno)
>>>>  {
>>>>       BUG_ON(!lock);
>>>>       BUG_ON(!ops || !ops->wait || !ops->enable_signaling ||
>>>> @@ -571,7 +577,11 @@ struct default_wait_cb {
>>>>       fence->seqno = seqno;
>>>>       fence->flags = 0UL;
>>>>       fence->error = 0;
>>>> +     fence->owner = module;
>>>> +
>>>> +     if (!try_module_get(module))
>>>> +             fence->owner = NULL;
>>>>
>>>>       trace_dma_fence_init(fence);
>>>>  }
>>>> -EXPORT_SYMBOL(dma_fence_init);
>>>> +EXPORT_SYMBOL(_dma_fence_init);
>>> Do we still need to export the symbol, it won't be called from outside
>>> anymore? Other than that looks good to me:
>> There's a big drawback in that a module reference is often insufficient,
>> and that a reference on the driver (or whatever is required for the
>> lifetime of the fence) will already hold the module reference.
>>
>> Considering that we want a few 100k fences in flight per second, is
>> there no other way to only export a fence with a module reference?
> We'd need to make the timeline a full-blown object (Maarten owes me one
> for that design screw-up), and then we could stuff all these things in
> there.
>
> And I think that's the right fix, since try_module_get for every
> dma_fence_init just ain't cool really :-)
> -Daniel
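
For readers following the thread, here is one possible reading of the "timeline as a full-blown object" idea, written as a plain userspace C sketch. Nothing in it is kernel API: tl_module, tl_timeline, tl_fence and all tl_* helpers are hypothetical names invented for illustration, and the plain int counters stand in for whatever atomic refcounting a real implementation would use. The assumption is that the timeline takes the module reference once when it is created, and each fence only pins its timeline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loadable module: just a reference count. */
struct tl_module {
	int refcnt;
};

/* Hypothetical timeline object that owns the single module reference. */
struct tl_timeline {
	struct tl_module *owner;
	unsigned long long context;
	int refcnt;
};

/* Hypothetical fence that only pins its timeline. */
struct tl_fence {
	struct tl_timeline *timeline;
	unsigned int seqno;
};

/* Create a timeline; the module reference is taken once, here. */
static struct tl_timeline *tl_timeline_create(struct tl_module *owner,
					      unsigned long long context)
{
	struct tl_timeline *tl = malloc(sizeof(*tl));

	if (!tl)
		return NULL;
	owner->refcnt++;
	tl->owner = owner;
	tl->context = context;
	tl->refcnt = 1;		/* reference held by the creator */
	return tl;
}

/* Drop a timeline reference; the last put also drops the module reference. */
static void tl_timeline_put(struct tl_timeline *tl)
{
	struct tl_module *owner;

	assert(tl->refcnt > 0);
	if (--tl->refcnt > 0)
		return;
	owner = tl->owner;	/* read before the timeline is freed */
	free(tl);
	owner->refcnt--;
}

/* Initialising a fence touches only the timeline, never the module. */
static void tl_fence_init(struct tl_fence *f, struct tl_timeline *tl,
			  unsigned int seqno)
{
	tl->refcnt++;
	f->timeline = tl;
	f->seqno = seqno;
}

/* Free a heap-allocated fence, then drop its timeline reference. */
static void tl_fence_release(struct tl_fence *f)
{
	struct tl_timeline *tl = f->timeline;

	free(f);
	tl_timeline_put(tl);
}
```

In this model the module count stays at one however many fences are in flight, and it drops to zero only after the creator's reference and every fence are gone. Whether that is cheaper in practice than a per-fence try_module_get() depends on details the thread does not settle.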
Thanks for the feedback, Daniel.
I see your point, but I am not sure how much impact one extra refcount operation would have compared with the whole effort of setting up a new fence. Also, this refcounting is not needed for built-in modules.

As of now, unloading a kernel module that uses dma_fence_init() is an easy way to bring down the system. This patch simply fixes that. What you have suggested sounds like a non-trivial effort that someone more familiar with this code base could do better than I can. Perhaps we can take this patch now to fix the issue at hand, and somebody else can share a more optimal solution later. :)
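
To make the failure mode and the fix concrete, here is a minimal userspace model of what the patch does, assuming a simplified module with a plain int reference count. It is a sketch, not kernel code: mock_module, mock_fence and every mock_* function are hypothetical stand-ins for struct module, struct dma_fence, try_module_get(), module_put() and the init/release paths touched by the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct module. */
struct mock_module {
	int refcnt;
	bool going;	/* set once unload has started */
};

struct mock_fence {
	struct mock_module *owner;
	void (*release)(struct mock_fence *f);	/* lives in the owner's code */
};

/* Models try_module_get(): NULL (built-in code) always succeeds. */
static bool mock_try_module_get(struct mock_module *m)
{
	if (!m)
		return true;
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Models module_put(): a NULL module is a no-op. */
static void mock_module_put(struct mock_module *m)
{
	if (m)
		m->refcnt--;
}

/* Unload is refused while any fence still holds a reference. */
static bool mock_module_unload(struct mock_module *m)
{
	if (m->refcnt > 0)
		return false;
	m->going = true;
	return true;
}

/* Mirrors the patched init path: record the owner and pin it. */
static void mock_fence_init(struct mock_module *m, struct mock_fence *f,
			    void (*release)(struct mock_fence *))
{
	assert(f && release);	/* plays the role of the BUG_ON() checks */
	f->release = release;
	f->owner = m;
	if (!mock_try_module_get(m))
		f->owner = NULL;
}

/* Mirrors the patched release path: read owner before the fence is freed. */
static void mock_fence_release(struct mock_fence *f)
{
	struct mock_module *owner = f->owner;

	f->release(f);
	mock_module_put(owner);
}

/* Example release callback supplied by the "driver". */
static void mock_driver_release(struct mock_fence *f)
{
	free(f);
}
```

With one fence outstanding, mock_module_unload() fails; once the fence is released the count returns to zero and unload succeeds. The owner pointer is copied to a local first for the same reason the patch does it: the release callback may free the fence.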

@Gustavo & @Sumit, I would like the maintainers to take a decision here.

-Akhil.