All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Andersson <bjorn.andersson@linaro.org>
To: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Rob Clark <robdclark@gmail.com>, Sean Paul <sean@poorly.run>,
	David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>,
	Abhinav Kumar <abhinavk@codeaurora.org>,
	linux-arm-msm@vger.kernel.org, dri-devel@lists.freedesktop.org,
	freedreno@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/msm/dpu: Avoid ABBA deadlock between IRQ modules
Date: Thu, 10 Jun 2021 09:44:24 -0500	[thread overview]
Message-ID: <YMIlSHlRoF4Lhhdb@yoga> (raw)
In-Reply-To: <faf0dbb2-a219-51fe-5cbd-752848a0286f@linaro.org>

On Thu 10 Jun 06:46 CDT 2021, Dmitry Baryshkov wrote:

> On 10/06/2021 02:15, Bjorn Andersson wrote:
> > Handling of the interrupt callback lists is done in dpu_core_irq.c,
> > under the "cb_lock" spinlock. When these operations results in the need
> > for enableing or disabling the IRQ in the hardware the code jumps to
> > dpu_hw_interrupts.c, which protects its operations with "irq_lock"
> > spinlock.
> > 
> > When an interrupt fires, dpu_hw_intr_dispatch_irq() inspects the
> > hardware state while holding the "irq_lock" spinlock and jumps to
> > dpu_core_irq_callback_handler() to invoke the registered handlers, which
> > traverses the callback list under the "cb_lock" spinlock.
> > 
> > As such, in the event that these happens concurrently we'll end up with
> > a deadlock.
> > 
> > Prior to '1c1e7763a6d4 ("drm/msm/dpu: simplify IRQ enabling/disabling")'
> > the enable/disable of the hardware interrupt was done outside the
> > "cb_lock" region, optimitically by using an atomic enable-counter for
> > each interrupt and an warning print if someone changed the list between
> > the atomic_read and the time the operation concluded.
> > 
> > Rather than re-introducing the large array of atomics, serialize the
> > register/unregister operations under a single mutex.
> > 
> > Fixes: 1c1e7763a6d4 ("drm/msm/dpu: simplify IRQ enabling/disabling")
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> 
> I have been thinking about this for quite some time. I'm still not confident
> about the proposed scheme.
> 
> What more intrusive, but more obvious change:
>  - Drop dpu_kms->irq_obj.cb_lock alltogether. Use hw_intr's irq_lock instead
> in the register/unregister path and no locks in the callback itself.

I did consider this as well, but while I certainly don't like the
current design I think it would be easy to miss the fact that the two
"different" modules share the lock.

>  - Do not take locks in the dpu_hw_intr_enable_irq/disable_irq (as we are
> already locked outside).
> 

This is correct.

> The core_irq is the only user of the hw_intr framework. In fact I'd like to
> squash them together at some point (if I get some time, I'll send patches
> during this cycle).
> 

As we've seen the two modules are tightly coupled, and it's full of slow
function pointer calls, so I'm definitely in favor of us refactoring
this into a single thing.

Further more, rather than dealing with the callback lists in
dpu_core_irq, I think we should explore just implementing this as
another chained irq handler, and just use the common IRQ code directly -
it has a known API and is known to work.


That said, such redesign will take a little bit of time and we
definitely need to squash the deadlock, as my laptop only survives 10-15
minutes with two displays active.

So if we agree that we want to squash the two modules, then your
suggestion of sharing the lock seems like a reasonable intermediate
workaround. I'll respin and retest my patch accordingly.

Thanks,
Bjorn

> 
> > ---
> >   drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c | 10 +++++++---
> >   drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h      |  2 ++
> >   2 files changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
> > index 4f110c428b60..62bbe35eff7b 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
> > @@ -82,11 +82,13 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, int irq_idx,
> >   	DPU_DEBUG("[%pS] irq_idx=%d\n", __builtin_return_address(0), irq_idx);
> > +	mutex_lock(&dpu_kms->irq_obj.hw_enable_lock);
> >   	spin_lock_irqsave(&dpu_kms->irq_obj.cb_lock, irq_flags);
> >   	trace_dpu_core_irq_register_callback(irq_idx, register_irq_cb);
> >   	list_del_init(&register_irq_cb->list);
> >   	list_add_tail(&register_irq_cb->list,
> >   			&dpu_kms->irq_obj.irq_cb_tbl[irq_idx]);
> > +	spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags);
> >   	if (list_is_first(&register_irq_cb->list,
> >   			&dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) {
> >   		int ret = dpu_kms->hw_intr->ops.enable_irq(
> > @@ -96,8 +98,7 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, int irq_idx,
> >   			DPU_ERROR("Fail to enable IRQ for irq_idx:%d\n",
> >   					irq_idx);
> >   	}
> > -
> > -	spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags);
> > +	mutex_unlock(&dpu_kms->irq_obj.hw_enable_lock);
> >   	return 0;
> >   }
> > @@ -127,9 +128,11 @@ int dpu_core_irq_unregister_callback(struct dpu_kms *dpu_kms, int irq_idx,
> >   	DPU_DEBUG("[%pS] irq_idx=%d\n", __builtin_return_address(0), irq_idx);
> > +	mutex_lock(&dpu_kms->irq_obj.hw_enable_lock);
> >   	spin_lock_irqsave(&dpu_kms->irq_obj.cb_lock, irq_flags);
> >   	trace_dpu_core_irq_unregister_callback(irq_idx, register_irq_cb);
> >   	list_del_init(&register_irq_cb->list);
> > +	spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags);
> >   	/* empty callback list but interrupt is still enabled */
> >   	if (list_empty(&dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) {
> >   		int ret = dpu_kms->hw_intr->ops.disable_irq(
> > @@ -140,7 +143,7 @@ int dpu_core_irq_unregister_callback(struct dpu_kms *dpu_kms, int irq_idx,
> >   					irq_idx);
> >   		DPU_DEBUG("irq_idx=%d ret=%d\n", irq_idx, ret);
> >   	}
> > -	spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags);
> > +	mutex_unlock(&dpu_kms->irq_obj.hw_enable_lock);
> >   	return 0;
> >   }
> > @@ -207,6 +210,7 @@ void dpu_core_irq_preinstall(struct dpu_kms *dpu_kms)
> >   	dpu_disable_all_irqs(dpu_kms);
> >   	pm_runtime_put_sync(&dpu_kms->pdev->dev);
> > +	mutex_init(&dpu_kms->irq_obj.hw_enable_lock);
> >   	spin_lock_init(&dpu_kms->irq_obj.cb_lock);
> >   	/* Create irq callbacks for all possible irq_idx */
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > index f6840b1af6e4..5a162caea29d 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > @@ -83,6 +83,7 @@ struct dpu_irq_callback {
> >    * @total_irq:    total number of irq_idx obtained from HW interrupts mapping
> >    * @irq_cb_tbl:   array of IRQ callbacks setting
> >    * @cb_lock:      callback lock
> > + * @hw_enable_lock: lock to synchronize callback register and unregister
> >    * @debugfs_file: debugfs file for irq statistics
> >    */
> >   struct dpu_irq {
> > @@ -90,6 +91,7 @@ struct dpu_irq {
> >   	struct list_head *irq_cb_tbl;
> >   	atomic_t *irq_counts;
> >   	spinlock_t cb_lock;
> > +	struct mutex hw_enable_lock;
> >   };
> >   struct dpu_kms {
> > 
> 
> 
> -- 
> With best wishes
> Dmitry

WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Andersson <bjorn.andersson@linaro.org>
To: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: freedreno@lists.freedesktop.org, David Airlie <airlied@linux.ie>,
	linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Abhinav Kumar <abhinavk@codeaurora.org>,
	dri-devel@lists.freedesktop.org, Sean Paul <sean@poorly.run>
Subject: Re: [PATCH] drm/msm/dpu: Avoid ABBA deadlock between IRQ modules
Date: Thu, 10 Jun 2021 09:44:24 -0500	[thread overview]
Message-ID: <YMIlSHlRoF4Lhhdb@yoga> (raw)
In-Reply-To: <faf0dbb2-a219-51fe-5cbd-752848a0286f@linaro.org>

On Thu 10 Jun 06:46 CDT 2021, Dmitry Baryshkov wrote:

> On 10/06/2021 02:15, Bjorn Andersson wrote:
> > Handling of the interrupt callback lists is done in dpu_core_irq.c,
> > under the "cb_lock" spinlock. When these operations results in the need
> > for enableing or disabling the IRQ in the hardware the code jumps to
> > dpu_hw_interrupts.c, which protects its operations with "irq_lock"
> > spinlock.
> > 
> > When an interrupt fires, dpu_hw_intr_dispatch_irq() inspects the
> > hardware state while holding the "irq_lock" spinlock and jumps to
> > dpu_core_irq_callback_handler() to invoke the registered handlers, which
> > traverses the callback list under the "cb_lock" spinlock.
> > 
> > As such, in the event that these happens concurrently we'll end up with
> > a deadlock.
> > 
> > Prior to '1c1e7763a6d4 ("drm/msm/dpu: simplify IRQ enabling/disabling")'
> > the enable/disable of the hardware interrupt was done outside the
> > "cb_lock" region, optimitically by using an atomic enable-counter for
> > each interrupt and an warning print if someone changed the list between
> > the atomic_read and the time the operation concluded.
> > 
> > Rather than re-introducing the large array of atomics, serialize the
> > register/unregister operations under a single mutex.
> > 
> > Fixes: 1c1e7763a6d4 ("drm/msm/dpu: simplify IRQ enabling/disabling")
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
> 
> I have been thinking about this for quite some time. I'm still not confident
> about the proposed scheme.
> 
> What more intrusive, but more obvious change:
>  - Drop dpu_kms->irq_obj.cb_lock alltogether. Use hw_intr's irq_lock instead
> in the register/unregister path and no locks in the callback itself.

I did consider this as well, but while I certainly don't like the
current design I think it would be easy to miss the fact that the two
"different" modules share the lock.

>  - Do not take locks in the dpu_hw_intr_enable_irq/disable_irq (as we are
> already locked outside).
> 

This is correct.

> The core_irq is the only user of the hw_intr framework. In fact I'd like to
> squash them together at some point (if I get some time, I'll send patches
> during this cycle).
> 

As we've seen the two modules are tightly coupled, and it's full of slow
function pointer calls, so I'm definitely in favor of us refactoring
this into a single thing.

Further more, rather than dealing with the callback lists in
dpu_core_irq, I think we should explore just implementing this as
another chained irq handler, and just use the common IRQ code directly -
it has a known API and is known to work.


That said, such redesign will take a little bit of time and we
definitely need to squash the deadlock, as my laptop only survives 10-15
minutes with two displays active.

So if we agree that we want to squash the two modules, then your
suggestion of sharing the lock seems like a reasonable intermediate
workaround. I'll respin and retest my patch accordingly.

Thanks,
Bjorn

> 
> > ---
> >   drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c | 10 +++++++---
> >   drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h      |  2 ++
> >   2 files changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
> > index 4f110c428b60..62bbe35eff7b 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_core_irq.c
> > @@ -82,11 +82,13 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, int irq_idx,
> >   	DPU_DEBUG("[%pS] irq_idx=%d\n", __builtin_return_address(0), irq_idx);
> > +	mutex_lock(&dpu_kms->irq_obj.hw_enable_lock);
> >   	spin_lock_irqsave(&dpu_kms->irq_obj.cb_lock, irq_flags);
> >   	trace_dpu_core_irq_register_callback(irq_idx, register_irq_cb);
> >   	list_del_init(&register_irq_cb->list);
> >   	list_add_tail(&register_irq_cb->list,
> >   			&dpu_kms->irq_obj.irq_cb_tbl[irq_idx]);
> > +	spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags);
> >   	if (list_is_first(&register_irq_cb->list,
> >   			&dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) {
> >   		int ret = dpu_kms->hw_intr->ops.enable_irq(
> > @@ -96,8 +98,7 @@ int dpu_core_irq_register_callback(struct dpu_kms *dpu_kms, int irq_idx,
> >   			DPU_ERROR("Fail to enable IRQ for irq_idx:%d\n",
> >   					irq_idx);
> >   	}
> > -
> > -	spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags);
> > +	mutex_unlock(&dpu_kms->irq_obj.hw_enable_lock);
> >   	return 0;
> >   }
> > @@ -127,9 +128,11 @@ int dpu_core_irq_unregister_callback(struct dpu_kms *dpu_kms, int irq_idx,
> >   	DPU_DEBUG("[%pS] irq_idx=%d\n", __builtin_return_address(0), irq_idx);
> > +	mutex_lock(&dpu_kms->irq_obj.hw_enable_lock);
> >   	spin_lock_irqsave(&dpu_kms->irq_obj.cb_lock, irq_flags);
> >   	trace_dpu_core_irq_unregister_callback(irq_idx, register_irq_cb);
> >   	list_del_init(&register_irq_cb->list);
> > +	spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags);
> >   	/* empty callback list but interrupt is still enabled */
> >   	if (list_empty(&dpu_kms->irq_obj.irq_cb_tbl[irq_idx])) {
> >   		int ret = dpu_kms->hw_intr->ops.disable_irq(
> > @@ -140,7 +143,7 @@ int dpu_core_irq_unregister_callback(struct dpu_kms *dpu_kms, int irq_idx,
> >   					irq_idx);
> >   		DPU_DEBUG("irq_idx=%d ret=%d\n", irq_idx, ret);
> >   	}
> > -	spin_unlock_irqrestore(&dpu_kms->irq_obj.cb_lock, irq_flags);
> > +	mutex_unlock(&dpu_kms->irq_obj.hw_enable_lock);
> >   	return 0;
> >   }
> > @@ -207,6 +210,7 @@ void dpu_core_irq_preinstall(struct dpu_kms *dpu_kms)
> >   	dpu_disable_all_irqs(dpu_kms);
> >   	pm_runtime_put_sync(&dpu_kms->pdev->dev);
> > +	mutex_init(&dpu_kms->irq_obj.hw_enable_lock);
> >   	spin_lock_init(&dpu_kms->irq_obj.cb_lock);
> >   	/* Create irq callbacks for all possible irq_idx */
> > diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > index f6840b1af6e4..5a162caea29d 100644
> > --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h
> > @@ -83,6 +83,7 @@ struct dpu_irq_callback {
> >    * @total_irq:    total number of irq_idx obtained from HW interrupts mapping
> >    * @irq_cb_tbl:   array of IRQ callbacks setting
> >    * @cb_lock:      callback lock
> > + * @hw_enable_lock: lock to synchronize callback register and unregister
> >    * @debugfs_file: debugfs file for irq statistics
> >    */
> >   struct dpu_irq {
> > @@ -90,6 +91,7 @@ struct dpu_irq {
> >   	struct list_head *irq_cb_tbl;
> >   	atomic_t *irq_counts;
> >   	spinlock_t cb_lock;
> > +	struct mutex hw_enable_lock;
> >   };
> >   struct dpu_kms {
> > 
> 
> 
> -- 
> With best wishes
> Dmitry

  reply	other threads:[~2021-06-10 14:45 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09 23:15 [PATCH] drm/msm/dpu: Avoid ABBA deadlock between IRQ modules Bjorn Andersson
2021-06-09 23:15 ` Bjorn Andersson
2021-06-10 11:46 ` Dmitry Baryshkov
2021-06-10 11:46   ` Dmitry Baryshkov
2021-06-10 14:44   ` Bjorn Andersson [this message]
2021-06-10 14:44     ` Bjorn Andersson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YMIlSHlRoF4Lhhdb@yoga \
    --to=bjorn.andersson@linaro.org \
    --cc=abhinavk@codeaurora.org \
    --cc=airlied@linux.ie \
    --cc=daniel@ffwll.ch \
    --cc=dmitry.baryshkov@linaro.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=robdclark@gmail.com \
    --cc=sean@poorly.run \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.