linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Mathieu Poirier <mathieu.poirier@linaro.org>
To: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: bjorn.andersson@linaro.org, matthias.bgg@gmail.com,
	pihsun@chromium.org, linux-remoteproc@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org, linux-kernel@vger.kernel.org,
	kernel@collabora.com
Subject: Re: [PATCH] rpmsg: mtk_rpmsg: Fix circular locking dependency
Date: Tue, 29 Mar 2022 09:08:46 -0600	[thread overview]
Message-ID: <20220329150846.GA3767550@p14s> (raw)
In-Reply-To: <ea0f7cdd-0fae-fb66-3171-f0870384e6aa@collabora.com>

On Tue, Mar 29, 2022 at 12:55:33PM +0200, AngeloGioacchino Del Regno wrote:
> Il 18/02/22 19:01, Mathieu Poirier ha scritto:
> > On Fri, Feb 18, 2022 at 10:16:51AM +0100, AngeloGioacchino Del Regno wrote:
> > > Il 17/02/22 20:03, Mathieu Poirier ha scritto:
> > > > Hi Angelo,
> > > > 
> > > > On Fri, Jan 14, 2022 at 03:47:37PM +0100, AngeloGioacchino Del Regno wrote:
> > > > > During execution of the worker that's used to register rpmsg devices
> > > > > we are safely locking the channels mutex but, when creating a new
> > > > > endpoint for such devices, we are registering a IPI on the SCP, which
> > > > > then makes the SCP to trigger an interrupt, lock its own mutex and in
> > > > > turn register more subdevices.
> > > > > This creates a circular locking dependency situation, as the mtk_rpmsg
> > > > > channels_lock will then depend on the SCP IPI lock.
> > > > > 
> > > > > [   18.014514]  Possible unsafe locking scenario:
> > > > > [   18.014515]        CPU0                    CPU1
> > > > > [   18.014517]        ----                    ----
> > > > > [   18.045467]   lock(&mtk_subdev->channels_lock);
> > > > > [   18.045474]                                lock(&scp->ipi_desc[i].lock);
> > > > 
> > > > I spent well over an hour tracing through the meanders of the code to end up in
> > > > scp_ipi_register() which, I think, leads to the above.  But from there I don't
> > > > see how an IPI can come in and that tells me my assumption is wrong.
> > > > 
> > > > Can you give more details on the events that lead to the above?  I'm not saying
> > > > there is no problem, I just need to understand it.
> > > > 
> > > 
> > > Hi Mathieu,
> > > 
> > > I understand that following this flow without the assistance of the actual
> > > hardware may be a little confusing, so, no worries.
> > > 
> > > drivers/remoteproc/mtk_scp.c - this driver manages the SCP (obviously, a
> > > remote processor)
> > > drivers/remoteproc/mtk_scp_ipi.c - public functions for kernel SCP IPC
> > > 
> > > Flow:
> > > - MediaTek SCP gets probed
> > > - RPMSG starts, we start probing "something", like google,cros-ec-rpmsg
> > > - mtk_rpmsg: creates endpoint; IPI handler is registered here.
> > > 
> > >           ( more flow )
> > > 
> > > - mtk_rpmsg: mtk_rpmsg_ns_cb() -> mtk_rpmsg_create_device(), channel is
> > >               added to the channels list, worker gets scheduled
> > 
> > To me the above is out of order.  The name space endpoint is registered as part
> > of the remote processor start sequence.  From there an IPI with ns_ipi_id comes in
> > and then cros_ec_rpmsg_probe() is called.  The above seems to imply the
> > opposite.
> > 
> > > 
> > > 
> > > Now for the part that produces the real issue:
> > > 
> > > label_a:
> > > 
> > > *** RPMSG MUTEX LOCK ***
> > 
> > By this I take you mean the subdev->channels_lock mutex.
> > 
> > > - mtk_rpmsg: ## Go through multiple channels ##, call mtk_rpmsg_register_device()
> > > 
> > > - Registered device tries to communicate through RPMSG
> > > - .send() or .trysend() (depending on the device) is called: send_ipi()
> > >      *** SCP MUTEX LOCK ***
> > 
> > And this one is either scp->send_lock or scp->ipi_desc[i].lock.
> > 
> > >     - mtk_scp_ipi: Data written, ACK? ok -> return 0
> > >      *** SCP MUTEX UNLOCK ***
> > > 
> > > - mtk_scp_ipi: **** INTERRUPT!!! **** New RPMSG NS available? -> create channel
> > >            goto label_a;
> > > 
> > > *** RPMSG MUTEX UNLOCK ***
> > > 
> > > 
> > > Pardon me for keeping some things in this flow implicit, but that was done to
> > > simplify it as much as possible as to try to make you understand the situation.
> > 
> > I certainly appreciate the effort but the above does not provide me with a clear
> > path that causes the lock to happen.  As I said in my last reply I don't doubt
> > there is a lock contention but the provided information doesn't allow to
> > understand how it happens.
> > 
> > All I am looking for is one scenario with all mutexes and functions calls
> > involved.
> > 
> 
> Hello Mathieu,
> I'm sorry for leaving this unresolved for a long time, had to work on other
> things in the meanwhile.
>

I also had to move on and as such details related to this patch have mostly
vanished from my memory.

> I'm not sure what you need, can you please help me to give you the
> issue background that you require?
> 
> In the meanwhile, here's full debugging info coming from the kmsg:
> 
> https://paste.debian.net/1235967/

I will review this pastebin along with the explanation you had provided earlier
in this thread - maybe I'll get it this time... But that will take some time as
I have 5 other patchsets to review before looking at this again.

> 
> Thanks,
> Angelo
> 
> > Thanks,
> > Mathieu
> > 
> > > 
> > > Cheers,
> > > Angelo
> > > 
> > > > Thanks,
> > > > Mathieu
> > > > 
> > > > > [   18.228399]                                lock(&mtk_subdev->channels_lock);
> > > > > [   18.228405]   lock(&scp->ipi_desc[i].lock);
> > > > > [   18.264405]
> > > > > 
> > > > > To solve this, simply unlock the channels_lock mutex before calling
> > > > > mtk_rpmsg_register_device() and relock it right after, as safety is
> > > > > still ensured by the locking mechanism that happens right after
> > > > > through SCP.
> > > > > Notably, mtk_rpmsg_register_device() does not even require locking.
> > > > > 
> > > > > Fixes: 7017996951fd ("rpmsg: add rpmsg support for mt8183 SCP.")
> > > > > Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
> > > > > ---
> > > > >    drivers/rpmsg/mtk_rpmsg.c | 2 ++
> > > > >    1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/rpmsg/mtk_rpmsg.c b/drivers/rpmsg/mtk_rpmsg.c
> > > > > index 5b4404b8be4c..d1213c33da20 100644
> > > > > --- a/drivers/rpmsg/mtk_rpmsg.c
> > > > > +++ b/drivers/rpmsg/mtk_rpmsg.c
> > > > > @@ -234,7 +234,9 @@ static void mtk_register_device_work_function(struct work_struct *register_work)
> > > > >    		if (info->registered)
> > > > >    			continue;
> > > > > +		mutex_unlock(&subdev->channels_lock);
> > > > >    		ret = mtk_rpmsg_register_device(subdev, &info->info);
> > > > > +		mutex_lock(&subdev->channels_lock);
> > > > >    		if (ret) {
> > > > >    			dev_err(&pdev->dev, "Can't create rpmsg_device\n");
> > > > >    			continue;
> > > > > -- 
> > > > > 2.33.1
> > > > > 
> > > 
> > > 
> > > 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-03-29 15:11 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-14 14:47 [PATCH] rpmsg: mtk_rpmsg: Fix circular locking dependency AngeloGioacchino Del Regno
2022-02-16 16:06 ` AngeloGioacchino Del Regno
2022-02-17 16:08   ` Mathieu Poirier
2022-02-17 19:03 ` Mathieu Poirier
2022-02-18  9:16   ` AngeloGioacchino Del Regno
2022-02-18 18:01     ` Mathieu Poirier
2022-03-29 10:55       ` AngeloGioacchino Del Regno
2022-03-29 15:08         ` Mathieu Poirier [this message]
2022-04-28 17:31 ` Mathieu Poirier
2022-04-29 13:37   ` AngeloGioacchino Del Regno

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220329150846.GA3767550@p14s \
    --to=mathieu.poirier@linaro.org \
    --cc=angelogioacchino.delregno@collabora.com \
    --cc=bjorn.andersson@linaro.org \
    --cc=kernel@collabora.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux-remoteproc@vger.kernel.org \
    --cc=matthias.bgg@gmail.com \
    --cc=pihsun@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).