All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Poirier <mathieu.poirier@linaro.org>
To: Shengjiu Wang <shengjiu.wang@gmail.com>
Cc: Shengjiu Wang <shengjiu.wang@nxp.com>,
	Bjorn Andersson <andersson@kernel.org>,
	Arnaud POULIQUEN <arnaud.pouliquen@foss.st.com>,
	"open list:REMOTE PROCESSOR (REMOTEPROC) SUBSYSTEM" 
	<linux-remoteproc@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] remoteproc: imx_dsp_rproc: Add mutex protection for workqueue
Date: Thu, 29 Sep 2022 11:13:57 -0600	[thread overview]
Message-ID: <20220929171357.GA3107608@p14s> (raw)
In-Reply-To: <CAA+D8ANKwsF=4R3cR0ZZy67CqwDZ1m7M8y57PmMUqkM4tPFZkw@mail.gmail.com>

On Thu, Sep 29, 2022 at 10:03:21AM +0800, Shengjiu Wang wrote:
> On Thu, Sep 29, 2022 at 1:20 AM Mathieu Poirier <mathieu.poirier@linaro.org>
> wrote:
> 
> > On Mon, Sep 26, 2022 at 07:48:13PM +0800, Shengjiu Wang wrote:
> > > The workqueue may execute late even after remoteproc is stopped or
> > > stopping, some resources (rpmsg device and endpoint) have been
> > > released in rproc_stop_subdevices(), then rproc_vq_interrupt()
> > > access these resources will cause kennel dump.
> > >
> > > Call trace:
> > >  virtqueue_add_split+0x1ac/0x560
> > >  virtqueue_add_inbuf+0x4c/0x60
> > >  rpmsg_recv_done+0x15c/0x294
> > >  vring_interrupt+0x6c/0xa4
> > >  rproc_vq_interrupt+0x30/0x50
> > >  imx_dsp_rproc_vq_work+0x24/0x40 [imx_dsp_rproc]
> > >  process_one_work+0x1d0/0x354
> > >  worker_thread+0x13c/0x470
> > >  kthread+0x154/0x160
> > >  ret_from_fork+0x10/0x20
> > >
> > > Add mutex protection in imx_dsp_rproc_vq_work(), if the state is
> > > not running, then just skip calling rproc_vq_interrupt().
> > >
> > > Also the flush workqueue operation can't be added in rproc stop
> > > for same reason.
> > >
> > > Fixes: ec0e5549f358 ("remoteproc: imx_dsp_rproc: Add remoteproc driver
> > for DSP on i.MX")
> > > Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
> > > ---
> > >  drivers/remoteproc/imx_dsp_rproc.c | 12 +++++++++---
> > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/remoteproc/imx_dsp_rproc.c
> > b/drivers/remoteproc/imx_dsp_rproc.c
> > > index 899aa8dd12f0..95da1cbefacf 100644
> > > --- a/drivers/remoteproc/imx_dsp_rproc.c
> > > +++ b/drivers/remoteproc/imx_dsp_rproc.c
> > > @@ -347,9 +347,6 @@ static int imx_dsp_rproc_stop(struct rproc *rproc)
> > >       struct device *dev = rproc->dev.parent;
> > >       int ret = 0;
> > >
> > > -     /* Make sure work is finished */
> > > -     flush_work(&priv->rproc_work);
> > > -
> >
> > The kernel documentation for this function [1] indicate that once it
> > returns
> > there will no more jobs to process in that queue, _unless_ another job has
> > been
> > queued _after_ the flush has started.  What I suspect is happening here is
> > that
> > a new job is queued between the time flush_work() returns and the remote
> > processor
> > is switched off, something that should not be happening since all the
> > subdevices have been stopped in rproc_stop_subdevices().
> >
> > [1].
> > https://elixir.bootlin.com/linux/v6.0-rc7/source/kernel/workqueue.c#L3092
> 
> 
> The call sequence with echo stop > remoteproc
> 
> rproc_shutdown
> -> rproc_stop
>    ->*rproc_stop_subdevices*
>    ->rproc->ops->stop()
>        ->imx_dsp_rproc_stop
>            ->*flush_work*
>               -> rproc_vq_interrupt

I understand now - thanks for the details.  Please send me another revision with
the above call sequence in the patch changelog.  The one that is currently there
is obscure and doesn't provide a clear picture of what the problem is.

> 
> So the *flush_work* is not safe, because the resource has been released in
> *rproc_stop_subdevices.  *the resource needed by rproc_vq_interrupt
> is not accessible.
> 
> 
> 
> >
> >
> >
> > >       if (rproc->state == RPROC_CRASHED) {
> > >               priv->flags &= ~REMOTE_IS_READY;
> > >               return 0;
> > > @@ -432,9 +429,18 @@ static void imx_dsp_rproc_vq_work(struct
> > work_struct *work)
> > >  {
> > >       struct imx_dsp_rproc *priv = container_of(work, struct
> > imx_dsp_rproc,
> > >                                                 rproc_work);
> > > +     struct rproc *rproc = priv->rproc;
> > > +
> > > +     mutex_lock(&rproc->lock);
> > > +
> > > +     if (rproc->state != RPROC_RUNNING)
> > > +             goto unlock_mutex;
> > >
> > >       rproc_vq_interrupt(priv->rproc, 0);
> > >       rproc_vq_interrupt(priv->rproc, 1);
> >
> > These are not guaranteed to be atomic and sleeping with the mutex held is
> > guaranteed to deadlock the system.
> >
> > spinlock should be a problem with sleep,  but here using the mutex, should
> be ok.
> right?

I was thinking more about this worker thread executing concurrently with
the remoteproc core but we are fine as long as a single mutex is used.

> 
> best regards
> wang shengjiu
> 
> > +
> > > +unlock_mutex:
> > > +     mutex_unlock(&rproc->lock);
> > >  }
> > >
> > >  /**
> > > --
> > > 2.34.1
> > >
> >

      parent reply	other threads:[~2022-09-29 17:14 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-26 11:48 [PATCH] remoteproc: imx_dsp_rproc: Add mutex protection for workqueue Shengjiu Wang
2022-09-28 10:22 ` Peng Fan
2022-09-28 17:20 ` Mathieu Poirier
     [not found]   ` <CAA+D8ANKwsF=4R3cR0ZZy67CqwDZ1m7M8y57PmMUqkM4tPFZkw@mail.gmail.com>
2022-09-29 17:13     ` Mathieu Poirier [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220929171357.GA3107608@p14s \
    --to=mathieu.poirier@linaro.org \
    --cc=andersson@kernel.org \
    --cc=arnaud.pouliquen@foss.st.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-remoteproc@vger.kernel.org \
    --cc=shengjiu.wang@gmail.com \
    --cc=shengjiu.wang@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.