public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Théo Lebrun" <theo.lebrun@bootlin.com>
To: "Miquel Raynal" <miquel.raynal@bootlin.com>
Cc: "Mark Brown" <broonie@kernel.org>,
	"Apurva Nandan" <a-nandan@ti.com>, "Dhruva Gole" <d-gole@ti.com>,
	<linux-spi@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	"Gregory CLEMENT" <gregory.clement@bootlin.com>,
	"Vladimir Kondratiev" <vladimir.kondratiev@mobileye.com>,
	"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
	"Tawfik Bayouk" <tawfik.bayouk@mobileye.com>
Subject: Re: [PATCH] spi: cadence-qspi: stop calling system-wide PM helpers for runtime PM
Date: Mon, 05 Feb 2024 11:03:44 +0100	[thread overview]
Message-ID: <CYX260CKXOUN.2H1DC1TG1Q6TY@bootlin.com> (raw)
In-Reply-To: <20240205100312.6f0f40db@xps-13>

Hi,

On Mon Feb 5, 2024 at 10:03 AM CET, Miquel Raynal wrote:
> Hello Théo,
>
> theo.lebrun@bootlin.com wrote on Fri, 02 Feb 2024 18:29:40 +0100:
>
> > The ->runtime_suspend() and ->runtime_resume() callbacks are not
> > expected to call spi_controller_suspend() and spi_controller_resume().
> > Remove calls to those in the cadence-qspi driver.
> > 
> > Those helpers have two roles currently:
> >  - They stop/start the queue, including dealing with the kworker.
> >  - They toggle the SPI controller SPI_CONTROLLER_SUSPENDED flag. It
> >    requires acquiring ctlr->bus_lock_mutex.
> > 
> > The cadence-qspi ->exec_op() implementation bumps the usage counter at
> > its start. It might therefore run our ->runtime_resume()
> > implementation. However, ctlr->bus_lock_mutex is acquired by
> > spi_mem_exec_op() while ->exec_op() is being called.
> > 
> > Here is a brief call tree highlighting the issue:
> > 
> > spi_mem_exec_op()
> >         ...
> >         spi_mem_access_start()
> >                 mutex_lock(&ctlr->bus_lock_mutex)
> > 
> >         cqspi_exec_mem_op()
> >                 pm_runtime_resume_and_get()
> >                         cqspi_resume()
> >                                 spi_controller_resume()
> >                                         mutex_lock(&ctlr->bus_lock_mutex)
> >                 ...
> > 
> >         spi_mem_access_end()
> >                 mutex_unlock(&ctlr->bus_lock_mutex)
> >         ...
> > 
> > The fatal conclusion of this is a deadlock: we acquire a lock on each
> > operation but while running the operation, we might want to runtime
> > resume and acquire the same lock.
> > 
> > Anyway, those helpers (spi_controller_{suspend,resume}) are aimed at
> > system-wide suspend and resume and should NOT be called at runtime
> > suspend & resume.
> > 
> > Side note: the previous implementation had a second issue. It acquired a
> > pointer to both `struct cqspi_st` and `struct spi_controller` using
> > dev_get_drvdata(). Neither embed the other. This lead to memory
> > corruption that was being hidden inside the big cqspi->f_pdata array on
> > my setup. It was working until I tried changing the array side to its
> > theorical max of 4, which lead to the discovery of this gnarly bug.
> > 
> > Fixes: 0578a6dbfe75 ("spi: spi-cadence-quadspi: add runtime pm support")
> > Fixes: 2087e85bb66e ("spi: cadence-quadspi: fix suspend-resume implementations")
>
> Your commit log makes total sense but I believe the diff is gonna break
> again the suspend to RAM operation. This is only my understanding
> right after quickly going through the whole story, so maybe I'm
> totally off topic.

The current ->runtime_suspend() implementation would indeed (probably)
work for suspend-to-RAM if it wasn't for the wrong pointers to cqspi
and spi_controller (see side note from commit message).

I've not found a moment where `struct cqspi_st` embed `struct
spi_controller` at its start, so I do not believe this has ever worked.
It might be the result of a mistake while porting a patch from a branch
that included other changes.

> What happened if I understand the two commits blamed above:
>
> - There were PM hooks.
> - Someone turned them into runtime PM hooks (breaking regular
>   suspend/resume).
> - Someone else added the "missing" suspend/resume logic inside the
>   runtime PM hooks to fix suspend and resume.
> - You are removing this logic because it leads to deadlocks.
>
> There was likely a misconception of what is expected in both cases
> (quick and small power savings vs. full power cycle/loosing the whole
> configuration).
>
> I would propose instead to create two distinct set of functions:
> - One for runtime PM
> - One for suspend/resume
> This way the runtime PM no longer deadlocks and people using
> suspend/resume won't get affected? I don't know if your runtime hooks
> *will* always be called during a suspend/resume. I hope so, which would
> make the split quite easy and without any code duplication.

That does indeed sound like the right approach. Runtime hooks can be
called from suspend/resume if needs be. Runtime PM then gets disabled
at the late stage.

I do not believe currently system-wide suspend can be working.
spi_controller_{suspend,resume} are being called with a bogus pointer.
This makes me ask: should the system-wide suspend/resume part be
addressed with this patch or a follow-up? It feels like a separate
concern to me.

The nice thing is that I have easy access to J7200, which uses the same
controller and supports suspend-to-RAM. That should make it a good test
setup.

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

  reply	other threads:[~2024-02-05 10:03 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-02 17:29 [PATCH] spi: cadence-qspi: stop calling system-wide PM helpers for runtime PM Théo Lebrun
2024-02-05  9:03 ` Miquel Raynal
2024-02-05 10:03   ` Théo Lebrun [this message]
2024-02-05 10:12     ` Miquel Raynal
2024-02-05 10:38       ` Dhruva Gole

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CYX260CKXOUN.2H1DC1TG1Q6TY@bootlin.com \
    --to=theo.lebrun@bootlin.com \
    --cc=a-nandan@ti.com \
    --cc=broonie@kernel.org \
    --cc=d-gole@ti.com \
    --cc=gregory.clement@bootlin.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-spi@vger.kernel.org \
    --cc=miquel.raynal@bootlin.com \
    --cc=tawfik.bayouk@mobileye.com \
    --cc=thomas.petazzoni@bootlin.com \
    --cc=vladimir.kondratiev@mobileye.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox