All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Jander <david@protonic.nl>
To: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Mark Brown <broonie@kernel.org>,
	Casper Andersson <casper.casan@gmail.com>,
	<linux-spi@vger.kernel.org>,
	Marc Kleine-Budde <mkl@pengutronix.de>,
	Andrew Lunn <andrew@lunn.ch>,
	Lars Povlsen <lars.povlsen@microchip.com>,
	Steen Hegelund <steen.hegelund@microchip.com>,
	Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Subject: Re: [PROBLEM] spi driver internal error during boot on sparx5
Date: Thu, 1 Sep 2022 13:02:22 +0200	[thread overview]
Message-ID: <20220901130222.587f4932@erd992> (raw)
In-Reply-To: <YxBX4bXG02E4lSUW@axis.com>

On Thu, 1 Sep 2022 08:57:37 +0200
Vincent Whitchurch <vincent.whitchurch@axis.com> wrote:

> On Wed, Aug 31, 2022 at 05:07:42PM +0100, Mark Brown wrote:
> > On Mon, Aug 29, 2022 at 10:56:13AM +0200, David Jander wrote:  
> > > Not sure if this is a correct fix, but I'd like to know if your situation
> > > changes this way, if you could try it.
> > > I don't have access to any hardware with a mux unfortunately, so I can't test
> > > it myself.  
> > 
> > I guess claiming to have a noop mux might work for testing, though I'd
> > be dubious that it was actually doing the mux operations properly?  I  
> 
> I'm seeing these problems with the roadtest test framework which uses
> UML and doesn't need any special hardware.  Roadtest puts its emulated
> devices under a spi-mux with gpio-mockup and there are no tests for the
> actual muxing, but other driver tests break since transfers using
> spi-mux don't work properly.
> 
> I pushed the latest version with SPI support to the tree below.  The
> test_adc084s021 tests a SPI device.  Roadtest is placed inside a linux
> tree, but it doesn't need any patches to the kernel:
> 
>  https://github.com/vwax/linux/commits/roadtest/devel
> 
> You can reproduce the problem with:
> 
>  git remote add vwax https://github.com/vwax/linux.git
>  git fetch vwax
>  git archive vwax/roadtest/devel tools/testing/roadtest | tar xf -
>  make -C tools/testing/roadtest/ -j24 OPTS="-k adc --rt-timeout 10 -v"
> 
> The test passes on v5.19 but on current mainline it hangs and times out.

This is very nice. Thanks.
I followed your instructions, and if I apply the following, all tests are
passed.

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 83da8862b8f2..32c01e684af3 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1727,8 +1727,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
        spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 
        ret = __spi_pump_transfer_message(ctlr, msg, was_busy);
-       if (!ret)
-               kthread_queue_work(ctlr->kworker, &ctlr->pump_messages);
+       kthread_queue_work(ctlr->kworker, &ctlr->pump_messages);
 
        ctlr->cur_msg = NULL;
        ctlr->fallback = false;

The problem is that if __spi_pump_transfer_message() fails, the ctlr->busy
flag is left true, so __spi_async() is not going to queue new work. The busy
transition is handled right above that piece of code, in
__spi_pump_transfer_message(), and the mechanism is to queue more work to do
it. Apparently work was only queued when the transfer was successful, and I am
not sure why it was like that. Queuing work unconditionally solves the issue
and should not be a problem.

Curious thing is, that this change alone is sufficient to make all the
roadtest tests pass. But I still think that does not fix the bug reported by
Casper. For that, Mark's patch is also necessary.

@Casper: can you test Mark's patch with above addition?

> > think we need to either change spi_mux to duplicate the incoming message
> > (that's probably "cleaner") or teach the core that spi-mux exists and
> > should always use the pump/queue.  The below might do the trick but in
> > spite of my suggestion above I've not tested either yet:
> > 
> > diff --git a/drivers/spi/spi-mux.c b/drivers/spi/spi-mux.c
> > index f5d32ec4634e..0709e987bd5a 100644
> > --- a/drivers/spi/spi-mux.c
> > +++ b/drivers/spi/spi-mux.c
> > @@ -161,6 +161,7 @@ static int spi_mux_probe(struct spi_device *spi)
> >  	ctlr->num_chipselect = mux_control_states(priv->mux);
> >  	ctlr->bus_num = -1;
> >  	ctlr->dev.of_node = spi->dev.of_node;
> > +	ctlr->must_async = true;
> >  
> >  	ret = devm_spi_register_controller(&spi->dev, ctlr);
> >  	if (ret)
> > diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> > index 1cfed874f7ae..88d48a105d3c 100644
> > --- a/drivers/spi/spi.c
> > +++ b/drivers/spi/spi.c
> > @@ -4033,7 +4033,7 @@ static int __spi_sync(struct spi_device *spi, struct spi_message *message)
> >  	 * guard against reentrancy from a different context. The io_mutex
> >  	 * will catch those cases.
> >  	 */
> > -	if (READ_ONCE(ctlr->queue_empty)) {
> > +	if (READ_ONCE(ctlr->queue_empty) && !ctlr->must_async) {
> >  		message->actual_length = 0;
> >  		message->status = -EINPROGRESS;
> >  
> > diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
> > index e6c73d5ff1a8..f089ee1ead58 100644
> > --- a/include/linux/spi/spi.h
> > +++ b/include/linux/spi/spi.h
> > @@ -469,6 +469,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch
> >   *	SPI_TRANS_FAIL_NO_START.
> >   * @queue_empty: signal green light for opportunistically skipping the queue
> >   *	for spi_sync transfers.
> > + * @must_async: disable all fast paths in the core
> >   *
> >   * Each SPI controller can communicate with one or more @spi_device
> >   * children.  These make a small bus, sharing MOSI, MISO and SCK signals
> > @@ -690,6 +691,7 @@ struct spi_controller {
> >  
> >  	/* Flag for enabling opportunistic skipping of the queue in spi_sync */
> >  	bool			queue_empty;
> > +	bool			must_async;
> >  };
> >  
> >  static inline void *spi_controller_get_devdata(struct spi_controller *ctlr)
> > 
> > Assuming that works out there'll be an extra test in the fast path but
> > no sync operations, and a performance hit for spi-mux users.  Hopefully
> > that works as well as it did before.  
>[...]

Best regards,

-- 
David Jander
Protonic Holland.

  reply	other threads:[~2022-09-01 11:03 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26  9:41 [PROBLEM] spi driver internal error during boot on sparx5 Casper Andersson
2022-08-29  8:56 ` David Jander
2022-08-31 16:07   ` Mark Brown
2022-09-01  6:57     ` Vincent Whitchurch
2022-09-01 11:02       ` David Jander [this message]
2022-09-01 11:42         ` Vincent Whitchurch
2022-09-01 12:08           ` David Jander
2022-09-01 11:51         ` Mark Brown
2022-09-01 15:16         ` Casper Andersson
2022-09-02  6:38           ` David Jander
2022-09-01 11:11       ` Mark Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220901130222.587f4932@erd992 \
    --to=david@protonic.nl \
    --cc=UNGLinuxDriver@microchip.com \
    --cc=andrew@lunn.ch \
    --cc=broonie@kernel.org \
    --cc=casper.casan@gmail.com \
    --cc=lars.povlsen@microchip.com \
    --cc=linux-spi@vger.kernel.org \
    --cc=mkl@pengutronix.de \
    --cc=steen.hegelund@microchip.com \
    --cc=vincent.whitchurch@axis.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.