From: David Jander <david@protonic.nl>
To: Casper Andersson <casper.casan@gmail.com>
Cc: Mark Brown <broonie@kernel.org>,
linux-spi@vger.kernel.org, Marc Kleine-Budde <mkl@pengutronix.de>,
Andrew Lunn <andrew@lunn.ch>,
Lars Povlsen <lars.povlsen@microchip.com>,
Steen Hegelund <steen.hegelund@microchip.com>,
Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Subject: Re: [PROBLEM] spi driver internal error during boot on sparx5
Date: Mon, 29 Aug 2022 10:56:13 +0200 [thread overview]
Message-ID: <20220829105613.476622d2@erd992> (raw)
In-Reply-To: <20220826094143.iysrl3tsqxmhp4dq@wse-c0155>
Hi Casper,
On Fri, 26 Aug 2022 11:41:43 +0200
Casper Andersson <casper.casan@gmail.com> wrote:
> Hi,
>
> I'm having some issues on my SparX5 switch (PCB135) from Microchip.
> Since this patch series by David Jander the spi driver errors during boot.
> https://lore.kernel.org/all/20220621061234.3626638-1-david@protonic.nl/
>
> ae7d2346dc89 ("spi: Don't use the message queue if possible in spi_sync")
> On this commit it starts failing to mount the partitions during boot.
> This causes the output marked ERROR 1 below.
>
> 69fa95905d40 ("spi: Ensure the io_mutex is held until spi_finalize_current_message()")
> On this commit it no longer boots properly. I am able to enter login
> info, but then I'm unable to do anything. Though, when running latest
> versions of e.g. net and net-next trees it boots sometimes. I have
> observed some different errors which seems to occur seemingly at random,
> show in ERROR 2-4 below. ERROR 2 and 3 seems to be the most common ones.
>
> ERROR 1:
> [ 1.333629] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [ 1.333636] Modules linked in:
> [ 1.333644] CPU: 0 PID: 292 Comm: spi0 Not tainted 5.19.0-rc1 #18
> [ 1.333653] Hardware name: microchip,sparx5 (DT)
> [ 1.333657] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 1.333665] pc : ktime_get_real_ts64+0x4c/0x110
> [ 1.333681] lr : spi_transfer_one_message+0x13c/0x690
> [ 1.333693] sp : ffff80000aadbce0
> [ 1.333696] x29: ffff80000aadbce0 x28: 0000000000000000 x27: 00007bff79822ce0
> [ 1.333709] x26: ffff000004d1f400 x25: ffff000004d1f000 x24: ffff800009d03798
> [ 1.333720] x23: ffff8000093f3008 x22: 0000006809d03830 x21: 00000000000001c2
> [ 1.333732] x20: ffff800009a20e80 x19: 0000000000000000 x18: ffffffffffffffff
> [ 1.333743] x17: 0000000000000002 x16: 000001d25a50f10c x15: 00074f2fcdde90a0
> [ 1.333754] x14: ffff800008dfb8a0 x13: 000000000000ac08 x12: 00000000f5257d14
> [ 1.333766] x11: 00000000000002f7 x10: 0000000000000a00 x9 : ffff8000088690bc
> [ 1.333777] x8 : ffff000005242760 x7 : 0000000000000004 x6 : fffffbffeffcbd28
> [ 1.333788] x5 : 0000000000000000 x4 : 0000000009a51c00 x3 : 0000000000000000
> [ 1.333798] x2 : 0000000009a51c00 x1 : 0000000000000000 x0 : 0000000000000001
> [ 1.333810] Call trace:
> [ 1.333812] ktime_get_real_ts64+0x4c/0x110
> [ 1.333821] spi_transfer_one_message+0x13c/0x690
> [ 1.333831] __spi_pump_transfer_message+0x174/0x550
> [ 1.333841] __spi_pump_messages+0xb8/0x330
> [ 1.333850] spi_pump_messages+0x24/0x30
> [ 1.333859] kthread_worker_fn+0xb8/0x290
> [ 1.333870] kthread+0x118/0x120
> [ 1.333879] ret_from_fork+0x10/0x20
> [ 1.333892] Code: 120002b3 370004d5 d50339bf f9403e80 (f90002c0)
> [ 1.333898] ---[ end trace 0000000000000000 ]---
>
> Error 2:
> [ 5.527818] Internal error: Oops: 8600000f [#1] PREEMPT SMP
> [ 5.534020] Modules linked in:
> [ 5.536959] CPU: 0 PID: 292 Comm: spi0 Not tainted 6.0.0-rc1 #7
> [ 5.542627] Hardware name: microchip,sparx5 (DT)
> [ 5.547043] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 5.553704] pc : 0xffff0000042062d8
> [ 5.557046] lr : spi_finalize_current_message+0x1ac/0x2e0
> [ 5.562216] sp : ffff80000aa2bc50
> [ 5.565384] x29: ffff80000aa2bc50 x28: 0000000000000000 x27: 00007bff79858be0
> [ 5.572217] x26: ffff000007258c20 x25: ffff000007258800 x24: ffff80000cb8b0e0
> [ 5.579049] x23: ffff80000cb8b1b0 x22: ffff80000cb8b158 x21: ffff000004d1c800
> [ 5.585881] x20: ffff80000cb8b1b0 x19: ffff80000cb8b1b0 x18: 0000000000000000
> [ 5.592713] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [ 5.599544] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [ 5.606375] x11: 00000000000001f5 x10: 0000000000000a10 x9 : ffff80000808e890
> [ 5.613206] x8 : ffff000004966170 x7 : 0000000000000004 x6 : 00000001497a5800
> [ 5.620037] x5 : 00000000410fd030 x4 : 0000000000c0000e x3 : ffff800076773000
> [ 5.626869] x2 : 0000000000000000 x1 : ffff0000042062d8 x0 : ffff0000037addb8
> [ 5.633701] Call trace:
> [ 5.636040] 0xffff0000042062d8
> [ 5.639046] spi_mux_complete_cb+0x48/0x60
> [ 5.642969] spi_finalize_current_message+0x1ac/0x2e0
> [ 5.647803] spi_transfer_one_message+0x298/0x680
> [ 5.652304] __spi_pump_transfer_message+0x188/0x5a0
> [ 5.657055] __spi_pump_messages+0xdc/0x330
> [ 5.661058] spi_pump_messages+0x24/0x30
> [ 5.664812] kthread_worker_fn+0xb8/0x290
> [ 5.668653] kthread+0x118/0x120
> [ 5.671742] ret_from_fork+0x10/0x20
> [ 5.675170] Code: 00000000 00000000 08f4b570 ffff8000 (00000000)
> [ 5.680999] ---[ end trace 0000000000000000 ]---
> [ 5.678207] note: spi0[291] exited with preempt_count 1
>
> ERROR 3:
> [ 4.443467] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000108
> [ 4.443479] Mem abort info:
> [ 4.443481] ESR = 0x0000000086000004
> [ 4.443485] EC = 0x21: IABT (current EL), IL = 32 bits
> [ 4.443490] SET = 0, FnV = 0
> [ 4.443494] EA = 0, S1PTW = 0
> [ 4.443497] FSC = 0x04: level 0 translation fault
> [ 4.443502] user pgtable: 4k pages, 48-bit VAs, pgdp=00000007070f2000
> [ 4.443508] [0000000000000108] pgd=0000000000000000, p4d=0000000000000000
> [ 4.443520] Internal error: Oops: 86000004 [#1] PREEMPT SMP
> [ 4.443527] Modules linked in:
> [ 4.443534] CPU: 0 PID: 292 Comm: spi0 Not tainted 5.19.0-rc1 #25
> [ 4.443542] Hardware name: microchip,sparx5 (DT)
> [ 4.443546] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 4.443555] pc : 0x108
> [ 4.443564] lr : spi_finalize_current_message+0x1ac/0x2e0
> [ 4.443578] sp : ffff80000ab03c50
> [ 4.443581] x29: ffff80000ab03c50 x28: 0000000000000000 x27: 00007bff79822ce0
> [ 4.443594] x26: ffff0000049c3420 x25: ffff0000049c3000 x24: ffff80000b7733b0
> [ 4.443607] x23: ffff80000b773480 x22: ffff80000b773428 x21: ffff0000036ac800
> [ 4.443619] x20: ffff80000b773480 x19: ffff80000b773480 x18: 0000000040bd0097
> [ 4.443632] x17: 0000000000000004 x16: 0000000000000121 x15: 007c2a66ba4f78c6
> [ 4.443644] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
> [ 4.443655] x11: 000000000000026b x10: 0000000000000a00 x9 : ffff80000808c94c
> [ 4.443667] x8 : ffff00000735c460 x7 : 0000000000000001 x6 : 0000000108d9d000
> [ 4.443679] x5 : 00000000410fd030 x4 : 0000000000c0000e x3 : ffff8000767a9000
> [ 4.443691] x2 : 0000000000000000 x1 : 0000000000000108 x0 : ffff80000b773750
> [ 4.443703] Call trace:
> [ 4.443706] 0x108
> [ 4.443712] spi_mux_complete_cb+0x48/0x60
> [ 4.443720] spi_finalize_current_message+0x1ac/0x2e0
> [ 4.443730] spi_transfer_one_message+0x2b0/0x690
> [ 4.443739] __spi_pump_transfer_message+0x188/0x5a0
> [ 4.443749] __spi_pump_messages+0xdc/0x330
> [ 4.443759] spi_pump_messages+0x24/0x30
> [ 4.443768] kthread_worker_fn+0xb8/0x290
> [ 4.443779] kthread+0x118/0x120
> [ 4.443788] ret_from_fork+0x10/0x20
> [ 4.443802] Code: bad PC value
> [ 4.443807] ---[ end trace 0000000000000000 ]---
>
> Error 4:
> [ 4.012013] Unable to handle kernel execute from non-executable memory at virtual address ffff80000b79b498
> [ 4.012027] Mem abort info:
> [ 4.012029] ESR = 0x000000008600000f
> [ 4.012033] EC = 0x21: IABT (current EL), IL = 32 bits
> [ 4.012038] SET = 0, FnV = 0
> [ 4.012042] EA = 0, S1PTW = 0
> [ 4.012045] FSC = 0x0f: level 3 permission fault
> [ 4.012049] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000007012ec000
> [ 4.012055] [ffff80000b79b498] pgd=100000077ffff003, p4d=100000077ffff003, pud=100000077fffe003, pmd=10000007036c7003, pte=0068000704f29703
> [ 4.012077] Internal error: Oops: 8600000f [#1] PREEMPT SMP
> [ 4.012084] Modules linked in:
> [ 4.012091] CPU: 0 PID: 292 Comm: spi0 Not tainted 5.19.0-rc1 #25
> [ 4.012099] Hardware name: microchip,sparx5 (DT)
> [ 4.012103] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 4.012112] pc : 0xffff80000b79b498
> [ 4.012121] lr : 0xffff80000b79b498
> [ 4.012127] sp : ffff80000ab5bc90
> [ 4.012130] x29: ffff0000042ea800 x28: 0000000000000000 x27: 00007bff79822ce0
> [ 4.012143] x26: ffff0000042eac20 x25: ffff0000042ea800 x24: ffff80000b79b420
> [ 4.012155] x23: ffff80000b79b4f0 x22: ffff80000b79b498 x21: ffff000004c35800
> [ 4.012168] x20: ffff80000b79b4f0 x19: ffff80000b79b4f0 x18: 0000000000000000
> [ 4.012180] x17: 0000000000000001 x16: 0000000000000001 x15: 0117d93fc9cfcb72
> [ 4.012192] x14: 0114def7e22c5168 x13: ffff800008dfb8a0 x12: 00000000fa83b2da
> [ 4.012204] x11: 0000000000000077 x10: 000000000000008a x9 : ffff8000080b2724
> [ 4.012216] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000096
> [ 4.012227] x5 : 0000000000000000 x4 : ffff00007fbacd80 x3 : ffff000004dec880
> [ 4.012239] x2 : 0000000000000000 x1 : ffff000004dec880 x0 : 0000000000000000
> [ 4.012251] Call trace:
> [ 4.012254] 0xffff80000b79b498
> [ 4.012265] Code: 04dfc880 ffff0000 0b79b4b0 ffff8000 (08dd5984)
> [ 4.012271] ---[ end trace 0000000000000000 ]---
Thanks for reporting.
Looking at Errors 2 and 3, I suspect there might be a race in the SPI mux
driver. After a quick inspection, I see this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/spi/spi-mux.c?h=v6.0-rc3#n123
spi_mux_transfer_one_message() returns before the message is transferred (in
spi_async()), which is not expected. AFAIK, an ctlr->transfer_one_message()
implementation should not return until the transfer is completed.
To check if that is causing the problem, could you try the following change:
--- a/drivers/spi/spi-mux.c
+++ b/drivers/spi/spi-mux.c
@@ -120,7 +120,7 @@ static int spi_mux_transfer_one_message(struct spi_controller *ctlr,
m->spi = priv->spi;
/* do the transfer */
- return spi_async(priv->spi, m);
+ return spi_sync(priv->spi, m);
}
static int spi_mux_probe(struct spi_device *spi)
Not sure if this is a correct fix, but I'd like to know if your situation
changes this way, if you could try it.
I don't have access to any hardware with a mux unfortunately, so I can't test
it myself.
Best regards,
--
David Jander
Protonic Holland.
next prev parent reply other threads:[~2022-08-29 8:56 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-26 9:41 [PROBLEM] spi driver internal error during boot on sparx5 Casper Andersson
2022-08-29 8:56 ` David Jander [this message]
2022-08-31 16:07 ` Mark Brown
2022-09-01 6:57 ` Vincent Whitchurch
2022-09-01 11:02 ` David Jander
2022-09-01 11:42 ` Vincent Whitchurch
2022-09-01 12:08 ` David Jander
2022-09-01 11:51 ` Mark Brown
2022-09-01 15:16 ` Casper Andersson
2022-09-02 6:38 ` David Jander
2022-09-01 11:11 ` Mark Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220829105613.476622d2@erd992 \
--to=david@protonic.nl \
--cc=UNGLinuxDriver@microchip.com \
--cc=andrew@lunn.ch \
--cc=broonie@kernel.org \
--cc=casper.casan@gmail.com \
--cc=lars.povlsen@microchip.com \
--cc=linux-spi@vger.kernel.org \
--cc=mkl@pengutronix.de \
--cc=steen.hegelund@microchip.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.