linux-spi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Ronald Wahl <ronald.wahl@raritan.com>
Cc: Mark Brown <broonie@kernel.org>,
	linux-spi@vger.kernel.org,
	Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
	Ryan Wanner <ryan.wanner@microchip.com>,
	stable@vger.kernel.org,
	Richard Weinberger <richard.weinberger@gmail.com>
Subject: Re: [PATCH 1/2] spi: atmel: Do not cancel a transfer upon any signal
Date: Wed, 29 Nov 2023 09:49:32 +0100	[thread overview]
Message-ID: <20231129094932.2639ca49@xps-13> (raw)
In-Reply-To: <0ce4c673-5c0b-4181-9d8b-53bcb0521f3e@raritan.com>

Hi Ronald,

+ Richard, my dear jffs2 expert ;)

ronald.wahl@raritan.com wrote on Mon, 27 Nov 2023 18:54:40 +0100:

> On 27.11.23 16:10, Ronald Wahl wrote:
> > On 27.11.23 10:58, Miquel Raynal wrote:  
> >> The intended move from wait_for_completion_*() to
> >> wait_for_completion_interruptible_*() was to allow (very) long spi memor  
> y
> >> transfers to be stopped upon user request instead of freezing the
> >> machine forever as the timeout value could now be significantly bigger.
> >>
> >> However, depending on the user logic, applications can receive many
> >> signals for their own "internal" purpose and have nothing to do with the
> >> requested kernel operations, hence interrupting spi transfers upon any
> >> signal is probably not a wise choice. Instead, let's switch to
> >> wait_for_completion_killable_*() to only catch the "important"
> >> signals. This was likely the intended behavior anyway.  
> >
> > Actually this seems to work. But aborting a process that has a SPI
> > transfer running causes ugly messages from kernel. This is somehow
> > unexpected:
> >
> > # dd if=/dev/urandom of=/flashdisk/testfile bs=1024 count=512
> > ^C[  380.726760] spi-nor spi0.0: spi transfer canceled
> > [  380.731688] spi-nor spi0.0: SPI transfer failed: -512
> > [  380.737141] spi_master spi0: failed to transfer one message from queue
> > [  380.746495] spi-nor spi0.0: spi transfer canceled
> > [  380.751549] spi-nor spi0.0: SPI transfer failed: -512
> > [  380.756844] spi_master spi0: failed to transfer one message from queue
> >
> > JFFS2 also logs an informational message which is less visible but also
> > may rise eyebrows:
> > [  380.743904] jffs2: Write of 4164 bytes at 0x0016a47c failed. retu  
> rned
> > -512, retlen 68
> >
> > Killing a process is something to expect in certain cases and it should
> > not cause such messages which may create some anxiety that something bad
> > had happened. So maybe the "kill" case should be silent (e.g. level
> > "debug")
> > but without out hiding real errors. But even when hiding the message in t  
> he
> > SPI framework it may cause additional messages in upper layers like JFFS2  
> .
> > I'm not sure whether all of this is a good idea. This is something others
> > have to decide.  
> 
> ... and now I just got a crash when unmounting and remounting jffs2:
> 
> unmount:
> [ 8245.821105] spi-nor spi0.0: spi transfer canceled
> [ 8245.826288] spi-nor spi0.0: SPI transfer failed: -512
> [ 8245.831508] spi_master spi0: failed to transfer one message from queue
> [ 8245.838484] jffs2: Write of 1092 bytes at 0x00181458 failed. returned -5
> 12, retlen 68
> [ 8245.839786] spi-nor spi0.0: spi transfer canceled
> [ 8245.844759] spi-nor spi0.0: SPI transfer failed: -512
> [ 8245.850145] spi_master spi0: failed to transfer one message from queue
> [ 8245.856909] jffs2: Write of 1092 bytes at 0x0018189c failed. returned -5
> 12, retlen 0
> [ 8245.856942] jffs2: Not marking the space at 0x0018189c as dirty because the flash driver returned retlen zero
> 
> mount:
> [ 8831.213456] jffs2: error: (1142) jffs2_link_node_ref: Adding new ref 28b
> d9da7 at (0x000ad578-0x000ae5bc) not immediately after previous (0x000ad578
> -0x000ad578)
> [ 8831.228212] Internal error: Oops - undefined instruction: 0 [#1] THUMB2
> [ 8831.234996] CPU: 0 PID: 1142 Comm: mount Not tainted 6.6.2-sama5 #1
> [ 8831.241587] Hardware name: Atmel SAMA5
> [ 8831.245478] PC is at jffs2_link_node_ref+0xe/0xe2
> [ 8831.250360] LR is at jffs2_link_node_ref+0xb9/0xe2
> [ 8831.255473] pc : [<c020969c>]    lr : [<c0209747>]    psr: 00000033
> [ 8831.261893] sp : c974dd78  ip : 00000000  fp : c09c6a5c
> [ 8831.267428] r10: c1572f18  r9 : 0000e002  r8 : c2849964
> [ 8831.272801] r7 : 00001a44  r6 : 000ae5bc  r5 : c14df4a8  r4 : c1620208
> [ 8831.279647] r3 : 00000001  r2 : 40000000  r1 : c090ce3c  r0 : 00000093
> [ 8831.286340] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segme
> nt none
> [ 8831.293974] Control: 50c53c7d  Table: 21470059  DAC: 00000051
> [ 8831.300025] Register r0 information: non-paged memory
> [ 8831.305239] Register r1 information: non-slab/vmalloc memory
> [ 8831.311220] Register r2 information: non-paged memory
> [ 8831.316427] Register r3 information: non-paged memory
> [ 8831.321630] Register r4 information: slab kmalloc-2k start c1620000 poin
> ter offset 520 size 2048
> [ 8831.330796] Register r5 information: slab jffs2_refblock start c14df3e0 pointer offset 200 size 248
> [ 8831.340219] Register r6 information: non-paged memory
> [ 8831.345422] Register r7 information: non-paged memory
> [ 8831.350784] Register r8 information: slab kmalloc-4k start c2849000 poin
> ter offset 2404 size 4096
> [ 8831.360032] Register r9 information: non-paged memory
> [ 8831.365243] Register r10 information: slab pde_opener start c1572f18 poi
> nter offset 0 size 24
> [ 8831.374137] Register r11 information: non-slab/vmalloc memory
> [ 8831.380041] Register r12 information: NULL pointer
> [ 8831.385140] Process mount (pid: 1142, stack limit = 0xb9d2bb39)
> [ 8831.391215] Stack: (0xc974dd78 to 0xc974e000)
> [ 8831.395884] dd60:                                                       000ad578 000ae5bc
> [ 8831.404242] dd80: 000ad578 000ad578 c2849dec c2805e00 c1620208 c2849dec 00000000 c021130d
> [ 8831.412762] dda0: c1572d50 00000694 0000069c 00000036 00010000 c2937755 00000000 20061985
> [ 8831.421280] ddc0: 0000069c 00000000 c2849964 c2805e00 c1620208 c2849964 0000069c 00000008
> [ 8831.429799] dde0: 00000000 c2849ff8 c2849000 c020c40b c09c6a5c c974de34 c974de30 00000000
> [ 8831.438154] de00: 00000004 0000069c 00001000 c2937bc0 00000208 c1620000 00000000 000a0000
> [ 8831.446665] de20: 00000000 00000000 0000000a 00070000 c2849000 00000000 00001000 00000000
> [ 8831.455184] de40: 00000000 00000000 00010000 c2805e00 00000000 00000000 00200000 c2937e00
> [ 8831.463704] de60: 00000000 00008000 c0975a98 c020e02d c1001300 00000200 c020f3eb 00000dc0
> [ 8831.472062] de80: c2937e00 00000000 c020f3eb c2805e00 c156fc80 c29b9400 00200000 c2937e00
> [ 8831.480584] dea0: 00000000 c020f42d c29b9400 c156fc80 c09c6a64 c29b9400 c1298c00 c156fc80
> [ 8831.489100] dec0: c020f5c9 c02a853d 0000003a 00000000 c156fc80 c020f5c9 c974df58 c02a85ff
> [ 8831.497458] dee0: c156fc80 c020f5c9 00000000 c020f5bf c156fc80 c156fc80 00000020 c016fdc3
> [ 8831.505976] df00: c156fc80 00000000 c1374780 c0115c75 00000000 00000000 c156fc80 c0182209
> [ 8831.514497] df20: c2937e00 c974df58 c2937f80 00000000 c2937e00 c2937f80 00008000 00000000
> [ 8831.523015] df40: 0006d71a 00008000 00068991 c0182307 00000000 c015048b c1360d90 c18b3a18
> [ 8831.531370] df60: 00001000 c2937f80 c2937e00 00000000 00000015 c0182539 00000000 0006d71a
> [ 8831.539881] df80: 0006d725 00000000 0006d70c 0006d71a 00000015 c010023c c1374780 00000015
> [ 8831.548399] dfa0: 00068991 c0100041 00000000 0006d70c 0006d70c 0006d71a 0006d725 00008000
> [ 8831.556908] dfc0: 00000000 0006d70c 0006d71a 00000015 00008000 00000000 0006d938 00068991
> [ 8831.565263] dfe0: b6ea606c beb82b9c 00039818 b6ea607c a0000010 0006d70c 00000000 00000000
> [ 8831.573779]  jffs2_link_node_ref from jffs2_sum_scan_sumnode+0x1a5/0x320
> [ 8831.580860]  jffs2_sum_scan_sumnode from jffs2_scan_medium+0x2d7/0xab0
> [ 8831.587595]  jffs2_scan_medium from jffs2_do_mount_fs+0xeb/0x356
> [ 8831.593958]  jffs2_do_mount_fs from jffs2_do_fill_super+0xf7/0x182
> [ 8831.600332]  jffs2_do_fill_super from mtd_get_sb+0x61/0x98
> [ 8831.606176]  mtd_get_sb from get_tree_mtd+0x4f/0xe8
> [ 8831.611241]  get_tree_mtd from vfs_get_tree+0x13/0x7c
> [ 8831.616648]  vfs_get_tree from path_mount+0x409/0x4d4
> [ 8831.621891]  path_mount from do_mount+0x33/0x40
> [ 8831.626593]  do_mount from sys_mount+0xeb/0xfe
> [ 8831.631367]  sys_mount from ret_fast_syscall+0x1/0x5c
> [ 8831.636591] Exception stack(0xc974dfa8 to 0xc974dff0)
> [ 8831.641961] dfa0:                   00000000 0006d70c 0006d70c 0006d71a 0006d725 00008000
> [ 8831.650314] dfc0: 00000000 0006d70c 0006d71a 00000015 00008000 00000000 0006d938 00068991
> [ 8831.658818] dfe0: b6ea606c beb82b9c 00039818 b6ea607c
> [ 8831.664195] Code: 6a63 b085 990a b903 (de02) 6ae5
> [ 8831.669135] ---[ end trace 0000000000000000 ]---
> [ 8831.673902] Kernel panic - not syncing: Fatal exception
> 
> The previous aborted I/Os from a dd may also play a role here but the crash
>  is
> clearly a cause of the interrupted transfers. It's a bit odd that it is an
> undefined instruction but probably it was a stack overflow.
> 
> Looking back in the SPI driver history I see some occasions where interrupt
> ible
> transfers were changed to non-interruptible transfers because filesystems l
> ike
> jffs2 used SIGKILL during unmount or maybe other reasons:
> 
> 775c4c0032c4 "spi: stm32-qspi: remove signal sensitive on completion"
> 26cfc0dbe43a "spi-zynq-qspi: use wait_for_completion_timeout to make zynq_q
> spi_exec_mem_op not interruptible"
> 7f3ac71ac3b0 "spi: davinci: fix spurious i/o error"
> 
> Having them in spi-atmel now seems a bit off and even a source of misbehavi
> our
> and even crashes.

It's not just spi-atmel, any spi-mem controller might be tempted to use
interruptible^Wkillable transfers just because the timeout values can
be really big as the memory sizes increase.

One solution is to change the completion helpers back to something
non-killable/non-interruptible, but the user experience will be
slightly degraded. The other would be to look into jffs2 (if it's the
only filesystem playing with signals during unmount, tbh I don't know).
But maybe this signaling mechanism can't be hacked for compatibility
reasons. Handling this at the spi level would be a mix of layers, I'm
not ready for that.

Richard, Mark, what's your opinion here?

Thanks,
Miquèl

  reply	other threads:[~2023-11-29  8:49 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-27  9:58 [PATCH 1/2] spi: atmel: Do not cancel a transfer upon any signal Miquel Raynal
2023-11-27  9:58 ` [PATCH 2/2] spi: atmel: Drop unused defines Miquel Raynal
2023-11-27 15:10 ` [PATCH 1/2] spi: atmel: Do not cancel a transfer upon any signal Ronald Wahl
2023-11-27 17:54   ` Ronald Wahl
2023-11-29  8:49     ` Miquel Raynal [this message]
2023-11-29 11:05       ` Ronald Wahl
2023-11-30 12:46       ` Richard Weinberger
2023-11-30 18:26         ` Richard Weinberger
2023-11-30 18:36           ` Ronald Wahl
2023-11-30 20:15           ` Miquel Raynal
2023-11-30 20:43             ` Ronald Wahl
2023-11-30 20:58               ` Richard Weinberger
2023-12-01 11:13               ` David Laight
2023-12-01 13:38                 ` Ronald Wahl
2023-12-04 11:54                   ` Ronald Wahl
2023-12-04 12:26                     ` Mark Brown
2023-12-04 12:39                       ` Ronald Wahl
2023-12-04 12:19                   ` David Laight
2023-12-05  7:49               ` Miquel Raynal
2023-11-27 16:48 ` Mark Brown
2023-12-01 14:16 ` Mark Brown
2023-12-04 15:47 ` Mark Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231129094932.2639ca49@xps-13 \
    --to=miquel.raynal@bootlin.com \
    --cc=broonie@kernel.org \
    --cc=linux-spi@vger.kernel.org \
    --cc=richard.weinberger@gmail.com \
    --cc=ronald.wahl@raritan.com \
    --cc=ryan.wanner@microchip.com \
    --cc=stable@vger.kernel.org \
    --cc=thomas.petazzoni@bootlin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).