All of lore.kernel.org
 help / color / mirror / Atom feed
From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Liwei Song <liwei.song.lsong@gmail.com>
Cc: RichardWeinberger <richard@nod.at>,
	 VigneshRaghavendra <vigneshr@ti.com>,
	 TudorAmbarus <tudor.ambarus@linaro.org>,
	PratyushYadav <pratyush@kernel.org>,
	 MichaelWalle <mwalle@kernel.org>,
	linux-mtd@lists.infradead.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mtd: core: add sync between read/write and unbind device
Date: Tue, 29 Apr 2025 09:55:54 +0200	[thread overview]
Message-ID: <87jz73v1th.fsf@bootlin.com> (raw)
In-Reply-To: <20250331161542.3040005-1-liwei.song.lsong@gmail.com> (Liwei Song's message of "Tue, 1 Apr 2025 00:15:20 +0800")

Hello Liwei,

On 01/04/2025 at 00:15:20 +08, Liwei Song <liwei.song.lsong@gmail.com> wrote:

> When unbind mtd device or qspi controller with a high frequency
> reading to /dev/mtd0 device, there will be Calltrace as below:
>
> $ while true; do cat /dev/mtd0 >/dev/null; done &
> $ echo ff8d2000.spi  > /sys/bus/platform/drivers/cadence-qspi/unbind
>
> Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 UID: 0 PID: 466 Comm: cat Not tainted 6.14.0-rc7-yocto-standard+ #1
> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
> pc : cqspi_indirect_read_execute.isra.0+0x188/0x330
> lr : cqspi_indirect_read_execute.isra.0+0x21c/0x330
> Call trace:
>  cqspi_indirect_read_execute.isra.0+0x188/0x330 (P)
>  cqspi_exec_mem_op+0x8bc/0xe40
>  spi_mem_exec_op+0x3e0/0x478
>  spi_mem_no_dirmap_read+0xa8/0xc8
>  spi_mem_dirmap_read+0xdc/0x150
>  spi_nor_read_data+0x120/0x198
>  spi_nor_read+0xf0/0x280
>  mtd_read_oob_std+0x80/0x98
>  mtd_read_oob+0x9c/0x168
>  mtd_read+0x6c/0xd8
>  mtdchar_read+0xdc/0x288
>  vfs_read+0xc8/0x2f8
>  ksys_read+0x70/0x110
>  __arm64_sys_read+0x24/0x38
>  invoke_syscall+0x5c/0x130
>  el0_svc_common.constprop.0+0x48/0xf8
>  do_el0_svc+0x28/0x40
>  el0_svc+0x30/0xd0
>  el0t_64_sync_handler+0x144/0x168
>  el0t_64_sync+0x198/0x1a0
> Code: 927e7442 aa1a03e0 8b020342 d503201f (b9400321)
> ---[ end trace 0000000000000000 ]---
>
> Or:
> $ while true; do cat /dev/mtd0 >/dev/null; done &
> $ echo spi0.0 > /sys/class/mtd/mtd0/device/driver/unbind
>
> Unable to handle kernel paging request at virtual address 00000000000012e8
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 2 UID: 0 PID: 459 Comm: cat Not tainted 6.14.0-rc7-yocto-standard+ #1
> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : spi_mem_exec_op+0x3e8/0x478
> lr : spi_mem_exec_op+0x3e0/0x478
> Call trace:
>  spi_mem_exec_op+0x3e8/0x478 (P)
>  spi_mem_no_dirmap_read+0xa8/0xc8
>  spi_mem_dirmap_read+0xdc/0x150
>  spi_nor_read_data+0x120/0x198
>  spi_nor_read+0xf0/0x280
>  mtd_read_oob_std+0x80/0x98
>  mtd_read_oob+0x9c/0x168
>  mtd_read+0x6c/0xd8
>  mtdchar_read+0xdc/0x288
>  vfs_read+0xc8/0x2f8
>  ksys_read+0x70/0x110
>  __arm64_sys_read+0x24/0x38
>  invoke_syscall+0x5c/0x130
>  el0_svc_common.constprop.0+0x48/0xf8
>  do_el0_svc+0x28/0x40
>  el0_svc+0x30/0xd0
>  el0t_64_sync_handler+0x144/0x168
>  el0t_64_sync+0x198/0x1a0
> Code: f9400842 d63f0040 2a0003f4 f94002a1 (f9417437)
> ---[ end trace 0000000000000000 ]---
>
> when unbind is running, the memory allocated to qspi controller and
> mtd device is freed during unbinding, but open/close and reading device
> are still running, if the reading process get read lock and start
> excuting, there will be above illegal memory access. This issue also
> can be repruduced on many other platforms like ls1046 and nxpimx8 which
> have qspi flash.
>
> In this patch, register a spi bus notifier which will be called before
> unbind process freeing device memory, add a new member mtd_event_remove
> to block mtd open/read, then waiting for the running task to be finished,
> after that, memory is safe to be free.
>
> Signed-off-by: Liwei Song <liwei.song.lsong@gmail.com>
> ---
>
> Hi Maintainer,
>
> This is an improved patch compared with the original one:
> (https://patchwork.ozlabs.org/project/linux-mtd/patch/20250325133954.3699535-1-liwei.song.lsong@gmail.com/),
> This v2 patch move notifier to spi-nor to avoid crash other types of flash.
> now this patch only aim at fixing nor-flash "bind/unbind while reading" calltrace,
> but for other types of flash like nand also have this issue.

While I agree with the observation and also the conclusion of adding
some kind of notifier, I'd like to understand the rationale behind
choosing to fix only spi-nor in v2? If any spi memory registered in the
mtd subsystem is subject to this failure, we should find a generic
approach (or if it's too difficult, at least have the fix in both
spi nor and spi nand). Looking at your implementation, maybe it could
fit in spi-mem (I'm not sure).

...

> +static int spi_nor_remove_notifier_call(struct notifier_block *nb,
> +					unsigned long event, void
> *data);

I believe spi nor maitainers would prefer to avoid forward declarations.

> +
>  /**
>   * spi_nor_get_cmd_ext() - Get the command opcode extension based on the
>   *			   extension type.
> @@ -1191,6 +1195,9 @@ static int spi_nor_prep(struct spi_nor *nor)
>  	if (nor->controller_ops && nor->controller_ops->prepare)
>  		ret = nor->controller_ops->prepare(nor);
>  
> +	if (nor->mtd.mtd_event_remove)
> +		return -ENODEV;
> +
>  	return ret;
>  }

...

> +static int spi_nor_remove_notifier_call(struct notifier_block *nb,
> +				    unsigned long event, void *data)
> +{
> +	struct device *dev = data;
> +	struct spi_device *spi;
> +	struct spi_mem *mem;
> +	struct spi_nor *nor;
> +
> +	if (!of_match_device(spi_nor_of_table, dev))
> +		return 0;
> +
> +	switch (event) {
> +	case BUS_NOTIFY_DEL_DEVICE:
> +	case BUS_NOTIFY_UNBIND_DRIVER:
> +		spi = to_spi_device(dev);
> +		mem = spi_get_drvdata(spi);
> +		if (!mem)
> +			return NOTIFY_DONE;
> +		nor = spi_mem_get_drvdata(mem);
> +
> +		mutex_lock(&nor->lock);
> +		nor->mtd.mtd_event_remove = true;
> +		mutex_unlock(&nor->lock);
> +		msleep(300);

What is this sleep for?

> +
> +		break;
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
>  /*
>   * REVISIT: many of these chips have deep power-down modes, which
>   * should clearly be entered on suspend() to minimize power use.
> diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> index 8d10d9d2e830..134bfa6fcf76 100644
> --- a/include/linux/mtd/mtd.h
> +++ b/include/linux/mtd/mtd.h
> @@ -290,6 +290,7 @@ struct mtd_info {
>  	/* Kernel-only stuff starts here. */
>  	const char *name;
>  	int index;
> +	bool mtd_event_remove;

No need to repeat 'mtd' here, you are already in the mtd_info structure,
so mtd->mtd_event_remove would be redundant.

>  	/* OOB layout description */
>  	const struct mtd_ooblayout_ops *ooblayout;

Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/

WARNING: multiple messages have this Message-ID (diff)
From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Liwei Song <liwei.song.lsong@gmail.com>
Cc: RichardWeinberger <richard@nod.at>,
	 VigneshRaghavendra <vigneshr@ti.com>,
	 TudorAmbarus <tudor.ambarus@linaro.org>,
	PratyushYadav <pratyush@kernel.org>,
	 MichaelWalle <mwalle@kernel.org>,
	linux-mtd@lists.infradead.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mtd: core: add sync between read/write and unbind device
Date: Tue, 29 Apr 2025 09:55:54 +0200	[thread overview]
Message-ID: <87jz73v1th.fsf@bootlin.com> (raw)
In-Reply-To: <20250331161542.3040005-1-liwei.song.lsong@gmail.com> (Liwei Song's message of "Tue, 1 Apr 2025 00:15:20 +0800")

Hello Liwei,

On 01/04/2025 at 00:15:20 +08, Liwei Song <liwei.song.lsong@gmail.com> wrote:

> When unbind mtd device or qspi controller with a high frequency
> reading to /dev/mtd0 device, there will be Calltrace as below:
>
> $ while true; do cat /dev/mtd0 >/dev/null; done &
> $ echo ff8d2000.spi  > /sys/bus/platform/drivers/cadence-qspi/unbind
>
> Internal error: synchronous external abort: 0000000096000210 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 UID: 0 PID: 466 Comm: cat Not tainted 6.14.0-rc7-yocto-standard+ #1
> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
> pc : cqspi_indirect_read_execute.isra.0+0x188/0x330
> lr : cqspi_indirect_read_execute.isra.0+0x21c/0x330
> Call trace:
>  cqspi_indirect_read_execute.isra.0+0x188/0x330 (P)
>  cqspi_exec_mem_op+0x8bc/0xe40
>  spi_mem_exec_op+0x3e0/0x478
>  spi_mem_no_dirmap_read+0xa8/0xc8
>  spi_mem_dirmap_read+0xdc/0x150
>  spi_nor_read_data+0x120/0x198
>  spi_nor_read+0xf0/0x280
>  mtd_read_oob_std+0x80/0x98
>  mtd_read_oob+0x9c/0x168
>  mtd_read+0x6c/0xd8
>  mtdchar_read+0xdc/0x288
>  vfs_read+0xc8/0x2f8
>  ksys_read+0x70/0x110
>  __arm64_sys_read+0x24/0x38
>  invoke_syscall+0x5c/0x130
>  el0_svc_common.constprop.0+0x48/0xf8
>  do_el0_svc+0x28/0x40
>  el0_svc+0x30/0xd0
>  el0t_64_sync_handler+0x144/0x168
>  el0t_64_sync+0x198/0x1a0
> Code: 927e7442 aa1a03e0 8b020342 d503201f (b9400321)
> ---[ end trace 0000000000000000 ]---
>
> Or:
> $ while true; do cat /dev/mtd0 >/dev/null; done &
> $ echo spi0.0 > /sys/class/mtd/mtd0/device/driver/unbind
>
> Unable to handle kernel paging request at virtual address 00000000000012e8
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 2 UID: 0 PID: 459 Comm: cat Not tainted 6.14.0-rc7-yocto-standard+ #1
> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : spi_mem_exec_op+0x3e8/0x478
> lr : spi_mem_exec_op+0x3e0/0x478
> Call trace:
>  spi_mem_exec_op+0x3e8/0x478 (P)
>  spi_mem_no_dirmap_read+0xa8/0xc8
>  spi_mem_dirmap_read+0xdc/0x150
>  spi_nor_read_data+0x120/0x198
>  spi_nor_read+0xf0/0x280
>  mtd_read_oob_std+0x80/0x98
>  mtd_read_oob+0x9c/0x168
>  mtd_read+0x6c/0xd8
>  mtdchar_read+0xdc/0x288
>  vfs_read+0xc8/0x2f8
>  ksys_read+0x70/0x110
>  __arm64_sys_read+0x24/0x38
>  invoke_syscall+0x5c/0x130
>  el0_svc_common.constprop.0+0x48/0xf8
>  do_el0_svc+0x28/0x40
>  el0_svc+0x30/0xd0
>  el0t_64_sync_handler+0x144/0x168
>  el0t_64_sync+0x198/0x1a0
> Code: f9400842 d63f0040 2a0003f4 f94002a1 (f9417437)
> ---[ end trace 0000000000000000 ]---
>
> when unbind is running, the memory allocated to qspi controller and
> mtd device is freed during unbinding, but open/close and reading device
> are still running, if the reading process get read lock and start
> excuting, there will be above illegal memory access. This issue also
> can be repruduced on many other platforms like ls1046 and nxpimx8 which
> have qspi flash.
>
> In this patch, register a spi bus notifier which will be called before
> unbind process freeing device memory, add a new member mtd_event_remove
> to block mtd open/read, then waiting for the running task to be finished,
> after that, memory is safe to be free.
>
> Signed-off-by: Liwei Song <liwei.song.lsong@gmail.com>
> ---
>
> Hi Maintainer,
>
> This is an improved patch compared with the original one:
> (https://patchwork.ozlabs.org/project/linux-mtd/patch/20250325133954.3699535-1-liwei.song.lsong@gmail.com/),
> This v2 patch move notifier to spi-nor to avoid crash other types of flash.
> now this patch only aim at fixing nor-flash "bind/unbind while reading" calltrace,
> but for other types of flash like nand also have this issue.

While I agree with the observation and also the conclusion of adding
some kind of notifier, I'd like to understand the rationale behind
choosing to fix only spi-nor in v2? If any spi memory registered in the
mtd subsystem is subject to this failure, we should find a generic
approach (or if it's too difficult, at least have the fix in both
spi nor and spi nand). Looking at your implementation, maybe it could
fit in spi-mem (I'm not sure).

...

> +static int spi_nor_remove_notifier_call(struct notifier_block *nb,
> +					unsigned long event, void
> *data);

I believe spi nor maitainers would prefer to avoid forward declarations.

> +
>  /**
>   * spi_nor_get_cmd_ext() - Get the command opcode extension based on the
>   *			   extension type.
> @@ -1191,6 +1195,9 @@ static int spi_nor_prep(struct spi_nor *nor)
>  	if (nor->controller_ops && nor->controller_ops->prepare)
>  		ret = nor->controller_ops->prepare(nor);
>  
> +	if (nor->mtd.mtd_event_remove)
> +		return -ENODEV;
> +
>  	return ret;
>  }

...

> +static int spi_nor_remove_notifier_call(struct notifier_block *nb,
> +				    unsigned long event, void *data)
> +{
> +	struct device *dev = data;
> +	struct spi_device *spi;
> +	struct spi_mem *mem;
> +	struct spi_nor *nor;
> +
> +	if (!of_match_device(spi_nor_of_table, dev))
> +		return 0;
> +
> +	switch (event) {
> +	case BUS_NOTIFY_DEL_DEVICE:
> +	case BUS_NOTIFY_UNBIND_DRIVER:
> +		spi = to_spi_device(dev);
> +		mem = spi_get_drvdata(spi);
> +		if (!mem)
> +			return NOTIFY_DONE;
> +		nor = spi_mem_get_drvdata(mem);
> +
> +		mutex_lock(&nor->lock);
> +		nor->mtd.mtd_event_remove = true;
> +		mutex_unlock(&nor->lock);
> +		msleep(300);

What is this sleep for?

> +
> +		break;
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
>  /*
>   * REVISIT: many of these chips have deep power-down modes, which
>   * should clearly be entered on suspend() to minimize power use.
> diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> index 8d10d9d2e830..134bfa6fcf76 100644
> --- a/include/linux/mtd/mtd.h
> +++ b/include/linux/mtd/mtd.h
> @@ -290,6 +290,7 @@ struct mtd_info {
>  	/* Kernel-only stuff starts here. */
>  	const char *name;
>  	int index;
> +	bool mtd_event_remove;

No need to repeat 'mtd' here, you are already in the mtd_info structure,
so mtd->mtd_event_remove would be redundant.

>  	/* OOB layout description */
>  	const struct mtd_ooblayout_ops *ooblayout;

Thanks,
Miquèl

  parent reply	other threads:[~2025-04-29  7:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-25 13:39 [PATCH] mtd: core: add sync between read/write and unbind device Liwei Song
2025-03-25 13:39 ` Liwei Song
2025-03-31 16:15 ` [PATCH v2] " Liwei Song
2025-03-31 16:15   ` Liwei Song
2025-04-17  2:31   ` liwei song
2025-04-17  2:31     ` liwei song
2025-04-29  7:55   ` Miquel Raynal [this message]
2025-04-29  7:55     ` Miquel Raynal
2025-04-30 13:48     ` liwei song
2025-04-30 13:48       ` liwei song
2025-05-05 16:39       ` Pratyush Yadav
2025-05-05 16:39         ` Pratyush Yadav

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87jz73v1th.fsf@bootlin.com \
    --to=miquel.raynal@bootlin.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mtd@lists.infradead.org \
    --cc=liwei.song.lsong@gmail.com \
    --cc=mwalle@kernel.org \
    --cc=pratyush@kernel.org \
    --cc=richard@nod.at \
    --cc=tudor.ambarus@linaro.org \
    --cc=vigneshr@ti.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.