All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peng Fan <peng.fan@oss.nxp.com>
To: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Bjorn Andersson <andersson@kernel.org>,
	Arnaud Pouliquen <arnaud.pouliquen@st.com>,
	"open list:REMOTE PROCESSOR (REMOTEPROC) SUBSYSTEM"
	<linux-remoteproc@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	Peng Fan <peng.fan@nxp.com>
Subject: Re: [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown
Date: Wed, 2 Apr 2025 09:43:55 +0800	[thread overview]
Message-ID: <20250402014355.GA22575@nxa18884-linux> (raw)
In-Reply-To: <Z-wOr3eLaX9myqb4@p14s>

On Tue, Apr 01, 2025 at 10:05:03AM -0600, Mathieu Poirier wrote:
>On Tue, Apr 01, 2025 at 09:41:24AM +0800, Peng Fan wrote:
>> On Mon, Mar 31, 2025 at 09:40:41AM -0600, Mathieu Poirier wrote:
>> >On Sat, Mar 29, 2025 at 08:56:29PM +0800, Peng Fan wrote:
>> >> On Fri, Mar 28, 2025 at 08:14:41AM -0600, Mathieu Poirier wrote:
>> >> >On Fri, Mar 28, 2025 at 12:50:12PM +0800, Peng Fan wrote:
>> >> >> On Thu, Mar 27, 2025 at 11:46:33AM -0600, Mathieu Poirier wrote:
>> >> >> >Hi,
>> >> >> >
>> >> >> >On Wed, Mar 26, 2025 at 10:02:14AM +0800, Peng Fan (OSS) wrote:
>> >> >> >> From: Peng Fan <peng.fan@nxp.com>
>> >> >> >> 
>> >> >> >> There is case as below could trigger kernel dump:
>> >> >> >> Use U-Boot to start remote processor(rproc) with resource table
>> >> >> >> published to a fixed address by rproc. After Kernel boots up,
>> >> >> >> stop the rproc, load a new firmware which doesn't have resource table
>> >> >> >> ,and start rproc.
>> >> >> >>
>> >> >> >
>> >> >> >If a firwmare image doesn't have a resouce table, rproc_elf_load_rsc_table()
>> >> >> >will return an error [1], rproc_fw_boot() will exit prematurely [2] and the
>> >> >> >remote processor won't be started.  What am I missing?
>> >> >> 
>> >> >> STM32 and i.MX use their own parse_fw implementation which allows no resource
>> >> >> table:
>> >> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/stm32_rproc.c#L272
>> >> >> https://elixir.bootlin.com/linux/v6.13.7/source/drivers/remoteproc/imx_rproc.c#L598
>> >> >
>> >> >Ok, that settles rproc_fw_boot() but there is also rproc_find_loaded_rsc_table()
>> >> >that will return NULL if a resource table is not found and preventing the
>> >> >memcpy() in rproc_start() from happening:
>> >> >
>> >> >https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1288
>> >> 
>> >> 
>> >> Sorry, I forgot to mention below code:
>> >> loaded_table is a valid pointer for i.MX, see
>> >> https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/imx_rproc.c#L666,
>> >
>> >(SIGH)
>> >
>> >The changelong for this patch says "... load a new firmware which doesn't have a
>> >resource table..." and now you are telling me that @load_table is valid.  As
>> >such I have to _guess_ that @priv->rsc_table is not null.  So which is it -
>> >valid or not valid?  
>> 
>> As wrote in commit log, bootloader kicks the m7 and m7 publishes a valid
>> resource table to a fixed address.
>> 
>> When linux boots up, first stop m7, then load a new firmware which does not
>> have resource table, then stop m7.
>> 
>> Even the new firmware does not have resource table, the imx_rproc driver
>> still returns a valid resource table address which is got from device tree
>> (rsrc_table) in imx DTS when the driver probe.
>> 
>> @priv->rsc_table is always valid even the firwmare does not have a valid
>
>And that is where the problem is - why can't that situation be fixed instead of
>pushing it to the subsystem core?  Why can't you have code in
>imx_rproc_elf_find_loaded_rsc_table() that checks if there is a valid resource
>table at the address held by @priv->rsc_table and return NULL if there isn't?

It is ok address the issue in imx_rproc.c without touching core code.

priv->rsc_table contains valid resource table which is published when
m7 is kicked by bootloader, and m7 publishes a valid table to
priv->rsc_table.

It still contains valid content when linux stops m7.

To make it invalid when linux starts m7 with a firwmare(the elf image not has
resource table), need to clear the content of priv->rsc_table or
write some magic number when stop the m7 which was started by bootloader.

Then it is possible to check priv->rsc_table in
imx_rproc_elf_find_loaded_rsc_table.

The 2nd approach is to clear rproc->table_sz and rproc->table_ptr in
imx_rproc_parse_fw before rproc_elf_load_rsc_table.


>
>The core is already checking if @loaded_table is valid in rproc_start(), why
>can't that be used instead of adding yet another check?

Ah. I was thinking clear table_sz in rpoc_shutdown is an easy approach and
could benifit others in case other platforms meet similar issue in future.

If you think the current patch is not proper, I could do a v3 with the upper
2nd approach.

>
>> resource table. The TCM area is not writeable by Linux, so the firmware will
>> copy the resource table from TCM to DDR if the firmware has a resource table.
>> 
>> Hope this is clear.
>
>What is clear is that I spend 4 sessions on a 3-line patch, valuable time I
>could have spent reviewing other peoples' patches.

Sorry. Not intend to waste your time.

Thanks,
Peng
>
>> 
>> >
>> >If my assumption above is valid than fix that instead of hacking the remoteproc
>> >core.
>> 
>> I just found V1 was picked up by Bjorn.
>
>I am currently working with Bjorn on that.
>
>> It is not hack, clearing table_sz in core code does not hurt, I think.
>
>It is a hack for as long as you haven't provided a valid explanation for the
>changes you are proposing.  
>
>> 
>> If my assumption is not valid the changelog and your justification for
>> >this patch are wrong.  Either way I have spent way too much time on this patch
>> >already and dropping it.  The same goes for your other patch [1] - resent it
>> >when you will have properly address the work herein.   
>> 
>
>And yet you just sent a V2.
>
>> sure.
>> 
>> Thanks,
>> Peng
>> 
>> >
>> >[1]. [PATCH] remoteproc: imx_rproc: Add mutex protection for workqueue
>> >
>> >> 
>> >> So loaded_table is valid, it is memcpy trigger kernel panic because table_sz is
>> >> not zero while cached_table is NULL.
>> >> 	loaded_table = rproc_find_loaded_rsc_table(rproc, fw);
>> >> 	if (loaded_table) {
>> >> 		memcpy(loaded_table, rproc->cached_table, rproc->table_sz);
>> >> 		rproc->table_ptr = loaded_table;
>> >> 	}
>> >> 
>> >> Thanks,
>> >> Peng
>> >> 
>> >> >
>> >> >> 
>> >> >> Thanks,
>> >> >> Peng
>> >> >> 
>> >> >> >
>> >> >> >[1]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_elf_loader.c#L338
>> >> >> >[2]. https://elixir.bootlin.com/linux/v6.14-rc6/source/drivers/remoteproc/remoteproc_core.c#L1411 
>> >> >> >
>> >> >> >> When starting rproc with a firmware not have resource table,
>> >> >> >> `memcpy(loaded_table, rproc->cached_table, rproc->table_sz)` will
>> >> >> >> trigger dump, because rproc->cache_table is set to NULL during the last
>> >> >> >> stop operation, but rproc->table_sz is still valid.
>> >> >> >> 
>> >> >> >> This issue is found on i.MX8MP and i.MX9.
>> >> >> >> 
>> >> >> >> Dump as below:
>> >> >> >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>> >> >> >> Mem abort info:
>> >> >> >>   ESR = 0x0000000096000004
>> >> >> >>   EC = 0x25: DABT (current EL), IL = 32 bits
>> >> >> >>   SET = 0, FnV = 0
>> >> >> >>   EA = 0, S1PTW = 0
>> >> >> >>   FSC = 0x04: level 0 translation fault
>> >> >> >> Data abort info:
>> >> >> >>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>> >> >> >>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> >> >> >>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> >> >> >> user pgtable: 4k pages, 48-bit VAs, pgdp=000000010af63000
>> >> >> >> [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>> >> >> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> >> >> >> Modules linked in:
>> >> >> >> CPU: 2 UID: 0 PID: 1060 Comm: sh Not tainted 6.14.0-rc7-next-20250317-dirty #38
>> >> >> >> Hardware name: NXP i.MX8MPlus EVK board (DT)
>> >> >> >> pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> >> >> >> pc : __pi_memcpy_generic+0x110/0x22c
>> >> >> >> lr : rproc_start+0x88/0x1e0
>> >> >> >> Call trace:
>> >> >> >>  __pi_memcpy_generic+0x110/0x22c (P)
>> >> >> >>  rproc_boot+0x198/0x57c
>> >> >> >>  state_store+0x40/0x104
>> >> >> >>  dev_attr_store+0x18/0x2c
>> >> >> >>  sysfs_kf_write+0x7c/0x94
>> >> >> >>  kernfs_fop_write_iter+0x120/0x1cc
>> >> >> >>  vfs_write+0x240/0x378
>> >> >> >>  ksys_write+0x70/0x108
>> >> >> >>  __arm64_sys_write+0x1c/0x28
>> >> >> >>  invoke_syscall+0x48/0x10c
>> >> >> >>  el0_svc_common.constprop.0+0xc0/0xe0
>> >> >> >>  do_el0_svc+0x1c/0x28
>> >> >> >>  el0_svc+0x30/0xcc
>> >> >> >>  el0t_64_sync_handler+0x10c/0x138
>> >> >> >>  el0t_64_sync+0x198/0x19c
>> >> >> >> 
>> >> >> >> Clear rproc->table_sz to address the issue.
>> >> >> >> 
>> >> >> >> While at here, also clear rproc->table_sz when rproc_fw_boot and
>> >> >> >> rproc_detach.
>> >> >> >> 
>> >> >> >> Fixes: 9dc9507f1880 ("remoteproc: Properly deal with the resource table when detaching")
>> >> >> >> Signed-off-by: Peng Fan <peng.fan@nxp.com>
>> >> >> >> ---
>> >> >> >> 
>> >> >> >> V2:
>> >> >> >>  Clear table_sz when rproc_fw_boot and rproc_detach per Arnaud
>> >> >> >> 
>> >> >> >>  drivers/remoteproc/remoteproc_core.c | 3 +++
>> >> >> >>  1 file changed, 3 insertions(+)
>> >> >> >> 
>> >> >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> >> >> >> index c2cf0d277729..1efa53d4e0c3 100644
>> >> >> >> --- a/drivers/remoteproc/remoteproc_core.c
>> >> >> >> +++ b/drivers/remoteproc/remoteproc_core.c
>> >> >> >> @@ -1442,6 +1442,7 @@ static int rproc_fw_boot(struct rproc *rproc, const struct firmware *fw)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  unprepare_rproc:
>> >> >> >>  	/* release HW resources if needed */
>> >> >> >>  	rproc_unprepare_device(rproc);
>> >> >> >> @@ -2025,6 +2026,7 @@ int rproc_shutdown(struct rproc *rproc)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  out:
>> >> >> >>  	mutex_unlock(&rproc->lock);
>> >> >> >>  	return ret;
>> >> >> >> @@ -2091,6 +2093,7 @@ int rproc_detach(struct rproc *rproc)
>> >> >> >>  	kfree(rproc->cached_table);
>> >> >> >>  	rproc->cached_table = NULL;
>> >> >> >>  	rproc->table_ptr = NULL;
>> >> >> >> +	rproc->table_sz = 0;
>> >> >> >>  out:
>> >> >> >>  	mutex_unlock(&rproc->lock);
>> >> >> >>  	return ret;
>> >> >> >> -- 
>> >> >> >> 2.37.1
>> >> >> >> 
>> >> >> >
>> >> >
>> >
>

  reply	other threads:[~2025-04-02  0:36 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-26  2:02 [PATCH V2] remoteproc: core: Clear table_sz when rproc_shutdown Peng Fan (OSS)
2025-03-27 17:46 ` Mathieu Poirier
2025-03-28  4:50   ` Peng Fan
2025-03-28 14:14     ` Mathieu Poirier
2025-03-29 12:56       ` Peng Fan
2025-03-31 15:40         ` Mathieu Poirier
2025-04-01  1:41           ` Peng Fan
2025-04-01 16:05             ` Mathieu Poirier
2025-04-02  1:43               ` Peng Fan [this message]
2025-04-02 13:48                 ` Bjorn Andersson
2025-04-03 14:32                   ` Peng Fan
2025-04-08 16:10                     ` Peng Fan
2025-04-08 16:59                       ` Mathieu Poirier
2025-04-09  6:46                         ` Peng Fan
2025-05-06  4:19                           ` Peng Fan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250402014355.GA22575@nxa18884-linux \
    --to=peng.fan@oss.nxp.com \
    --cc=andersson@kernel.org \
    --cc=arnaud.pouliquen@st.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-remoteproc@vger.kernel.org \
    --cc=mathieu.poirier@linaro.org \
    --cc=peng.fan@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.