* [RFC PATCH] remoteproc: core: full attach detach during recovery
@ 2025-10-02 15:33 Tanmay Shah
2025-10-16 15:12 ` Mathieu Poirier
0 siblings, 1 reply; 5+ messages in thread
From: Tanmay Shah @ 2025-10-02 15:33 UTC (permalink / raw)
To: andersson, mathieu.poirier; +Cc: linux-remoteproc, linux-kernel, Tanmay Shah
Current recovery operation does only virtio device reset, but do not
free and re-allocate all the resources. As third-party is booting the
remote processor during attach-detach, it is better to free and
re-allocate resoruces as resource table state might be unknown to linux
when remote processor boots and reports crash.
Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
---
Note: RFC patch for design discussion. Please do not merge.
drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 825672100528..4971508bc5b2 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc)
if (ret)
return ret;
- return __rproc_attach(rproc);
+ /* clean up all acquired resources */
+ rproc_resource_cleanup(rproc);
+
+ /* release HW resources if needed */
+ rproc_unprepare_device(rproc);
+
+ rproc_disable_iommu(rproc);
+
+ /* Free the copy of the resource table */
+ kfree(rproc->cached_table);
+ rproc->cached_table = NULL;
+ rproc->table_ptr = NULL;
+
+ return rproc_attach(rproc);
}
static int rproc_boot_recovery(struct rproc *rproc)
base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] remoteproc: core: full attach detach during recovery
2025-10-02 15:33 [RFC PATCH] remoteproc: core: full attach detach during recovery Tanmay Shah
@ 2025-10-16 15:12 ` Mathieu Poirier
2025-10-16 16:12 ` Tanmay Shah
0 siblings, 1 reply; 5+ messages in thread
From: Mathieu Poirier @ 2025-10-16 15:12 UTC (permalink / raw)
To: Tanmay Shah; +Cc: andersson, linux-remoteproc, linux-kernel
Good morning,
On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote:
> Current recovery operation does only virtio device reset, but do not
> free and re-allocate all the resources. As third-party is booting the
> remote processor during attach-detach, it is better to free and
> re-allocate resoruces as resource table state might be unknown to linux
> when remote processor boots and reports crash.
1) When referring to "third-party", should I assume boot loader?
2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls
rproc_reset_rsc_table_on_detach(). That function deals explicitly with the
resource table.
3) The code in this patch mixes __rproc_detach() with rproc_attach(), something
that is likely not a good idea. We either do __rproc_detach/__rproc_attach or
rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the
amount of possible states to a minimum.
If I understand correctly, the main motivation for this patch is the management
of the resource table. But as noted in (2), this should be taken care of. Am I
missing some information?
Thanks,
Mathieu
>
> Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
> ---
>
> Note: RFC patch for design discussion. Please do not merge.
>
> drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 825672100528..4971508bc5b2 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc)
> if (ret)
> return ret;
>
> - return __rproc_attach(rproc);
> + /* clean up all acquired resources */
> + rproc_resource_cleanup(rproc);
> +
> + /* release HW resources if needed */
> + rproc_unprepare_device(rproc);
> +
> + rproc_disable_iommu(rproc);
> +
> + /* Free the copy of the resource table */
> + kfree(rproc->cached_table);
> + rproc->cached_table = NULL;
> + rproc->table_ptr = NULL;
> +
> + return rproc_attach(rproc);
> }
>
> static int rproc_boot_recovery(struct rproc *rproc)
>
> base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] remoteproc: core: full attach detach during recovery
2025-10-16 15:12 ` Mathieu Poirier
@ 2025-10-16 16:12 ` Tanmay Shah
2025-10-17 15:35 ` Mathieu Poirier
0 siblings, 1 reply; 5+ messages in thread
From: Tanmay Shah @ 2025-10-16 16:12 UTC (permalink / raw)
To: Mathieu Poirier; +Cc: andersson, linux-remoteproc, linux-kernel
Hello,
Please find my comments below:
On 10/16/25 10:12 AM, Mathieu Poirier wrote:
> Good morning,
>
> On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote:
>> Current recovery operation does only virtio device reset, but do not
>> free and re-allocate all the resources. As third-party is booting the
>> remote processor during attach-detach, it is better to free and
>> re-allocate resoruces as resource table state might be unknown to linux
>> when remote processor boots and reports crash.
>
> 1) When referring to "third-party", should I assume boot loader?
Here, "third-party" could be a bootloader or another core in a
heterogeneous system. In my-case it is a platform management controller.
> 2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls
> rproc_reset_rsc_table_on_detach(). That function deals explicitly with the
> resource table.
As per my understanding, rproc_reset_rsc_table_on_detach() will setup
clean resource table, that sets vring addresses to 0xffffffff. Please
let me know if this understanding is not correct.
If we do not, call rproc_attach(), then correct vring addresses are not
setup in the resource table for next attach to work. Because,
rproc_handle_resources() and rproc_alloc_registered_carveouts() are not
called as part __rproc_attach().
> 3) The code in this patch mixes __rproc_detach() with rproc_attach(), something
> that is likely not a good idea. We either do __rproc_detach/__rproc_attach or
> rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the
> amount of possible states to a minimum.
>
I agree to this. I can find a way to call rproc_detach() and
rproc_attach() sequentially, instead of __rproc_detach() and
rproc_attach() calls. I might have to remove
rproc_trigger_attach_recovery completely, but that is implementation
details. We can work it out later, once we agree to the current problem
& solution.
> If I understand correctly, the main motivation for this patch is the management
> of the resource table. But as noted in (2), this should be taken care of. Am I
> missing some information?
>
The main motivation is to make the attach operation works during
attach_recovery(). The __rproc_detach() works as expected, but attach
doesn't work. After recovery, I am not able to strat RPMsg communication.
Please let me know if I am missing something.
Thanks,
Tanmay
> Thanks,
> Mathieu
>
>>
>> Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
>> ---
>>
>> Note: RFC patch for design discussion. Please do not merge.
>>
>> drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> index 825672100528..4971508bc5b2 100644
>> --- a/drivers/remoteproc/remoteproc_core.c
>> +++ b/drivers/remoteproc/remoteproc_core.c
>> @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc)
>> if (ret)
>> return ret;
>>
>> - return __rproc_attach(rproc);
>> + /* clean up all acquired resources */
>> + rproc_resource_cleanup(rproc);
>> +
>> + /* release HW resources if needed */
>> + rproc_unprepare_device(rproc);
>> +
>> + rproc_disable_iommu(rproc);
>> +
>> + /* Free the copy of the resource table */
>> + kfree(rproc->cached_table);
>> + rproc->cached_table = NULL;
>> + rproc->table_ptr = NULL;
>> +
>> + return rproc_attach(rproc);
>> }
>>
>> static int rproc_boot_recovery(struct rproc *rproc)
>>
>> base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] remoteproc: core: full attach detach during recovery
2025-10-16 16:12 ` Tanmay Shah
@ 2025-10-17 15:35 ` Mathieu Poirier
2025-10-17 16:22 ` Tanmay Shah
0 siblings, 1 reply; 5+ messages in thread
From: Mathieu Poirier @ 2025-10-17 15:35 UTC (permalink / raw)
To: Tanmay Shah; +Cc: andersson, linux-remoteproc, linux-kernel
On Thu, Oct 16, 2025 at 11:12:26AM -0500, Tanmay Shah wrote:
>
> Hello,
>
> Please find my comments below:
>
> On 10/16/25 10:12 AM, Mathieu Poirier wrote:
> > Good morning,
> >
> > On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote:
> > > Current recovery operation does only virtio device reset, but do not
> > > free and re-allocate all the resources. As third-party is booting the
> > > remote processor during attach-detach, it is better to free and
> > > re-allocate resoruces as resource table state might be unknown to linux
> > > when remote processor boots and reports crash.
> >
> > 1) When referring to "third-party", should I assume boot loader?
>
> Here, "third-party" could be a bootloader or another core in a heterogeneous
> system. In my-case it is a platform management controller.
Ok
>
>
> > 2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls
> > rproc_reset_rsc_table_on_detach(). That function deals explicitly with the
> > resource table.
>
> As per my understanding, rproc_reset_rsc_table_on_detach() will setup clean
> resource table, that sets vring addresses to 0xffffffff. Please let me know
> if this understanding is not correct.
>
> If we do not, call rproc_attach(), then correct vring addresses are not
> setup in the resource table for next attach to work. Because,
> rproc_handle_resources() and rproc_alloc_registered_carveouts() are not
> called as part __rproc_attach().
Your assessment is correct. When the clean_table was introduced, it was to
address the detach->attach scenario. At that time the only recovery we
supported was to stop and start again, which did not involved the clean_table.
Re-attaching on crash was introduced later in a scenario that may not have
included a resource table.
>
> > 3) The code in this patch mixes __rproc_detach() with rproc_attach(), something
> > that is likely not a good idea. We either do __rproc_detach/__rproc_attach or
> > rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the
> > amount of possible states to a minimum.
> >
>
> I agree to this. I can find a way to call rproc_detach() and rproc_attach()
> sequentially, instead of __rproc_detach() and rproc_attach() calls. I might
> have to remove rproc_trigger_attach_recovery completely, but that is
> implementation details. We can work it out later, once we agree to the
> current problem & solution.
>
Humm... You might just be able to call rproc_detach/rproc_attach from
rproc_attach_recovery() if you enhance rproc_detach to be called in a CRASHED
context [1]. Let's see what you find when trying this on real HW.
[1]. https://elixir.bootlin.com/linux/v6.17.1/source/drivers/remoteproc/remoteproc_core.c#L2065
> > If I understand correctly, the main motivation for this patch is the management
> > of the resource table. But as noted in (2), this should be taken care of. Am I
> > missing some information?
> >
>
> The main motivation is to make the attach operation works during
> attach_recovery(). The __rproc_detach() works as expected, but attach
> doesn't work. After recovery, I am not able to strat RPMsg communication.
>
> Please let me know if I am missing something.
>
> Thanks,
> Tanmay
>
> > Thanks,
> > Mathieu
> >
> > >
> > > Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
> > > ---
> > >
> > > Note: RFC patch for design discussion. Please do not merge.
> > >
> > > drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++-
> > > 1 file changed, 14 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > > index 825672100528..4971508bc5b2 100644
> > > --- a/drivers/remoteproc/remoteproc_core.c
> > > +++ b/drivers/remoteproc/remoteproc_core.c
> > > @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc)
> > > if (ret)
> > > return ret;
> > > - return __rproc_attach(rproc);
> > > + /* clean up all acquired resources */
> > > + rproc_resource_cleanup(rproc);
> > > +
> > > + /* release HW resources if needed */
> > > + rproc_unprepare_device(rproc);
> > > +
> > > + rproc_disable_iommu(rproc);
> > > +
> > > + /* Free the copy of the resource table */
> > > + kfree(rproc->cached_table);
> > > + rproc->cached_table = NULL;
> > > + rproc->table_ptr = NULL;
> > > +
> > > + return rproc_attach(rproc);
> > > }
> > > static int rproc_boot_recovery(struct rproc *rproc)
> > >
> > > base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac
> > > --
> > > 2.34.1
> > >
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH] remoteproc: core: full attach detach during recovery
2025-10-17 15:35 ` Mathieu Poirier
@ 2025-10-17 16:22 ` Tanmay Shah
0 siblings, 0 replies; 5+ messages in thread
From: Tanmay Shah @ 2025-10-17 16:22 UTC (permalink / raw)
To: Mathieu Poirier; +Cc: andersson, linux-remoteproc, linux-kernel
On 10/17/25 10:35 AM, Mathieu Poirier wrote:
> On Thu, Oct 16, 2025 at 11:12:26AM -0500, Tanmay Shah wrote:
>>
>> Hello,
>>
>> Please find my comments below:
>>
>> On 10/16/25 10:12 AM, Mathieu Poirier wrote:
>>> Good morning,
>>>
>>> On Thu, Oct 02, 2025 at 08:33:46AM -0700, Tanmay Shah wrote:
>>>> Current recovery operation does only virtio device reset, but do not
>>>> free and re-allocate all the resources. As third-party is booting the
>>>> remote processor during attach-detach, it is better to free and
>>>> re-allocate resoruces as resource table state might be unknown to linux
>>>> when remote processor boots and reports crash.
>>>
>>> 1) When referring to "third-party", should I assume boot loader?
>>
>> Here, "third-party" could be a bootloader or another core in a heterogeneous
>> system. In my-case it is a platform management controller.
>
> Ok
>
>>
>>
>>> 2) Function rproc_attach_recovery() calls __rproc_detach(), which in turn calls
>>> rproc_reset_rsc_table_on_detach(). That function deals explicitly with the
>>> resource table.
>>
>> As per my understanding, rproc_reset_rsc_table_on_detach() will setup clean
>> resource table, that sets vring addresses to 0xffffffff. Please let me know
>> if this understanding is not correct.
>>
>> If we do not, call rproc_attach(), then correct vring addresses are not
>> setup in the resource table for next attach to work. Because,
>> rproc_handle_resources() and rproc_alloc_registered_carveouts() are not
>> called as part __rproc_attach().
>
> Your assessment is correct. When the clean_table was introduced, it was to
> address the detach->attach scenario. At that time the only recovery we
> supported was to stop and start again, which did not involved the clean_table.
> Re-attaching on crash was introduced later in a scenario that may not have
> included a resource table.
>
Okay that explains the current architecture.
>>
>>> 3) The code in this patch mixes __rproc_detach() with rproc_attach(), something
>>> that is likely not a good idea. We either do __rproc_detach/__rproc_attach or
>>> rproc_detach/rproc_attach but I'd like to avoid the mix-and-match to keep the
>>> amount of possible states to a minimum.
>>>
>>
>> I agree to this. I can find a way to call rproc_detach() and rproc_attach()
>> sequentially, instead of __rproc_detach() and rproc_attach() calls. I might
>> have to remove rproc_trigger_attach_recovery completely, but that is
>> implementation details. We can work it out later, once we agree to the
>> current problem & solution.
>>
>
> Humm... You might just be able to call rproc_detach/rproc_attach from
> rproc_attach_recovery() if you enhance rproc_detach to be called in a CRASHED
> context [1]. Let's see what you find when trying this on real HW.
>
> [1]. https://elixir.bootlin.com/linux/v6.17.1/source/drivers/remoteproc/remoteproc_core.c#L2065
>
>
Thank You for the suggestion. Agreed. Since we are coming to an
agreement on the final solution, I will send the actual series which
also takes care of start/stop recovery on xlnx platform driver. I will
implement above suggestion, and test on HW.
Thanks,
Tanmay
>>> If I understand correctly, the main motivation for this patch is the management
>>> of the resource table. But as noted in (2), this should be taken care of. Am I
>>> missing some information?
>>>
>>
>> The main motivation is to make the attach operation works during
>> attach_recovery(). The __rproc_detach() works as expected, but attach
>> doesn't work. After recovery, I am not able to strat RPMsg communication.
>>
>> Please let me know if I am missing something.
>>
>> Thanks,
>> Tanmay
>>
>>> Thanks,
>>> Mathieu
>>>
>>>>
>>>> Signed-off-by: Tanmay Shah <tanmay.shah@amd.com>
>>>> ---
>>>>
>>>> Note: RFC patch for design discussion. Please do not merge.
>>>>
>>>> drivers/remoteproc/remoteproc_core.c | 15 ++++++++++++++-
>>>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>>> index 825672100528..4971508bc5b2 100644
>>>> --- a/drivers/remoteproc/remoteproc_core.c
>>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>>> @@ -1786,7 +1786,20 @@ static int rproc_attach_recovery(struct rproc *rproc)
>>>> if (ret)
>>>> return ret;
>>>> - return __rproc_attach(rproc);
>>>> + /* clean up all acquired resources */
>>>> + rproc_resource_cleanup(rproc);
>>>> +
>>>> + /* release HW resources if needed */
>>>> + rproc_unprepare_device(rproc);
>>>> +
>>>> + rproc_disable_iommu(rproc);
>>>> +
>>>> + /* Free the copy of the resource table */
>>>> + kfree(rproc->cached_table);
>>>> + rproc->cached_table = NULL;
>>>> + rproc->table_ptr = NULL;
>>>> +
>>>> + return rproc_attach(rproc);
>>>> }
>>>> static int rproc_boot_recovery(struct rproc *rproc)
>>>>
>>>> base-commit: 56d030ea3330ab737fe6c05f89d52f56208b07ac
>>>> --
>>>> 2.34.1
>>>>
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-10-17 16:23 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-02 15:33 [RFC PATCH] remoteproc: core: full attach detach during recovery Tanmay Shah
2025-10-16 15:12 ` Mathieu Poirier
2025-10-16 16:12 ` Tanmay Shah
2025-10-17 15:35 ` Mathieu Poirier
2025-10-17 16:22 ` Tanmay Shah
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox