* [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq @ 2025-12-04 10:40 Stefan Roese 2025-12-04 16:45 ` Tanmay Shah 0 siblings, 1 reply; 14+ messages in thread From: Stefan Roese @ 2025-12-04 10:40 UTC (permalink / raw) To: linux-remoteproc; +Cc: Tanmay Shah, Mathieu Poirier Testing on our ZynqMP platform has shown, that some R5 messages might get dropped under high CPU load. This patch creates a new high-prio workqueue which is now used instead of the default system workqueue. With this change we don't experience these message drops any more. Signed-off-by: Stefan Roese <stefan.roese@mailbox.org> Cc: Tanmay Shah <tanmay.shah@amd.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> --- v3: - Call cancel_work_sync() before freeing ipi (suggested by Zhongqiu Han) v2: - Also call destroy_workqueue() in zynqmp_r5_cluster_exit() (suggested by Zhongqiu Han) - Correct call seq to avoid UAF (suggested by Zhongqiu Han) drivers/remoteproc/xlnx_r5_remoteproc.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c index feca6de68da28..308328b0b489f 100644 --- a/drivers/remoteproc/xlnx_r5_remoteproc.c +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c @@ -16,6 +16,7 @@ #include <linux/of_reserved_mem.h> #include <linux/platform_device.h> #include <linux/remoteproc.h> +#include <linux/workqueue.h> #include "remoteproc_internal.h" @@ -116,6 +117,7 @@ struct zynqmp_r5_cluster { enum zynqmp_r5_cluster_mode mode; int core_count; struct zynqmp_r5_core **r5_cores; + struct workqueue_struct *workqueue; }; /** @@ -174,10 +176,18 @@ static void handle_event_notified(struct work_struct *work) static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) { struct zynqmp_ipi_message *ipi_msg, *buf_msg; + struct zynqmp_r5_cluster *cluster; struct mbox_info *ipi; + struct device *dev; size_t len; ipi = container_of(cl, struct mbox_info, mbox_cl); + dev = ipi->r5_core->dev; + cluster = dev_get_drvdata(dev->parent); + if (!cluster) { + dev_err(dev->parent, "Invalid driver data\n"); + return; + } /* copy data from ipi buffer to r5_core */ ipi_msg = (struct zynqmp_ipi_message *)msg; @@ -195,7 +205,7 @@ static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) if (mbox_send_message(ipi->rx_chan, NULL) < 0) dev_err(cl->dev, "ack failed to mbox rx_chan\n"); - schedule_work(&ipi->mbox_work); + queue_work(cluster->workqueue, &ipi->mbox_work); } /** @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) for (i = 0; i < cluster->core_count; i++) { r5_core = cluster->r5_cores[i]; + cancel_work_sync(&r5_core->ipi->mbox_work); zynqmp_r5_free_mbox(r5_core->ipi); of_reserved_mem_device_release(r5_core->dev); put_device(r5_core->dev); @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) } kfree(cluster->r5_cores); + destroy_workqueue(cluster->workqueue); kfree(cluster); platform_set_drvdata(pdev, NULL); } @@ -1194,11 +1206,20 @@ static int zynqmp_r5_remoteproc_probe(struct platform_device *pdev) return ret; } + cluster->workqueue = alloc_workqueue(dev_name(dev), + WQ_UNBOUND | WQ_HIGHPRI, 0); + if (!cluster->workqueue) { + dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); + kfree(cluster); + return -ENOMEM; + } + /* wire in so each core can be cleaned up at driver remove */ platform_set_drvdata(pdev, cluster); ret = zynqmp_r5_cluster_init(cluster); if (ret) { + destroy_workqueue(cluster->workqueue); kfree(cluster); platform_set_drvdata(pdev, NULL); dev_err_probe(dev, ret, "Invalid r5f subsystem device tree\n"); -- 2.52.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-04 10:40 [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq Stefan Roese @ 2025-12-04 16:45 ` Tanmay Shah 2025-12-05 12:06 ` Stefan Roese 0 siblings, 1 reply; 14+ messages in thread From: Tanmay Shah @ 2025-12-04 16:45 UTC (permalink / raw) To: Stefan Roese, linux-remoteproc; +Cc: Mathieu Poirier Hello, Thank You for your patch. Please find my comments below. On 12/4/25 4:40 AM, Stefan Roese wrote: > Testing on our ZynqMP platform has shown, that some R5 messages might > get dropped under high CPU load. This patch creates a new high-prio Here, I would like to understand what it means by "R5 messages might get dropped" Even under high CPU load, the messages from R5 are stored in the virtqueues. If Linux doesn't read it, then it is not really lost/dropped. Could you please explain your use case in detail and how the testing is conducted? Thanks, Tanmay > workqueue which is now used instead of the default system workqueue. > With this change we don't experience these message drops any more. > > Signed-off-by: Stefan Roese <stefan.roese@mailbox.org> > Cc: Tanmay Shah <tanmay.shah@amd.com> > Cc: Mathieu Poirier <mathieu.poirier@linaro.org> > --- > v3: > - Call cancel_work_sync() before freeing ipi (suggested by Zhongqiu Han) > > v2: > - Also call destroy_workqueue() in zynqmp_r5_cluster_exit() (suggested by Zhongqiu Han) > - Correct call seq to avoid UAF (suggested by Zhongqiu Han) > > drivers/remoteproc/xlnx_r5_remoteproc.c | 23 ++++++++++++++++++++++- > 1 file changed, 22 insertions(+), 1 deletion(-) > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/remoteproc/xlnx_r5_remoteproc.c > index feca6de68da28..308328b0b489f 100644 > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > @@ -16,6 +16,7 @@ > #include <linux/of_reserved_mem.h> > #include <linux/platform_device.h> > #include <linux/remoteproc.h> > +#include <linux/workqueue.h> > > #include "remoteproc_internal.h" > > @@ -116,6 +117,7 @@ struct zynqmp_r5_cluster { > enum zynqmp_r5_cluster_mode mode; > int core_count; > struct zynqmp_r5_core **r5_cores; > + struct workqueue_struct *workqueue; > }; > > /** > @@ -174,10 +176,18 @@ static void handle_event_notified(struct work_struct *work) > static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) > { > struct zynqmp_ipi_message *ipi_msg, *buf_msg; > + struct zynqmp_r5_cluster *cluster; > struct mbox_info *ipi; > + struct device *dev; > size_t len; > > ipi = container_of(cl, struct mbox_info, mbox_cl); > + dev = ipi->r5_core->dev; > + cluster = dev_get_drvdata(dev->parent); > + if (!cluster) { > + dev_err(dev->parent, "Invalid driver data\n"); > + return; > + } > > /* copy data from ipi buffer to r5_core */ > ipi_msg = (struct zynqmp_ipi_message *)msg; > @@ -195,7 +205,7 @@ static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) > if (mbox_send_message(ipi->rx_chan, NULL) < 0) > dev_err(cl->dev, "ack failed to mbox rx_chan\n"); > > - schedule_work(&ipi->mbox_work); > + queue_work(cluster->workqueue, &ipi->mbox_work); > } > > /** > @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) > > for (i = 0; i < cluster->core_count; i++) { > r5_core = cluster->r5_cores[i]; > + cancel_work_sync(&r5_core->ipi->mbox_work); > zynqmp_r5_free_mbox(r5_core->ipi); > of_reserved_mem_device_release(r5_core->dev); > put_device(r5_core->dev); > @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) > } > > kfree(cluster->r5_cores); > + destroy_workqueue(cluster->workqueue); > kfree(cluster); > platform_set_drvdata(pdev, NULL); > } > @@ -1194,11 +1206,20 @@ static int zynqmp_r5_remoteproc_probe(struct platform_device *pdev) > return ret; > } > > + cluster->workqueue = alloc_workqueue(dev_name(dev), > + WQ_UNBOUND | WQ_HIGHPRI, 0); > + if (!cluster->workqueue) { > + dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); > + kfree(cluster); > + return -ENOMEM; > + } > + > /* wire in so each core can be cleaned up at driver remove */ > platform_set_drvdata(pdev, cluster); > > ret = zynqmp_r5_cluster_init(cluster); > if (ret) { > + destroy_workqueue(cluster->workqueue); > kfree(cluster); > platform_set_drvdata(pdev, NULL); > dev_err_probe(dev, ret, "Invalid r5f subsystem device tree\n"); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-04 16:45 ` Tanmay Shah @ 2025-12-05 12:06 ` Stefan Roese 2025-12-10 2:51 ` Zhongqiu Han 0 siblings, 1 reply; 14+ messages in thread From: Stefan Roese @ 2025-12-05 12:06 UTC (permalink / raw) To: tanmay.shah, linux-remoteproc; +Cc: Mathieu Poirier, Zhongqiu Han Hi Tanmay, On 12/4/25 17:45, Tanmay Shah wrote: > Hello, > > Thank You for your patch. Please find my comments below. > > On 12/4/25 4:40 AM, Stefan Roese wrote: >> Testing on our ZynqMP platform has shown, that some R5 messages might >> get dropped under high CPU load. This patch creates a new high-prio > > Here, I would like to understand what it means by "R5 messages might get > dropped" > > Even under high CPU load, the messages from R5 are stored in the > virtqueues. If Linux doesn't read it, then it is not really lost/dropped. > > Could you please explain your use case in detail and how the testing is > conducted? Our use-case is, that we send ~4k messages per second from the R5 to Linux - sometimes even a bit more. Normally these messages are received okay and no messages are dropped. Sometimes, under "high CPU load" scenarios it happens, that the R5 has to drop messages, as there is no free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting from the Linux driver not emptying the RX queue. Could you please elaborate on these virtqueues a bit? Especially why no messages drop should happen because of these virtqueues? Thanks, Stefan > Thanks, > Tanmay > >> workqueue which is now used instead of the default system workqueue. >> With this change we don't experience these message drops any more. >> >> Signed-off-by: Stefan Roese <stefan.roese@mailbox.org> >> Cc: Tanmay Shah <tanmay.shah@amd.com> >> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> >> --- >> v3: >> - Call cancel_work_sync() before freeing ipi (suggested by Zhongqiu Han) >> >> v2: >> - Also call destroy_workqueue() in zynqmp_r5_cluster_exit() (suggested >> by Zhongqiu Han) >> - Correct call seq to avoid UAF (suggested by Zhongqiu Han) >> >> drivers/remoteproc/xlnx_r5_remoteproc.c | 23 ++++++++++++++++++++++- >> 1 file changed, 22 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/ >> remoteproc/xlnx_r5_remoteproc.c >> index feca6de68da28..308328b0b489f 100644 >> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c >> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c >> @@ -16,6 +16,7 @@ >> #include <linux/of_reserved_mem.h> >> #include <linux/platform_device.h> >> #include <linux/remoteproc.h> >> +#include <linux/workqueue.h> >> #include "remoteproc_internal.h" >> @@ -116,6 +117,7 @@ struct zynqmp_r5_cluster { >> enum zynqmp_r5_cluster_mode mode; >> int core_count; >> struct zynqmp_r5_core **r5_cores; >> + struct workqueue_struct *workqueue; >> }; >> /** >> @@ -174,10 +176,18 @@ static void handle_event_notified(struct >> work_struct *work) >> static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) >> { >> struct zynqmp_ipi_message *ipi_msg, *buf_msg; >> + struct zynqmp_r5_cluster *cluster; >> struct mbox_info *ipi; >> + struct device *dev; >> size_t len; >> ipi = container_of(cl, struct mbox_info, mbox_cl); >> + dev = ipi->r5_core->dev; >> + cluster = dev_get_drvdata(dev->parent); >> + if (!cluster) { >> + dev_err(dev->parent, "Invalid driver data\n"); >> + return; >> + } >> /* copy data from ipi buffer to r5_core */ >> ipi_msg = (struct zynqmp_ipi_message *)msg; >> @@ -195,7 +205,7 @@ static void zynqmp_r5_mb_rx_cb(struct mbox_client >> *cl, void *msg) >> if (mbox_send_message(ipi->rx_chan, NULL) < 0) >> dev_err(cl->dev, "ack failed to mbox rx_chan\n"); >> - schedule_work(&ipi->mbox_work); >> + queue_work(cluster->workqueue, &ipi->mbox_work); >> } >> /** >> @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) >> for (i = 0; i < cluster->core_count; i++) { >> r5_core = cluster->r5_cores[i]; >> + cancel_work_sync(&r5_core->ipi->mbox_work); >> zynqmp_r5_free_mbox(r5_core->ipi); >> of_reserved_mem_device_release(r5_core->dev); >> put_device(r5_core->dev); >> @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) >> } >> kfree(cluster->r5_cores); >> + destroy_workqueue(cluster->workqueue); >> kfree(cluster); >> platform_set_drvdata(pdev, NULL); >> } >> @@ -1194,11 +1206,20 @@ static int zynqmp_r5_remoteproc_probe(struct >> platform_device *pdev) >> return ret; >> } >> + cluster->workqueue = alloc_workqueue(dev_name(dev), >> + WQ_UNBOUND | WQ_HIGHPRI, 0); >> + if (!cluster->workqueue) { >> + dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); >> + kfree(cluster); >> + return -ENOMEM; >> + } >> + >> /* wire in so each core can be cleaned up at driver remove */ >> platform_set_drvdata(pdev, cluster); >> ret = zynqmp_r5_cluster_init(cluster); >> if (ret) { >> + destroy_workqueue(cluster->workqueue); >> kfree(cluster); >> platform_set_drvdata(pdev, NULL); >> dev_err_probe(dev, ret, "Invalid r5f subsystem device tree\n"); > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-05 12:06 ` Stefan Roese @ 2025-12-10 2:51 ` Zhongqiu Han 2025-12-10 8:29 ` Stefan Roese 0 siblings, 1 reply; 14+ messages in thread From: Zhongqiu Han @ 2025-12-10 2:51 UTC (permalink / raw) To: Stefan Roese, tanmay.shah, linux-remoteproc; +Cc: Mathieu Poirier, zhongqiu.han On 12/5/2025 8:06 PM, Stefan Roese wrote: > Hi Tanmay, > > On 12/4/25 17:45, Tanmay Shah wrote: >> Hello, >> >> Thank You for your patch. Please find my comments below. >> >> On 12/4/25 4:40 AM, Stefan Roese wrote: >>> Testing on our ZynqMP platform has shown, that some R5 messages might >>> get dropped under high CPU load. This patch creates a new high-prio >> >> Here, I would like to understand what it means by "R5 messages might >> get dropped" >> >> Even under high CPU load, the messages from R5 are stored in the >> virtqueues. If Linux doesn't read it, then it is not really lost/dropped. >> >> Could you please explain your use case in detail and how the testing >> is conducted? > > Our use-case is, that we send ~4k messages per second from the R5 to > Linux - sometimes even a bit more. Normally these messages are received > okay and no messages are dropped. Sometimes, under "high CPU load" > scenarios it happens, that the R5 has to drop messages, as there is no > free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting > from the Linux driver not emptying the RX queue. > > Could you please elaborate on these virtqueues a bit? Especially why no > messages drop should happen because of these virtqueues? AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a message has been successfully enqueued. The observed "drop" here appears to be on the R5 side, where the application discards messages when no entry buffer is available. In the long run, while improving the Linux side is recommended, it could also be helpful for the R5 side to implement strategies such as an application-level buffer and retry mechanisms. > > Thanks, > Stefan > >> Thanks, >> Tanmay >> >>> workqueue which is now used instead of the default system workqueue. >>> With this change we don't experience these message drops any more. >>> >>> Signed-off-by: Stefan Roese <stefan.roese@mailbox.org> >>> Cc: Tanmay Shah <tanmay.shah@amd.com> >>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> >>> --- >>> v3: >>> - Call cancel_work_sync() before freeing ipi (suggested by Zhongqiu Han) >>> >>> v2: >>> - Also call destroy_workqueue() in zynqmp_r5_cluster_exit() >>> (suggested by Zhongqiu Han) >>> - Correct call seq to avoid UAF (suggested by Zhongqiu Han) >>> >>> drivers/remoteproc/xlnx_r5_remoteproc.c | 23 ++++++++++++++++++++++- >>> 1 file changed, 22 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/ >>> remoteproc/xlnx_r5_remoteproc.c >>> index feca6de68da28..308328b0b489f 100644 >>> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c >>> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c >>> @@ -16,6 +16,7 @@ >>> #include <linux/of_reserved_mem.h> >>> #include <linux/platform_device.h> >>> #include <linux/remoteproc.h> >>> +#include <linux/workqueue.h> >>> #include "remoteproc_internal.h" >>> @@ -116,6 +117,7 @@ struct zynqmp_r5_cluster { >>> enum zynqmp_r5_cluster_mode mode; >>> int core_count; >>> struct zynqmp_r5_core **r5_cores; >>> + struct workqueue_struct *workqueue; >>> }; >>> /** >>> @@ -174,10 +176,18 @@ static void handle_event_notified(struct >>> work_struct *work) >>> static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) >>> { >>> struct zynqmp_ipi_message *ipi_msg, *buf_msg; >>> + struct zynqmp_r5_cluster *cluster; >>> struct mbox_info *ipi; >>> + struct device *dev; >>> size_t len; >>> ipi = container_of(cl, struct mbox_info, mbox_cl); >>> + dev = ipi->r5_core->dev; >>> + cluster = dev_get_drvdata(dev->parent); >>> + if (!cluster) { >>> + dev_err(dev->parent, "Invalid driver data\n"); >>> + return; >>> + } >>> /* copy data from ipi buffer to r5_core */ >>> ipi_msg = (struct zynqmp_ipi_message *)msg; >>> @@ -195,7 +205,7 @@ static void zynqmp_r5_mb_rx_cb(struct mbox_client >>> *cl, void *msg) >>> if (mbox_send_message(ipi->rx_chan, NULL) < 0) >>> dev_err(cl->dev, "ack failed to mbox rx_chan\n"); >>> - schedule_work(&ipi->mbox_work); >>> + queue_work(cluster->workqueue, &ipi->mbox_work); >>> } >>> /** >>> @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) >>> for (i = 0; i < cluster->core_count; i++) { >>> r5_core = cluster->r5_cores[i]; >>> + cancel_work_sync(&r5_core->ipi->mbox_work); >>> zynqmp_r5_free_mbox(r5_core->ipi); >>> of_reserved_mem_device_release(r5_core->dev); >>> put_device(r5_core->dev); >>> @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) >>> } >>> kfree(cluster->r5_cores); >>> + destroy_workqueue(cluster->workqueue); >>> kfree(cluster); >>> platform_set_drvdata(pdev, NULL); >>> } >>> @@ -1194,11 +1206,20 @@ static int zynqmp_r5_remoteproc_probe(struct >>> platform_device *pdev) >>> return ret; >>> } >>> + cluster->workqueue = alloc_workqueue(dev_name(dev), >>> + WQ_UNBOUND | WQ_HIGHPRI, 0); >>> + if (!cluster->workqueue) { >>> + dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); >>> + kfree(cluster); >>> + return -ENOMEM; >>> + } >>> + >>> /* wire in so each core can be cleaned up at driver remove */ >>> platform_set_drvdata(pdev, cluster); >>> ret = zynqmp_r5_cluster_init(cluster); >>> if (ret) { >>> + destroy_workqueue(cluster->workqueue); >>> kfree(cluster); >>> platform_set_drvdata(pdev, NULL); >>> dev_err_probe(dev, ret, "Invalid r5f subsystem device >>> tree\n"); >> > -- Thx and BRs, Zhongqiu Han ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-10 2:51 ` Zhongqiu Han @ 2025-12-10 8:29 ` Stefan Roese 2025-12-10 18:28 ` Tanmay Shah 0 siblings, 1 reply; 14+ messages in thread From: Stefan Roese @ 2025-12-10 8:29 UTC (permalink / raw) To: Zhongqiu Han, tanmay.shah, linux-remoteproc; +Cc: Mathieu Poirier Hi Tanmay, On 12/10/25 03:51, Zhongqiu Han wrote: > On 12/5/2025 8:06 PM, Stefan Roese wrote: >> Hi Tanmay, >> >> On 12/4/25 17:45, Tanmay Shah wrote: >>> Hello, >>> >>> Thank You for your patch. Please find my comments below. >>> >>> On 12/4/25 4:40 AM, Stefan Roese wrote: >>>> Testing on our ZynqMP platform has shown, that some R5 messages might >>>> get dropped under high CPU load. This patch creates a new high-prio >>> >>> Here, I would like to understand what it means by "R5 messages might >>> get dropped" >>> >>> Even under high CPU load, the messages from R5 are stored in the >>> virtqueues. If Linux doesn't read it, then it is not really lost/ >>> dropped. >>> >>> Could you please explain your use case in detail and how the testing >>> is conducted? >> >> Our use-case is, that we send ~4k messages per second from the R5 to >> Linux - sometimes even a bit more. Normally these messages are received >> okay and no messages are dropped. Sometimes, under "high CPU load" >> scenarios it happens, that the R5 has to drop messages, as there is no >> free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting >> from the Linux driver not emptying the RX queue. >> >> Could you please elaborate on these virtqueues a bit? Especially why no >> messages drop should happen because of these virtqueues? > > AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a > message has been successfully enqueued. The observed "drop" here appears > to be on the R5 side, where the application discards messages when no > entry buffer is available. Correct. > In the long run, while improving the Linux side is recommended, Yes, please. > it could > also be helpful for the R5 side to implement strategies such as an > application-level buffer and retry mechanisms. We already did this. We've added an additional buffer mechanism to the R5, which improved this "message drop situation" a bit. Still it did not fix it for all our high message rate situations - still resulting in frame drops on the R5 side (the R5 is a bit resource restricted). Improving the responsiveness on the Linux side seems to be the best way for us to deal with this problem. Thanks, Stefan > >> >> Thanks, >> Stefan >> >>> Thanks, >>> Tanmay >>> >>>> workqueue which is now used instead of the default system workqueue. >>>> With this change we don't experience these message drops any more. >>>> >>>> Signed-off-by: Stefan Roese <stefan.roese@mailbox.org> >>>> Cc: Tanmay Shah <tanmay.shah@amd.com> >>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> >>>> --- >>>> v3: >>>> - Call cancel_work_sync() before freeing ipi (suggested by Zhongqiu >>>> Han) >>>> >>>> v2: >>>> - Also call destroy_workqueue() in zynqmp_r5_cluster_exit() >>>> (suggested by Zhongqiu Han) >>>> - Correct call seq to avoid UAF (suggested by Zhongqiu Han) >>>> >>>> drivers/remoteproc/xlnx_r5_remoteproc.c | 23 ++++++++++++++++++++++- >>>> 1 file changed, 22 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/ >>>> remoteproc/xlnx_r5_remoteproc.c >>>> index feca6de68da28..308328b0b489f 100644 >>>> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c >>>> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c >>>> @@ -16,6 +16,7 @@ >>>> #include <linux/of_reserved_mem.h> >>>> #include <linux/platform_device.h> >>>> #include <linux/remoteproc.h> >>>> +#include <linux/workqueue.h> >>>> #include "remoteproc_internal.h" >>>> @@ -116,6 +117,7 @@ struct zynqmp_r5_cluster { >>>> enum zynqmp_r5_cluster_mode mode; >>>> int core_count; >>>> struct zynqmp_r5_core **r5_cores; >>>> + struct workqueue_struct *workqueue; >>>> }; >>>> /** >>>> @@ -174,10 +176,18 @@ static void handle_event_notified(struct >>>> work_struct *work) >>>> static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) >>>> { >>>> struct zynqmp_ipi_message *ipi_msg, *buf_msg; >>>> + struct zynqmp_r5_cluster *cluster; >>>> struct mbox_info *ipi; >>>> + struct device *dev; >>>> size_t len; >>>> ipi = container_of(cl, struct mbox_info, mbox_cl); >>>> + dev = ipi->r5_core->dev; >>>> + cluster = dev_get_drvdata(dev->parent); >>>> + if (!cluster) { >>>> + dev_err(dev->parent, "Invalid driver data\n"); >>>> + return; >>>> + } >>>> /* copy data from ipi buffer to r5_core */ >>>> ipi_msg = (struct zynqmp_ipi_message *)msg; >>>> @@ -195,7 +205,7 @@ static void zynqmp_r5_mb_rx_cb(struct >>>> mbox_client *cl, void *msg) >>>> if (mbox_send_message(ipi->rx_chan, NULL) < 0) >>>> dev_err(cl->dev, "ack failed to mbox rx_chan\n"); >>>> - schedule_work(&ipi->mbox_work); >>>> + queue_work(cluster->workqueue, &ipi->mbox_work); >>>> } >>>> /** >>>> @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) >>>> for (i = 0; i < cluster->core_count; i++) { >>>> r5_core = cluster->r5_cores[i]; >>>> + cancel_work_sync(&r5_core->ipi->mbox_work); >>>> zynqmp_r5_free_mbox(r5_core->ipi); >>>> of_reserved_mem_device_release(r5_core->dev); >>>> put_device(r5_core->dev); >>>> @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) >>>> } >>>> kfree(cluster->r5_cores); >>>> + destroy_workqueue(cluster->workqueue); >>>> kfree(cluster); >>>> platform_set_drvdata(pdev, NULL); >>>> } >>>> @@ -1194,11 +1206,20 @@ static int zynqmp_r5_remoteproc_probe(struct >>>> platform_device *pdev) >>>> return ret; >>>> } >>>> + cluster->workqueue = alloc_workqueue(dev_name(dev), >>>> + WQ_UNBOUND | WQ_HIGHPRI, 0); >>>> + if (!cluster->workqueue) { >>>> + dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); >>>> + kfree(cluster); >>>> + return -ENOMEM; >>>> + } >>>> + >>>> /* wire in so each core can be cleaned up at driver remove */ >>>> platform_set_drvdata(pdev, cluster); >>>> ret = zynqmp_r5_cluster_init(cluster); >>>> if (ret) { >>>> + destroy_workqueue(cluster->workqueue); >>>> kfree(cluster); >>>> platform_set_drvdata(pdev, NULL); >>>> dev_err_probe(dev, ret, "Invalid r5f subsystem device >>>> tree\n"); >>> >> > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-10 8:29 ` Stefan Roese @ 2025-12-10 18:28 ` Tanmay Shah 2025-12-15 1:14 ` Mathieu Poirier ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Tanmay Shah @ 2025-12-10 18:28 UTC (permalink / raw) To: Stefan Roese, Zhongqiu Han, linux-remoteproc; +Cc: Mathieu Poirier Hello, please check my comments below: On 12/10/25 2:29 AM, Stefan Roese wrote: > Hi Tanmay, > > On 12/10/25 03:51, Zhongqiu Han wrote: >> On 12/5/2025 8:06 PM, Stefan Roese wrote: >>> Hi Tanmay, >>> >>> On 12/4/25 17:45, Tanmay Shah wrote: >>>> Hello, >>>> >>>> Thank You for your patch. Please find my comments below. >>>> >>>> On 12/4/25 4:40 AM, Stefan Roese wrote: >>>>> Testing on our ZynqMP platform has shown, that some R5 messages might >>>>> get dropped under high CPU load. This patch creates a new high-prio >>>> This commit text should be fixed. Messages are not dropped by Linux, but R5 can't send new messages as rx vq is not processed by Linux. >>>> Here, I would like to understand what it means by "R5 messages might >>>> get dropped" >>>> >>>> Even under high CPU load, the messages from R5 are stored in the >>>> virtqueues. If Linux doesn't read it, then it is not really lost/ >>>> dropped. >>>> >>>> Could you please explain your use case in detail and how the testing >>>> is conducted? >>> >>> Our use-case is, that we send ~4k messages per second from the R5 to >>> Linux - sometimes even a bit more. Normally these messages are received >>> okay and no messages are dropped. Sometimes, under "high CPU load" >>> scenarios it happens, that the R5 has to drop messages, as there is no >>> free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting >>> from the Linux driver not emptying the RX queue. >>> Thanks for the details. Your understanding is correct. >>> Could you please elaborate on these virtqueues a bit? Especially why no >>> messages drop should happen because of these virtqueues? >> >> AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a >> message has been successfully enqueued. The observed "drop" here appears >> to be on the R5 side, where the application discards messages when no >> entry buffer is available. > > Correct. > >> In the long run, while improving the Linux side is recommended, > > Yes, please. > >> it could >> also be helpful for the R5 side to implement strategies such as an >> application-level buffer and retry mechanisms. > > We already did this. We've added an additional buffer mechanism to the > R5, which improved this "message drop situation" a bit. Still it did not > fix it for all our high message rate situations - still resulting in > frame drops on the R5 side (the R5 is a bit resource restricted). > > Improving the responsiveness on the Linux side seems to be the best way > for us to deal with this problem. > I agree to this. However, Just want to understand and cover full picture here. On R5 side, I am assuming open-amp library is used for the RPMsg communication. rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 Here, if the new buffer is not available, then R5 is supposed to wait for 1ms before sending a new message. After 1ms, R5 will try to get buffer again, and this continues for 15 seconds. This is the default mechanism. This mechanism is used in your case correctly ? Alternatively you can register platform specific wait mechanism via this callback: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42 Few questions for further understanding: 1) As per your use case, 4k per second data transfer rate must be maintained all the time? And this is achieved with this patch? Even after having the high priority queue, if someone wants to achieve 8k per seconds or 16k per seconds data transfer rate, at some point we will hit this issue again. The reliable solution would be to keep the data transfer rate reasonable, and have solid re-try mechanism. I am okay to take this patch in after addressing comments below but, please make sure all above things are r5 side is working as well. Thanks, Tanmay > Thanks, > Stefan > >> >>> >>> Thanks, >>> Stefan >>> >>>> Thanks, >>>> Tanmay >>>> >>>>> workqueue which is now used instead of the default system workqueue. >>>>> With this change we don't experience these message drops any more. >>>>> >>>>> Signed-off-by: Stefan Roese <stefan.roese@mailbox.org> >>>>> Cc: Tanmay Shah <tanmay.shah@amd.com> >>>>> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> >>>>> --- >>>>> v3: >>>>> - Call cancel_work_sync() before freeing ipi (suggested by Zhongqiu >>>>> Han) >>>>> >>>>> v2: >>>>> - Also call destroy_workqueue() in zynqmp_r5_cluster_exit() >>>>> (suggested by Zhongqiu Han) >>>>> - Correct call seq to avoid UAF (suggested by Zhongqiu Han) >>>>> >>>>> drivers/remoteproc/xlnx_r5_remoteproc.c | 23 ++++++++++++++++++++++- >>>>> 1 file changed, 22 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c b/drivers/ >>>>> remoteproc/xlnx_r5_remoteproc.c >>>>> index feca6de68da28..308328b0b489f 100644 >>>>> --- a/drivers/remoteproc/xlnx_r5_remoteproc.c >>>>> +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c >>>>> @@ -16,6 +16,7 @@ >>>>> #include <linux/of_reserved_mem.h> >>>>> #include <linux/platform_device.h> >>>>> #include <linux/remoteproc.h> >>>>> +#include <linux/workqueue.h> >>>>> #include "remoteproc_internal.h" >>>>> @@ -116,6 +117,7 @@ struct zynqmp_r5_cluster { >>>>> enum zynqmp_r5_cluster_mode mode; >>>>> int core_count; >>>>> struct zynqmp_r5_core **r5_cores; >>>>> + struct workqueue_struct *workqueue; >>>>> }; >>>>> /** >>>>> @@ -174,10 +176,18 @@ static void handle_event_notified(struct >>>>> work_struct *work) >>>>> static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) >>>>> { >>>>> struct zynqmp_ipi_message *ipi_msg, *buf_msg; >>>>> + struct zynqmp_r5_cluster *cluster; >>>>> struct mbox_info *ipi; >>>>> + struct device *dev; >>>>> size_t len; >>>>> ipi = container_of(cl, struct mbox_info, mbox_cl); >>>>> + dev = ipi->r5_core->dev; >>>>> + cluster = dev_get_drvdata(dev->parent); >>>>> + if (!cluster) { >>>>> + dev_err(dev->parent, "Invalid driver data\n"); >>>>> + return; >>>>> + } >>>>> /* copy data from ipi buffer to r5_core */ >>>>> ipi_msg = (struct zynqmp_ipi_message *)msg; >>>>> @@ -195,7 +205,7 @@ static void zynqmp_r5_mb_rx_cb(struct >>>>> mbox_client *cl, void *msg) >>>>> if (mbox_send_message(ipi->rx_chan, NULL) < 0) >>>>> dev_err(cl->dev, "ack failed to mbox rx_chan\n"); >>>>> - schedule_work(&ipi->mbox_work); >>>>> + queue_work(cluster->workqueue, &ipi->mbox_work); >>>>> } >>>>> /** >>>>> @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) >>>>> for (i = 0; i < cluster->core_count; i++) { >>>>> r5_core = cluster->r5_cores[i]; >>>>> + cancel_work_sync(&r5_core->ipi->mbox_work); I see merge-conflict on top of the for-next branch. Please rebase the patch on top of the for-next branch: https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git/log/?h=for-next >>>>> zynqmp_r5_free_mbox(r5_core->ipi); >>>>> of_reserved_mem_device_release(r5_core->dev); >>>>> put_device(r5_core->dev); >>>>> @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) >>>>> } >>>>> kfree(cluster->r5_cores); >>>>> + destroy_workqueue(cluster->workqueue); >>>>> kfree(cluster); >>>>> platform_set_drvdata(pdev, NULL); >>>>> } >>>>> @@ -1194,11 +1206,20 @@ static int >>>>> zynqmp_r5_remoteproc_probe(struct platform_device *pdev) >>>>> return ret; >>>>> } >>>>> + cluster->workqueue = alloc_workqueue(dev_name(dev), >>>>> + WQ_UNBOUND | WQ_HIGHPRI, 0); >>>>> + if (!cluster->workqueue) { >>>>> + dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); >>>>> + kfree(cluster); >>>>> + return -ENOMEM; >>>>> + } >>>>> + Workqueue will be unused if mbox properties are not mentioned in the device-tree. So, we need to allocate workqueue only if IPI is setup for at least one core. I think following logic should work: Make decision if workqueue is needed or not, if zynqmp_r5_setup_mbox() function is passing for atleast one core. If zynqmp_r5_setup_mbox() is success, then set a flag to allocate workqueue, and then later right before calling zynqmp_r5_core_init() allocate the workqueue for the cluster. Remoteproc can be used only to load() and start() stop() fw, and RPMsg can be optional. Also, before calling destroy_workqueue make sure to have NULL check and destroy only if it was allocated. Thanks, Tanmay >>>>> /* wire in so each core can be cleaned up at driver remove */ >>>>> platform_set_drvdata(pdev, cluster); >>>>> ret = zynqmp_r5_cluster_init(cluster); >>>>> if (ret) { >>>>> + destroy_workqueue(cluster->workqueue); >>>>> kfree(cluster); >>>>> platform_set_drvdata(pdev, NULL); >>>>> dev_err_probe(dev, ret, "Invalid r5f subsystem device >>>>> tree\n"); >>>> >>> >> >> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-10 18:28 ` Tanmay Shah @ 2025-12-15 1:14 ` Mathieu Poirier 2025-12-16 14:34 ` Stefan Roese 2025-12-16 14:26 ` Stefan Roese 2025-12-16 14:43 ` Stefan Roese 2 siblings, 1 reply; 14+ messages in thread From: Mathieu Poirier @ 2025-12-15 1:14 UTC (permalink / raw) To: Tanmay Shah; +Cc: Stefan Roese, Zhongqiu Han, linux-remoteproc On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote: > Hello, please check my comments below: > > On 12/10/25 2:29 AM, Stefan Roese wrote: > > Hi Tanmay, > > > > On 12/10/25 03:51, Zhongqiu Han wrote: > > > On 12/5/2025 8:06 PM, Stefan Roese wrote: > > > > Hi Tanmay, > > > > > > > > On 12/4/25 17:45, Tanmay Shah wrote: > > > > > Hello, > > > > > > > > > > Thank You for your patch. Please find my comments below. > > > > > > > > > > On 12/4/25 4:40 AM, Stefan Roese wrote: > > > > > > Testing on our ZynqMP platform has shown, that some R5 messages might > > > > > > get dropped under high CPU load. This patch creates a new high-prio > > > > > > > This commit text should be fixed. Messages are not dropped by Linux, but R5 > can't send new messages as rx vq is not processed by Linux. > I agree. > > > > > Here, I would like to understand what it means by "R5 > > > > > messages might get dropped" > > > > > > > > > > Even under high CPU load, the messages from R5 are stored in > > > > > the virtqueues. If Linux doesn't read it, then it is not > > > > > really lost/ dropped. > > > > > > > > > > Could you please explain your use case in detail and how the > > > > > testing is conducted? > > > > > > > > Our use-case is, that we send ~4k messages per second from the R5 to > > > > Linux - sometimes even a bit more. Normally these messages are received > > > > okay and no messages are dropped. Sometimes, under "high CPU load" > > > > scenarios it happens, that the R5 has to drop messages, as there is no > > > > free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting > > > > from the Linux driver not emptying the RX queue. > > > > > > Thanks for the details. Your understanding is correct. > > > > > Could you please elaborate on these virtqueues a bit? Especially why no > > > > messages drop should happen because of these virtqueues? > > > > > > AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a > > > message has been successfully enqueued. The observed "drop" here appears > > > to be on the R5 side, where the application discards messages when no > > > entry buffer is available. > > > > Correct. > > > > > In the long run, while improving the Linux side is recommended, > > > > Yes, please. > > > > > it could > > > also be helpful for the R5 side to implement strategies such as an > > > application-level buffer and retry mechanisms. > > > > We already did this. We've added an additional buffer mechanism to the > > R5, which improved this "message drop situation" a bit. Still it did not > > fix it for all our high message rate situations - still resulting in > > frame drops on the R5 side (the R5 is a bit resource restricted). > > > > Improving the responsiveness on the Linux side seems to be the best way > > for us to deal with this problem. > > > > I agree to this. However, Just want to understand and cover full picture > here. > > On R5 side, I am assuming open-amp library is used for the RPMsg > communication. > > rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 > > Here, if the new buffer is not available, then R5 is supposed to wait for > 1ms before sending a new message. After 1ms, R5 will try to get buffer > again, and this continues for 15 seconds. This is the default mechanism. > > This mechanism is used in your case correctly ? > > Alternatively you can register platform specific wait mechanism via this > callback: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42 > > Few questions for further understanding: > > 1) As per your use case, 4k per second data transfer rate must be maintained > all the time? And this is achieved with this patch? > > Even after having the high priority queue, if someone wants to achieve 8k > per seconds or 16k per seconds data transfer rate, at some point we will hit > this issue again. > Right, I also think this patch is not the right solution. > The reliable solution would be to keep the data transfer rate reasonable, > and have solid re-try mechanism. > > I am okay to take this patch in after addressing comments below but, please > make sure all above things are r5 side is working as well. Tanmay is correct on all front. > > Thanks, > Tanmay > > > > Thanks, > > Stefan > > > > > > > > > > > > > Thanks, > > > > Stefan > > > > > > > > > Thanks, > > > > > Tanmay > > > > > > > > > > > workqueue which is now used instead of the default system workqueue. > > > > > > With this change we don't experience these message drops any more. > > > > > > > > > > > > Signed-off-by: Stefan Roese <stefan.roese@mailbox.org> > > > > > > Cc: Tanmay Shah <tanmay.shah@amd.com> > > > > > > Cc: Mathieu Poirier <mathieu.poirier@linaro.org> > > > > > > --- > > > > > > v3: > > > > > > - Call cancel_work_sync() before freeing ipi (suggested > > > > > > by Zhongqiu Han) > > > > > > > > > > > > v2: > > > > > > - Also call destroy_workqueue() in > > > > > > zynqmp_r5_cluster_exit() (suggested by Zhongqiu Han) > > > > > > - Correct call seq to avoid UAF (suggested by Zhongqiu Han) > > > > > > > > > > > > drivers/remoteproc/xlnx_r5_remoteproc.c | 23 ++++++++++++++++++++++- > > > > > > 1 file changed, 22 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > b/drivers/ remoteproc/xlnx_r5_remoteproc.c > > > > > > index feca6de68da28..308328b0b489f 100644 > > > > > > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > @@ -16,6 +16,7 @@ > > > > > > #include <linux/of_reserved_mem.h> > > > > > > #include <linux/platform_device.h> > > > > > > #include <linux/remoteproc.h> > > > > > > +#include <linux/workqueue.h> > > > > > > #include "remoteproc_internal.h" > > > > > > @@ -116,6 +117,7 @@ struct zynqmp_r5_cluster { > > > > > > enum zynqmp_r5_cluster_mode mode; > > > > > > int core_count; > > > > > > struct zynqmp_r5_core **r5_cores; > > > > > > + struct workqueue_struct *workqueue; > > > > > > }; > > > > > > /** > > > > > > @@ -174,10 +176,18 @@ static void > > > > > > handle_event_notified(struct work_struct *work) > > > > > > static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) > > > > > > { > > > > > > struct zynqmp_ipi_message *ipi_msg, *buf_msg; > > > > > > + struct zynqmp_r5_cluster *cluster; > > > > > > struct mbox_info *ipi; > > > > > > + struct device *dev; > > > > > > size_t len; > > > > > > ipi = container_of(cl, struct mbox_info, mbox_cl); > > > > > > + dev = ipi->r5_core->dev; > > > > > > + cluster = dev_get_drvdata(dev->parent); > > > > > > + if (!cluster) { > > > > > > + dev_err(dev->parent, "Invalid driver data\n"); > > > > > > + return; > > > > > > + } > > > > > > /* copy data from ipi buffer to r5_core */ > > > > > > ipi_msg = (struct zynqmp_ipi_message *)msg; > > > > > > @@ -195,7 +205,7 @@ static void > > > > > > zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) > > > > > > if (mbox_send_message(ipi->rx_chan, NULL) < 0) > > > > > > dev_err(cl->dev, "ack failed to mbox rx_chan\n"); > > > > > > - schedule_work(&ipi->mbox_work); > > > > > > + queue_work(cluster->workqueue, &ipi->mbox_work); > > > > > > } > > > > > > /** > > > > > > @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) > > > > > > for (i = 0; i < cluster->core_count; i++) { > > > > > > r5_core = cluster->r5_cores[i]; > > > > > > + cancel_work_sync(&r5_core->ipi->mbox_work); > > I see merge-conflict on top of the for-next branch. Please rebase the patch > on top of the for-next branch: https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git/log/?h=for-next > > > > > > > > zynqmp_r5_free_mbox(r5_core->ipi); > > > > > > of_reserved_mem_device_release(r5_core->dev); > > > > > > put_device(r5_core->dev); > > > > > > @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) > > > > > > } > > > > > > kfree(cluster->r5_cores); > > > > > > + destroy_workqueue(cluster->workqueue); > > > > > > kfree(cluster); > > > > > > platform_set_drvdata(pdev, NULL); > > > > > > } > > > > > > @@ -1194,11 +1206,20 @@ static int > > > > > > zynqmp_r5_remoteproc_probe(struct platform_device *pdev) > > > > > > return ret; > > > > > > } > > > > > > + cluster->workqueue = alloc_workqueue(dev_name(dev), > > > > > > + WQ_UNBOUND | WQ_HIGHPRI, 0); > > > > > > + if (!cluster->workqueue) { > > > > > > + dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); > > > > > > + kfree(cluster); > > > > > > + return -ENOMEM; > > > > > > + } > > > > > > + > > Workqueue will be unused if mbox properties are not mentioned in the > device-tree. So, we need to allocate workqueue only if IPI is setup for at > least one core. I think following logic should work: > > Make decision if workqueue is needed or not, if zynqmp_r5_setup_mbox() > function is passing for atleast one core. If zynqmp_r5_setup_mbox() is > success, then set a flag to allocate workqueue, and then later right before > calling zynqmp_r5_core_init() allocate the workqueue for the cluster. > > Remoteproc can be used only to load() and start() stop() fw, and RPMsg can > be optional. > > Also, before calling destroy_workqueue make sure to have NULL check and > destroy only if it was allocated. > > Thanks, > Tanmay > > > > > > > /* wire in so each core can be cleaned up at driver remove */ > > > > > > platform_set_drvdata(pdev, cluster); > > > > > > ret = zynqmp_r5_cluster_init(cluster); > > > > > > if (ret) { > > > > > > + destroy_workqueue(cluster->workqueue); > > > > > > kfree(cluster); > > > > > > platform_set_drvdata(pdev, NULL); > > > > > > dev_err_probe(dev, ret, "Invalid r5f subsystem > > > > > > device tree\n"); > > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-15 1:14 ` Mathieu Poirier @ 2025-12-16 14:34 ` Stefan Roese 2025-12-16 21:47 ` Mathieu Poirier 0 siblings, 1 reply; 14+ messages in thread From: Stefan Roese @ 2025-12-16 14:34 UTC (permalink / raw) To: Mathieu Poirier, Tanmay Shah; +Cc: Zhongqiu Han, linux-remoteproc Hi Mathieu, On 12/15/25 02:14, Mathieu Poirier wrote: > On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote: >> Hello, please check my comments below: >> >> On 12/10/25 2:29 AM, Stefan Roese wrote: >>> Hi Tanmay, >>> >>> On 12/10/25 03:51, Zhongqiu Han wrote: >>>> On 12/5/2025 8:06 PM, Stefan Roese wrote: >>>>> Hi Tanmay, >>>>> >>>>> On 12/4/25 17:45, Tanmay Shah wrote: >>>>>> Hello, >>>>>> >>>>>> Thank You for your patch. Please find my comments below. >>>>>> >>>>>> On 12/4/25 4:40 AM, Stefan Roese wrote: >>>>>>> Testing on our ZynqMP platform has shown, that some R5 messages might >>>>>>> get dropped under high CPU load. This patch creates a new high-prio >>>>>> >> >> This commit text should be fixed. Messages are not dropped by Linux, but R5 >> can't send new messages as rx vq is not processed by Linux. >> > > I agree. > >>>>>> Here, I would like to understand what it means by "R5 >>>>>> messages might get dropped" >>>>>> >>>>>> Even under high CPU load, the messages from R5 are stored in >>>>>> the virtqueues. If Linux doesn't read it, then it is not >>>>>> really lost/ dropped. >>>>>> >>>>>> Could you please explain your use case in detail and how the >>>>>> testing is conducted? >>>>> >>>>> Our use-case is, that we send ~4k messages per second from the R5 to >>>>> Linux - sometimes even a bit more. Normally these messages are received >>>>> okay and no messages are dropped. Sometimes, under "high CPU load" >>>>> scenarios it happens, that the R5 has to drop messages, as there is no >>>>> free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting >>>>> from the Linux driver not emptying the RX queue. >>>>> >> >> Thanks for the details. Your understanding is correct. >> >>>>> Could you please elaborate on these virtqueues a bit? Especially why no >>>>> messages drop should happen because of these virtqueues? >>>> >>>> AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a >>>> message has been successfully enqueued. The observed "drop" here appears >>>> to be on the R5 side, where the application discards messages when no >>>> entry buffer is available. >>> >>> Correct. >>> >>>> In the long run, while improving the Linux side is recommended, >>> >>> Yes, please. >>> >>>> it could >>>> also be helpful for the R5 side to implement strategies such as an >>>> application-level buffer and retry mechanisms. >>> >>> We already did this. We've added an additional buffer mechanism to the >>> R5, which improved this "message drop situation" a bit. Still it did not >>> fix it for all our high message rate situations - still resulting in >>> frame drops on the R5 side (the R5 is a bit resource restricted). >>> >>> Improving the responsiveness on the Linux side seems to be the best way >>> for us to deal with this problem. >>> >> >> I agree to this. However, Just want to understand and cover full picture >> here. >> >> On R5 side, I am assuming open-amp library is used for the RPMsg >> communication. >> >> rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 >> >> Here, if the new buffer is not available, then R5 is supposed to wait for >> 1ms before sending a new message. After 1ms, R5 will try to get buffer >> again, and this continues for 15 seconds. This is the default mechanism. >> >> This mechanism is used in your case correctly ? >> >> Alternatively you can register platform specific wait mechanism via this >> callback: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42 >> >> Few questions for further understanding: >> >> 1) As per your use case, 4k per second data transfer rate must be maintained >> all the time? And this is achieved with this patch? >> >> Even after having the high priority queue, if someone wants to achieve 8k >> per seconds or 16k per seconds data transfer rate, at some point we will hit >> this issue again. >> > > Right, I also think this patch is not the right solution. Hmmm. My understanding of Tanmays's comments is somewhat different. He is not "against" this patch in general AFAIU. Please see my reply with a more detailed description of our system setup and it's message flow and limitations that I just sent a few minutes ago. >> The reliable solution would be to keep the data transfer rate reasonable, >> and have solid re-try mechanism. >> >> I am okay to take this patch in after addressing comments below but, please >> make sure all above things are r5 side is working as well. > > Tanmay is correct on all front. Agreed. Thanks, Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-16 14:34 ` Stefan Roese @ 2025-12-16 21:47 ` Mathieu Poirier 2025-12-17 10:27 ` Stefan Roese 0 siblings, 1 reply; 14+ messages in thread From: Mathieu Poirier @ 2025-12-16 21:47 UTC (permalink / raw) To: Stefan Roese; +Cc: Tanmay Shah, Zhongqiu Han, linux-remoteproc On Tue, Dec 16, 2025 at 03:34:18PM +0100, Stefan Roese wrote: > Hi Mathieu, > > On 12/15/25 02:14, Mathieu Poirier wrote: > > On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote: > > > Hello, please check my comments below: > > > > > > On 12/10/25 2:29 AM, Stefan Roese wrote: > > > > Hi Tanmay, > > > > > > > > On 12/10/25 03:51, Zhongqiu Han wrote: > > > > > On 12/5/2025 8:06 PM, Stefan Roese wrote: > > > > > > Hi Tanmay, > > > > > > > > > > > > On 12/4/25 17:45, Tanmay Shah wrote: > > > > > > > Hello, > > > > > > > > > > > > > > Thank You for your patch. Please find my comments below. > > > > > > > > > > > > > > On 12/4/25 4:40 AM, Stefan Roese wrote: > > > > > > > > Testing on our ZynqMP platform has shown, that some R5 messages might > > > > > > > > get dropped under high CPU load. This patch creates a new high-prio > > > > > > > > > > > > > This commit text should be fixed. Messages are not dropped by Linux, but R5 > > > can't send new messages as rx vq is not processed by Linux. > > > > > > > I agree. > > > > > > > Here, I would like to understand what it means by "R5 > > > > > > > messages might get dropped" > > > > > > > > > > > > > > Even under high CPU load, the messages from R5 are stored in > > > > > > > the virtqueues. If Linux doesn't read it, then it is not > > > > > > > really lost/ dropped. > > > > > > > > > > > > > > Could you please explain your use case in detail and how the > > > > > > > testing is conducted? > > > > > > > > > > > > Our use-case is, that we send ~4k messages per second from the R5 to > > > > > > Linux - sometimes even a bit more. Normally these messages are received > > > > > > okay and no messages are dropped. Sometimes, under "high CPU load" > > > > > > scenarios it happens, that the R5 has to drop messages, as there is no > > > > > > free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting > > > > > > from the Linux driver not emptying the RX queue. > > > > > > > > > > > > Thanks for the details. Your understanding is correct. > > > > > > > > > Could you please elaborate on these virtqueues a bit? Especially why no > > > > > > messages drop should happen because of these virtqueues? > > > > > > > > > > AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a > > > > > message has been successfully enqueued. The observed "drop" here appears > > > > > to be on the R5 side, where the application discards messages when no > > > > > entry buffer is available. > > > > > > > > Correct. > > > > > > > > > In the long run, while improving the Linux side is recommended, > > > > > > > > Yes, please. > > > > > > > > > it could > > > > > also be helpful for the R5 side to implement strategies such as an > > > > > application-level buffer and retry mechanisms. > > > > > > > > We already did this. We've added an additional buffer mechanism to the > > > > R5, which improved this "message drop situation" a bit. Still it did not > > > > fix it for all our high message rate situations - still resulting in > > > > frame drops on the R5 side (the R5 is a bit resource restricted). > > > > > > > > Improving the responsiveness on the Linux side seems to be the best way > > > > for us to deal with this problem. > > > > > > > > > > I agree to this. However, Just want to understand and cover full picture > > > here. > > > > > > On R5 side, I am assuming open-amp library is used for the RPMsg > > > communication. > > > > > > rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 > > > > > > Here, if the new buffer is not available, then R5 is supposed to wait for > > > 1ms before sending a new message. After 1ms, R5 will try to get buffer > > > again, and this continues for 15 seconds. This is the default mechanism. > > > > > > This mechanism is used in your case correctly ? > > > > > > Alternatively you can register platform specific wait mechanism via this > > > callback: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42 > > > > > > Few questions for further understanding: > > > > > > 1) As per your use case, 4k per second data transfer rate must be maintained > > > all the time? And this is achieved with this patch? > > > > > > Even after having the high priority queue, if someone wants to achieve 8k > > > per seconds or 16k per seconds data transfer rate, at some point we will hit > > > this issue again. > > > > > > > Right, I also think this patch is not the right solution. > > Hmmm. My understanding of Tanmays's comments is somewhat different. He > is not "against" this patch in general AFAIU. Please see my reply with > a more detailed description of our system setup and it's message flow > and limitations that I just sent a few minutes ago. > Regardless of how we spin things around, this patch is about running out of resource (CPU cycles and memory). It is only a matter of time before this solution becomes obsolete. The main issue here is that we are adding a priority workqueue for everyone using this driver, which may have unwanted side effects. Please add a kernel module parameter to control what kind of workqueue is to be used. Thanks, Mathieu > > > The reliable solution would be to keep the data transfer rate reasonable, > > > and have solid re-try mechanism. > > > > > > I am okay to take this patch in after addressing comments below but, please > > > make sure all above things are r5 side is working as well. > > > > Tanmay is correct on all front. > > Agreed. > > Thanks, > Stefan > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-16 21:47 ` Mathieu Poirier @ 2025-12-17 10:27 ` Stefan Roese 2025-12-17 21:34 ` Mathieu Poirier 0 siblings, 1 reply; 14+ messages in thread From: Stefan Roese @ 2025-12-17 10:27 UTC (permalink / raw) To: Mathieu Poirier; +Cc: Tanmay Shah, Zhongqiu Han, linux-remoteproc Hi Mathieu, On 12/16/25 22:47, Mathieu Poirier wrote: > On Tue, Dec 16, 2025 at 03:34:18PM +0100, Stefan Roese wrote: >> Hi Mathieu, >> >> On 12/15/25 02:14, Mathieu Poirier wrote: >>> On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote: >>>> Hello, please check my comments below: >>>> >>>> On 12/10/25 2:29 AM, Stefan Roese wrote: >>>>> Hi Tanmay, >>>>> >>>>> On 12/10/25 03:51, Zhongqiu Han wrote: >>>>>> On 12/5/2025 8:06 PM, Stefan Roese wrote: >>>>>>> Hi Tanmay, >>>>>>> >>>>>>> On 12/4/25 17:45, Tanmay Shah wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> Thank You for your patch. Please find my comments below. >>>>>>>> >>>>>>>> On 12/4/25 4:40 AM, Stefan Roese wrote: >>>>>>>>> Testing on our ZynqMP platform has shown, that some R5 messages might >>>>>>>>> get dropped under high CPU load. This patch creates a new high-prio >>>>>>>> >>>> >>>> This commit text should be fixed. Messages are not dropped by Linux, but R5 >>>> can't send new messages as rx vq is not processed by Linux. >>>> >>> >>> I agree. >>>>>>>> Here, I would like to understand what it means by "R5 >>>>>>>> messages might get dropped" >>>>>>>> >>>>>>>> Even under high CPU load, the messages from R5 are stored in >>>>>>>> the virtqueues. If Linux doesn't read it, then it is not >>>>>>>> really lost/ dropped. >>>>>>>> >>>>>>>> Could you please explain your use case in detail and how the >>>>>>>> testing is conducted? >>>>>>> >>>>>>> Our use-case is, that we send ~4k messages per second from the R5 to >>>>>>> Linux - sometimes even a bit more. Normally these messages are received >>>>>>> okay and no messages are dropped. Sometimes, under "high CPU load" >>>>>>> scenarios it happens, that the R5 has to drop messages, as there is no >>>>>>> free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting >>>>>>> from the Linux driver not emptying the RX queue. >>>>>>> >>>> >>>> Thanks for the details. Your understanding is correct. >>>> >>>>>>> Could you please elaborate on these virtqueues a bit? Especially why no >>>>>>> messages drop should happen because of these virtqueues? >>>>>> >>>>>> AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a >>>>>> message has been successfully enqueued. The observed "drop" here appears >>>>>> to be on the R5 side, where the application discards messages when no >>>>>> entry buffer is available. >>>>> >>>>> Correct. >>>>> >>>>>> In the long run, while improving the Linux side is recommended, >>>>> >>>>> Yes, please. >>>>> >>>>>> it could >>>>>> also be helpful for the R5 side to implement strategies such as an >>>>>> application-level buffer and retry mechanisms. >>>>> >>>>> We already did this. We've added an additional buffer mechanism to the >>>>> R5, which improved this "message drop situation" a bit. Still it did not >>>>> fix it for all our high message rate situations - still resulting in >>>>> frame drops on the R5 side (the R5 is a bit resource restricted). >>>>> >>>>> Improving the responsiveness on the Linux side seems to be the best way >>>>> for us to deal with this problem. >>>>> >>>> >>>> I agree to this. However, Just want to understand and cover full picture >>>> here. >>>> >>>> On R5 side, I am assuming open-amp library is used for the RPMsg >>>> communication. >>>> >>>> rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 >>>> >>>> Here, if the new buffer is not available, then R5 is supposed to wait for >>>> 1ms before sending a new message. After 1ms, R5 will try to get buffer >>>> again, and this continues for 15 seconds. This is the default mechanism. >>>> >>>> This mechanism is used in your case correctly ? >>>> >>>> Alternatively you can register platform specific wait mechanism via this >>>> callback: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42 >>>> >>>> Few questions for further understanding: >>>> >>>> 1) As per your use case, 4k per second data transfer rate must be maintained >>>> all the time? And this is achieved with this patch? >>>> >>>> Even after having the high priority queue, if someone wants to achieve 8k >>>> per seconds or 16k per seconds data transfer rate, at some point we will hit >>>> this issue again. >>>> >>> >>> Right, I also think this patch is not the right solution. >> >> Hmmm. My understanding of Tanmays's comments is somewhat different. He >> is not "against" this patch in general AFAIU. Please see my reply with >> a more detailed description of our system setup and it's message flow >> and limitations that I just sent a few minutes ago. >> > > Regardless of how we spin things around, this patch is about running out of > resource (CPU cycles and memory). It is only a matter of time before this > solution becomes obsolete. > > The main issue here is that we are adding a priority workqueue for everyone > using this driver, which may have unwanted side effects. Please add a kernel > module parameter to control what kind of workqueue is to be used. Okay, will do. Thanks, Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-17 10:27 ` Stefan Roese @ 2025-12-17 21:34 ` Mathieu Poirier 2025-12-18 14:58 ` Stefan Roese 0 siblings, 1 reply; 14+ messages in thread From: Mathieu Poirier @ 2025-12-17 21:34 UTC (permalink / raw) To: Stefan Roese; +Cc: Tanmay Shah, Zhongqiu Han, linux-remoteproc On Wed, Dec 17, 2025 at 11:27:44AM +0100, Stefan Roese wrote: > Hi Mathieu, > > On 12/16/25 22:47, Mathieu Poirier wrote: > > On Tue, Dec 16, 2025 at 03:34:18PM +0100, Stefan Roese wrote: > > > Hi Mathieu, > > > > > > On 12/15/25 02:14, Mathieu Poirier wrote: > > > > On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote: > > > > > Hello, please check my comments below: > > > > > > > > > > On 12/10/25 2:29 AM, Stefan Roese wrote: > > > > > > Hi Tanmay, > > > > > > > > > > > > On 12/10/25 03:51, Zhongqiu Han wrote: > > > > > > > On 12/5/2025 8:06 PM, Stefan Roese wrote: > > > > > > > > Hi Tanmay, > > > > > > > > > > > > > > > > On 12/4/25 17:45, Tanmay Shah wrote: > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > Thank You for your patch. Please find my comments below. > > > > > > > > > > > > > > > > > > On 12/4/25 4:40 AM, Stefan Roese wrote: > > > > > > > > > > Testing on our ZynqMP platform has shown, that some R5 messages might > > > > > > > > > > get dropped under high CPU load. This patch creates a new high-prio > > > > > > > > > > > > > > > > > > > This commit text should be fixed. Messages are not dropped by Linux, but R5 > > > > > can't send new messages as rx vq is not processed by Linux. > > > > > > > > > > > > > I agree. > > > > > > > > > Here, I would like to understand what it means by "R5 > > > > > > > > > messages might get dropped" > > > > > > > > > > > > > > > > > > Even under high CPU load, the messages from R5 are stored in > > > > > > > > > the virtqueues. If Linux doesn't read it, then it is not > > > > > > > > > really lost/ dropped. > > > > > > > > > > > > > > > > > > Could you please explain your use case in detail and how the > > > > > > > > > testing is conducted? > > > > > > > > > > > > > > > > Our use-case is, that we send ~4k messages per second from the R5 to > > > > > > > > Linux - sometimes even a bit more. Normally these messages are received > > > > > > > > okay and no messages are dropped. Sometimes, under "high CPU load" > > > > > > > > scenarios it happens, that the R5 has to drop messages, as there is no > > > > > > > > free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting > > > > > > > > from the Linux driver not emptying the RX queue. > > > > > > > > > > > > > > > > > > Thanks for the details. Your understanding is correct. > > > > > > > > > > > > > Could you please elaborate on these virtqueues a bit? Especially why no > > > > > > > > messages drop should happen because of these virtqueues? > > > > > > > > > > > > > > AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a > > > > > > > message has been successfully enqueued. The observed "drop" here appears > > > > > > > to be on the R5 side, where the application discards messages when no > > > > > > > entry buffer is available. > > > > > > > > > > > > Correct. > > > > > > > > > > > > > In the long run, while improving the Linux side is recommended, > > > > > > > > > > > > Yes, please. > > > > > > > > > > > > > it could > > > > > > > also be helpful for the R5 side to implement strategies such as an > > > > > > > application-level buffer and retry mechanisms. > > > > > > > > > > > > We already did this. We've added an additional buffer mechanism to the > > > > > > R5, which improved this "message drop situation" a bit. Still it did not > > > > > > fix it for all our high message rate situations - still resulting in > > > > > > frame drops on the R5 side (the R5 is a bit resource restricted). > > > > > > > > > > > > Improving the responsiveness on the Linux side seems to be the best way > > > > > > for us to deal with this problem. > > > > > > > > > > > > > > > > I agree to this. However, Just want to understand and cover full picture > > > > > here. > > > > > > > > > > On R5 side, I am assuming open-amp library is used for the RPMsg > > > > > communication. > > > > > > > > > > rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 > > > > > > > > > > Here, if the new buffer is not available, then R5 is supposed to wait for > > > > > 1ms before sending a new message. After 1ms, R5 will try to get buffer > > > > > again, and this continues for 15 seconds. This is the default mechanism. > > > > > > > > > > This mechanism is used in your case correctly ? > > > > > > > > > > Alternatively you can register platform specific wait mechanism via this > > > > > callback: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42 > > > > > > > > > > Few questions for further understanding: > > > > > > > > > > 1) As per your use case, 4k per second data transfer rate must be maintained > > > > > all the time? And this is achieved with this patch? > > > > > > > > > > Even after having the high priority queue, if someone wants to achieve 8k > > > > > per seconds or 16k per seconds data transfer rate, at some point we will hit > > > > > this issue again. > > > > > > > > > > > > > Right, I also think this patch is not the right solution. > > > > > > Hmmm. My understanding of Tanmays's comments is somewhat different. He > > > is not "against" this patch in general AFAIU. Please see my reply with > > > a more detailed description of our system setup and it's message flow > > > and limitations that I just sent a few minutes ago. > > > > > > > Regardless of how we spin things around, this patch is about running out of > > resource (CPU cycles and memory). It is only a matter of time before this > > solution becomes obsolete. > > > > The main issue here is that we are adding a priority workqueue for everyone > > using this driver, which may have unwanted side effects. Please add a kernel > > module parameter to control what kind of workqueue is to be used. > > Okay, will do. Please see this patchset [1] Tanmay is currently working on. I would much rather see that solution put to work than playing with workqueue priorities. [1]. "[RFC PATCH 0/2] Enhance RPMsg buffer management" > > Thanks, > Stefan > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-17 21:34 ` Mathieu Poirier @ 2025-12-18 14:58 ` Stefan Roese 0 siblings, 0 replies; 14+ messages in thread From: Stefan Roese @ 2025-12-18 14:58 UTC (permalink / raw) To: Mathieu Poirier; +Cc: Tanmay Shah, Zhongqiu Han, linux-remoteproc Hi Mathieu, On 12/17/25 22:34, Mathieu Poirier wrote: > On Wed, Dec 17, 2025 at 11:27:44AM +0100, Stefan Roese wrote: >> Hi Mathieu, >> >> On 12/16/25 22:47, Mathieu Poirier wrote: >>> On Tue, Dec 16, 2025 at 03:34:18PM +0100, Stefan Roese wrote: >>>> Hi Mathieu, >>>> >>>> On 12/15/25 02:14, Mathieu Poirier wrote: >>>>> On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote: >>>>>> Hello, please check my comments below: >>>>>> >>>>>> On 12/10/25 2:29 AM, Stefan Roese wrote: >>>>>>> Hi Tanmay, >>>>>>> >>>>>>> On 12/10/25 03:51, Zhongqiu Han wrote: >>>>>>>> On 12/5/2025 8:06 PM, Stefan Roese wrote: >>>>>>>>> Hi Tanmay, >>>>>>>>> >>>>>>>>> On 12/4/25 17:45, Tanmay Shah wrote: >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> Thank You for your patch. Please find my comments below. >>>>>>>>>> >>>>>>>>>> On 12/4/25 4:40 AM, Stefan Roese wrote: >>>>>>>>>>> Testing on our ZynqMP platform has shown, that some R5 messages might >>>>>>>>>>> get dropped under high CPU load. This patch creates a new high-prio >>>>>>>>>> >>>>>> >>>>>> This commit text should be fixed. Messages are not dropped by Linux, but R5 >>>>>> can't send new messages as rx vq is not processed by Linux. >>>>>> >>>>> >>>>> I agree. >>>>>>>>>> Here, I would like to understand what it means by "R5 >>>>>>>>>> messages might get dropped" >>>>>>>>>> >>>>>>>>>> Even under high CPU load, the messages from R5 are stored in >>>>>>>>>> the virtqueues. If Linux doesn't read it, then it is not >>>>>>>>>> really lost/ dropped. >>>>>>>>>> >>>>>>>>>> Could you please explain your use case in detail and how the >>>>>>>>>> testing is conducted? >>>>>>>>> >>>>>>>>> Our use-case is, that we send ~4k messages per second from the R5 to >>>>>>>>> Linux - sometimes even a bit more. Normally these messages are received >>>>>>>>> okay and no messages are dropped. Sometimes, under "high CPU load" >>>>>>>>> scenarios it happens, that the R5 has to drop messages, as there is no >>>>>>>>> free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting >>>>>>>>> from the Linux driver not emptying the RX queue. >>>>>>>>> >>>>>> >>>>>> Thanks for the details. Your understanding is correct. >>>>>> >>>>>>>>> Could you please elaborate on these virtqueues a bit? Especially why no >>>>>>>>> messages drop should happen because of these virtqueues? >>>>>>>> >>>>>>>> AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a >>>>>>>> message has been successfully enqueued. The observed "drop" here appears >>>>>>>> to be on the R5 side, where the application discards messages when no >>>>>>>> entry buffer is available. >>>>>>> >>>>>>> Correct. >>>>>>> >>>>>>>> In the long run, while improving the Linux side is recommended, >>>>>>> >>>>>>> Yes, please. >>>>>>> >>>>>>>> it could >>>>>>>> also be helpful for the R5 side to implement strategies such as an >>>>>>>> application-level buffer and retry mechanisms. >>>>>>> >>>>>>> We already did this. We've added an additional buffer mechanism to the >>>>>>> R5, which improved this "message drop situation" a bit. Still it did not >>>>>>> fix it for all our high message rate situations - still resulting in >>>>>>> frame drops on the R5 side (the R5 is a bit resource restricted). >>>>>>> >>>>>>> Improving the responsiveness on the Linux side seems to be the best way >>>>>>> for us to deal with this problem. >>>>>>> >>>>>> >>>>>> I agree to this. However, Just want to understand and cover full picture >>>>>> here. >>>>>> >>>>>> On R5 side, I am assuming open-amp library is used for the RPMsg >>>>>> communication. >>>>>> >>>>>> rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 >>>>>> >>>>>> Here, if the new buffer is not available, then R5 is supposed to wait for >>>>>> 1ms before sending a new message. After 1ms, R5 will try to get buffer >>>>>> again, and this continues for 15 seconds. This is the default mechanism. >>>>>> >>>>>> This mechanism is used in your case correctly ? >>>>>> >>>>>> Alternatively you can register platform specific wait mechanism via this >>>>>> callback: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42 >>>>>> >>>>>> Few questions for further understanding: >>>>>> >>>>>> 1) As per your use case, 4k per second data transfer rate must be maintained >>>>>> all the time? And this is achieved with this patch? >>>>>> >>>>>> Even after having the high priority queue, if someone wants to achieve 8k >>>>>> per seconds or 16k per seconds data transfer rate, at some point we will hit >>>>>> this issue again. >>>>>> >>>>> >>>>> Right, I also think this patch is not the right solution. >>>> >>>> Hmmm. My understanding of Tanmays's comments is somewhat different. He >>>> is not "against" this patch in general AFAIU. Please see my reply with >>>> a more detailed description of our system setup and it's message flow >>>> and limitations that I just sent a few minutes ago. >>>> >>> >>> Regardless of how we spin things around, this patch is about running out of >>> resource (CPU cycles and memory). It is only a matter of time before this >>> solution becomes obsolete. >>> >>> The main issue here is that we are adding a priority workqueue for everyone >>> using this driver, which may have unwanted side effects. Please add a kernel >>> module parameter to control what kind of workqueue is to be used. >> >> Okay, will do. > > Please see this patchset [1] Tanmay is currently working on. I would much > rather see that solution put to work than playing with workqueue priorities. > > [1]. "[RFC PATCH 0/2] Enhance RPMsg buffer management" Thanks for the notice. I'll take a look at it and if possible give it a try and will report back. Thanks, Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-10 18:28 ` Tanmay Shah 2025-12-15 1:14 ` Mathieu Poirier @ 2025-12-16 14:26 ` Stefan Roese 2025-12-16 14:43 ` Stefan Roese 2 siblings, 0 replies; 14+ messages in thread From: Stefan Roese @ 2025-12-16 14:26 UTC (permalink / raw) To: tanmay.shah, Zhongqiu Han, linux-remoteproc; +Cc: Mathieu Poirier Hi Tanmay, sorry for the delay. Please find some comments below... On 12/10/25 19:28, Tanmay Shah wrote: > Hello, please check my comments below: > > On 12/10/25 2:29 AM, Stefan Roese wrote: >> Hi Tanmay, >> >> On 12/10/25 03:51, Zhongqiu Han wrote: >>> On 12/5/2025 8:06 PM, Stefan Roese wrote: >>>> Hi Tanmay, >>>> >>>> On 12/4/25 17:45, Tanmay Shah wrote: >>>>> Hello, >>>>> >>>>> Thank You for your patch. Please find my comments below. >>>>> >>>>> On 12/4/25 4:40 AM, Stefan Roese wrote: >>>>>> Testing on our ZynqMP platform has shown, that some R5 messages might >>>>>> get dropped under high CPU load. This patch creates a new high-prio >>>>> > > This commit text should be fixed. Messages are not dropped by Linux, but > R5 can't send new messages as rx vq is not processed by Linux. Agreed. I will change the commit message in the next patch revision. >>>>> Here, I would like to understand what it means by "R5 messages >>>>> might get dropped" >>>>> >>>>> Even under high CPU load, the messages from R5 are stored in the >>>>> virtqueues. If Linux doesn't read it, then it is not really lost/ >>>>> dropped. >>>>> >>>>> Could you please explain your use case in detail and how the >>>>> testing is conducted? >>>> >>>> Our use-case is, that we send ~4k messages per second from the R5 to >>>> Linux - sometimes even a bit more. Normally these messages are received >>>> okay and no messages are dropped. Sometimes, under "high CPU load" >>>> scenarios it happens, that the R5 has to drop messages, as there is no >>>> free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting >>>> from the Linux driver not emptying the RX queue. >>>> > > Thanks for the details. Your understanding is correct. > >>>> Could you please elaborate on these virtqueues a bit? Especially why no >>>> messages drop should happen because of these virtqueues? >>> >>> AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a >>> message has been successfully enqueued. The observed "drop" here appears >>> to be on the R5 side, where the application discards messages when no >>> entry buffer is available. >> >> Correct. >> >>> In the long run, while improving the Linux side is recommended, >> >> Yes, please. >> >>> it could >>> also be helpful for the R5 side to implement strategies such as an >>> application-level buffer and retry mechanisms. >> >> We already did this. We've added an additional buffer mechanism to the >> R5, which improved this "message drop situation" a bit. Still it did not >> fix it for all our high message rate situations - still resulting in >> frame drops on the R5 side (the R5 is a bit resource restricted). >> >> Improving the responsiveness on the Linux side seems to be the best way >> for us to deal with this problem. >> > > I agree to this. However, Just want to understand and cover full picture > here. > > On R5 side, I am assuming open-amp library is used for the RPMsg > communication. > > rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/ > blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 > > Here, if the new buffer is not available, then R5 is supposed to wait > for 1ms before sending a new message. After 1ms, R5 will try to get > buffer again, and this continues for 15 seconds. This is the default > mechanism. > > This mechanism is used in your case correctly ? We use rpmsg_trysend() to send data (messages): - that means we try to write a message to vq - if it fails (queue full), we just add it to a software ringbuffer (and try to send it on the next cycle) - we cannot wait for a message queue to get "not full", because data to write to rpmsg vq arrives cyclic each [ms] (so we cannot wait for rpmsg sending to be done) > Alternatively you can register platform specific wait mechanism via this > callback: https://github.com/OpenAMP/open-amp/blob/ > be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/ > rpmsg_virtio.h#L42 > > Few questions for further understanding: > > 1) As per your use case, 4k per second data transfer rate must be > maintained all the time? And this is achieved with this patch? Yes, the 4k messages / sec arrive from an external (sensor) system which is then forwarded from r5 to a53. So therefore it means it has to be maintained all the time, as we have no control over the external sensor originating these messages. > Even after having the high priority queue, if someone wants to achieve > 8k per seconds or 16k per seconds data transfer rate, at some point we > will hit this issue again. Agreed. This current "solution" by using a high-prio workqueue will very likely not fix all use-cases - especially when the message rate increases even more for a longer time. This is not to be expected in our system though. We have run longer tests on our system w/o any message drops (on the r5 side of course) with this patch applied. > The reliable solution would be to keep the data transfer rate > reasonable, and have solid re-try mechanism. AFAIU, we do have a "solid re-try mechanism" implemented with this software ringbuffer that we added, as mentioned above. Still the resources on the r5 side are somewhat limited and we can't increase this ringbuffer size much more. Additionally we have some requirements that the messages are received on the Linux a53 side not too much delayed. IMHO this patch with "improved message receiving" in Linux seems to be the best solution for us. > I am okay to take this patch in after addressing comments below but, > please make sure all above things are r5 side is working as well. Okay. Thanks for all your comments and input so far. Thanks, Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq 2025-12-10 18:28 ` Tanmay Shah 2025-12-15 1:14 ` Mathieu Poirier 2025-12-16 14:26 ` Stefan Roese @ 2025-12-16 14:43 ` Stefan Roese 2 siblings, 0 replies; 14+ messages in thread From: Stefan Roese @ 2025-12-16 14:43 UTC (permalink / raw) To: tanmay.shah, Zhongqiu Han, linux-remoteproc; +Cc: Mathieu Poirier Hi Tanmay, I just now see, that you added some more comments later in your reply. Please find some comment below... On 12/10/25 19:28, Tanmay Shah wrote: <snip> >>>>>> /** >>>>>> @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) >>>>>> for (i = 0; i < cluster->core_count; i++) { >>>>>> r5_core = cluster->r5_cores[i]; >>>>>> + cancel_work_sync(&r5_core->ipi->mbox_work); > > I see merge-conflict on top of the for-next branch. Please rebase the > patch on top of the for-next branch: https://git.kernel.org/pub/scm/ > linux/kernel/git/remoteproc/linux.git/log/?h=for-next Will do in v4. >>>>>> zynqmp_r5_free_mbox(r5_core->ipi); >>>>>> of_reserved_mem_device_release(r5_core->dev); >>>>>> put_device(r5_core->dev); >>>>>> @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) >>>>>> } >>>>>> kfree(cluster->r5_cores); >>>>>> + destroy_workqueue(cluster->workqueue); >>>>>> kfree(cluster); >>>>>> platform_set_drvdata(pdev, NULL); >>>>>> } >>>>>> @@ -1194,11 +1206,20 @@ static int >>>>>> zynqmp_r5_remoteproc_probe(struct platform_device *pdev) >>>>>> return ret; >>>>>> } >>>>>> + cluster->workqueue = alloc_workqueue(dev_name(dev), >>>>>> + WQ_UNBOUND | WQ_HIGHPRI, 0); >>>>>> + if (!cluster->workqueue) { >>>>>> + dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); >>>>>> + kfree(cluster); >>>>>> + return -ENOMEM; >>>>>> + } >>>>>> + > > Workqueue will be unused if mbox properties are not mentioned in the > device-tree. So, we need to allocate workqueue only if IPI is setup for > at least one core. I think following logic should work: > > Make decision if workqueue is needed or not, if zynqmp_r5_setup_mbox() > function is passing for atleast one core. If zynqmp_r5_setup_mbox() is > success, then set a flag to allocate workqueue, and then later right > before calling zynqmp_r5_core_init() allocate the workqueue for the > cluster. Good idea. I'll gladly enhance this in the next patch version, if you (still) agree that this patch is a valid solution for our use-case, after reviewing my reply with the r5 side. > Remoteproc can be used only to load() and start() stop() fw, and RPMsg > can be optional. > > Also, before calling destroy_workqueue make sure to have NULL check and > destroy only if it was allocated. Ack. Thanks, Stefan ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-12-18 15:04 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-12-04 10:40 [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq Stefan Roese 2025-12-04 16:45 ` Tanmay Shah 2025-12-05 12:06 ` Stefan Roese 2025-12-10 2:51 ` Zhongqiu Han 2025-12-10 8:29 ` Stefan Roese 2025-12-10 18:28 ` Tanmay Shah 2025-12-15 1:14 ` Mathieu Poirier 2025-12-16 14:34 ` Stefan Roese 2025-12-16 21:47 ` Mathieu Poirier 2025-12-17 10:27 ` Stefan Roese 2025-12-17 21:34 ` Mathieu Poirier 2025-12-18 14:58 ` Stefan Roese 2025-12-16 14:26 ` Stefan Roese 2025-12-16 14:43 ` Stefan Roese
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.