From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pl1-f182.google.com (mail-pl1-f182.google.com [209.85.214.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DC35321CA03 for ; Mon, 15 Dec 2025 01:14:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.182 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765761255; cv=none; b=TOaZuycUQJgdRCo5LlgBkyzQEtgFof1Xw97FNgY+vN74wRQ3QVXxFAfjmtsYkA+zGw443rv31CfL9pn13Cc4i8PaZukdlHbF9eDC78ptBSher1XiahdwLNvZy1levAti4LYPDtMrtgg26r3rYCGBU64Pwt8b4G6Jabtf3Fxxrjk= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765761255; c=relaxed/simple; bh=88bPpntIhdNcLehdsEno3nfx3lVZUrQdCIpCx36pBEQ=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=rWWrjKAftq8w3u42RmkF8ZYd+MFUZ56h44G9M5soNGB9uO5aJ0fVIJgeSTQ0fH1/2fGmSGugHhk2z4/hhXvvWQcvHMULrB9FOfjmOfOmuJEoWWLzixNEM8tSgQUsd4rKcj6wdMoa+MtmgeoXhj5WzqGSF8Ta0TMCA2qE7AOZs/U= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linaro.org; spf=pass smtp.mailfrom=linaro.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b=ajxzXOcl; arc=none smtp.client-ip=209.85.214.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linaro.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="ajxzXOcl" Received: by mail-pl1-f182.google.com with SMTP id d9443c01a7336-2a081c163b0so16551895ad.0 for ; Sun, 14 Dec 2025 17:14:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1765761253; x=1766366053; darn=vger.kernel.org; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:from:to :cc:subject:date:message-id:reply-to; bh=3d60J3WZ0abyaWulvNv1Ih85c77Ca6ChpFYHa4mstdE=; b=ajxzXOclynvXcBQqDTUyeCepDtsJRH7SPrjgmfjUfSL/jVSvFAJCm9ab0nuGAayW1E pPhTmKgSAzoo1aSS1+mIlGwy+7ckL3fOVMaHhL/F0Y1Lm4fIVuvIeGD111xEhTHZjIO6 xAa4bEqaABL/q8CgCrtdcKLfipGxS0ZZy5cWY8Kjqd7OCDaNtA9STp1ErBgY41GnKuPE hE0zftkusBVGPV8JBIoQx4ahxhQZxO1Lsk48nrEM8Hz/VZksAyT47dR95UQxos6nzQhP rw0fJgDfkyhv0svLYXsLkvQC9mn5aew99pAxK+GwmgDZ481O2bkaZyl1nLYNYp6kfEeQ 1WoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1765761253; x=1766366053; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=3d60J3WZ0abyaWulvNv1Ih85c77Ca6ChpFYHa4mstdE=; b=PwbnpipCROK9Opf6X9jp7zO5NAqp3diOt5L2y2D21RrPAZ7bBIEo5kKWlGGj9U85zh EmSzG76YMe2Bb2+Y8pF5gOd4KQKgJTQ0ZZBJYaRjyP39DGyUh0t1mbgcj63lAKfUxtz6 XGd0YST0ieoDFp/BIuR4ePn+NcImeqVq8yKMguwAJuJJE5AjZZogFGIS3ffaPnzKXz9x R4IdP74rh4Q7qzLSTad8/DY15cld92o7s5nXwOlx5KbWLF34qWtAO88S6z77hM5HeNQE Gy3w7mi03Ur+ItoJg0SCpWuwAt+6WzBx4kj/Shst+zglaFwq2Op7XjDxiKF5yAEkkvb5 cM+g== X-Forwarded-Encrypted: i=1; AJvYcCVoI47CiTqBVB5ujp/LATUfybIy8PsxNn93NjopB250rGh0DBYRJ/QHUVjL75w5Yq01O2GcKrFmdJEh0482Mg0V@vger.kernel.org X-Gm-Message-State: AOJu0YyrI5ILwxcZT+Fo/aJjorWM4ZRWacdAivhR9/wJkHkyxSppOVeH B58ZiR+ZjUpaXE0TvS+idaoVty/2iVlyVzvZFgyCYNswddm2PPt+FwCGZNdu+eHZhIY= X-Gm-Gg: AY/fxX5u7hiJ4+K492juPbC5XrQWQml+MkL5p5J/bSDf/uXrZ711fTQ9LPlJYJS4gck WQD6TuHmyFcXH/mFLYqIysd+rIWt5ZfDnJuqQXaqRCENxyasiqCwnYYzHX4spCY6x2pZ0MmDavZ 13Ti2mIIWc/gd+N124dX8uDxJKkAUewH7NrvUTgJKeRljH2NTs81HrcpzFTN0pahVQJajZGlRP9 7IniJgMwlMeZlhPHSIVWccx6WrwJZeddx1Cc7S6Ci4RKJRNLRCNckkK73DfvhnmqMFi/vnXtnXy Sd8qlC67fD3/KtguvkOPXIyP7WPnm1z3WkDhvupcQqG9qkqc/Sy19uq2IBzaWlByHqbSgCromKS Ilh2OAldINEYewZvGp34IFyw/ktdnk/6w7Dyc905KL7TAPnXLnF4vqAfM9IX4ljRWmhocU9v9Xr lemOM2vStTIqALzqVCGxP7HkQ3t9PfAIi4RTr0K/sTUsZi7DWTzd4V X-Google-Smtp-Source: AGHT+IG8+iH8744guE70WWn3yTrdXQoU+87X04Z+ivjiAnpTXO0n1NkKludE8N7RDMGLrKpd9hUBow== X-Received: by 2002:a17:902:e94d:b0:295:6d30:e25f with SMTP id d9443c01a7336-29f243879d8mr88304805ad.53.1765761253057; Sun, 14 Dec 2025 17:14:13 -0800 (PST) Received: from p14s (p7838222-ipoefx.ipoe.ocn.ne.jp. [123.225.39.221]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2a0b465de71sm33354105ad.41.2025.12.14.17.14.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Dec 2025 17:14:12 -0800 (PST) Date: Sun, 14 Dec 2025 18:14:08 -0700 From: Mathieu Poirier To: Tanmay Shah Cc: Stefan Roese , Zhongqiu Han , linux-remoteproc@vger.kernel.org Subject: Re: [v3 PATCH] remoteproc: xlnx: Use high-prio workqueue instead of system wq Message-ID: References: <20251204104009.129483-1-stefan.roese@mailbox.org> <747f8248-bbd8-4580-bc57-3efda9f50eb0@oss.qualcomm.com> <1adb380c-4b57-4799-8e3f-f9ba5d14eb18@amd.com> Precedence: bulk X-Mailing-List: linux-remoteproc@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1adb380c-4b57-4799-8e3f-f9ba5d14eb18@amd.com> On Wed, Dec 10, 2025 at 12:28:52PM -0600, Tanmay Shah wrote: > Hello, please check my comments below: > > On 12/10/25 2:29 AM, Stefan Roese wrote: > > Hi Tanmay, > > > > On 12/10/25 03:51, Zhongqiu Han wrote: > > > On 12/5/2025 8:06 PM, Stefan Roese wrote: > > > > Hi Tanmay, > > > > > > > > On 12/4/25 17:45, Tanmay Shah wrote: > > > > > Hello, > > > > > > > > > > Thank You for your patch. Please find my comments below. > > > > > > > > > > On 12/4/25 4:40 AM, Stefan Roese wrote: > > > > > > Testing on our ZynqMP platform has shown, that some R5 messages might > > > > > > get dropped under high CPU load. This patch creates a new high-prio > > > > > > > This commit text should be fixed. Messages are not dropped by Linux, but R5 > can't send new messages as rx vq is not processed by Linux. > I agree. > > > > > Here, I would like to understand what it means by "R5 > > > > > messages might get dropped" > > > > > > > > > > Even under high CPU load, the messages from R5 are stored in > > > > > the virtqueues. If Linux doesn't read it, then it is not > > > > > really lost/ dropped. > > > > > > > > > > Could you please explain your use case in detail and how the > > > > > testing is conducted? > > > > > > > > Our use-case is, that we send ~4k messages per second from the R5 to > > > > Linux - sometimes even a bit more. Normally these messages are received > > > > okay and no messages are dropped. Sometimes, under "high CPU load" > > > > scenarios it happens, that the R5 has to drop messages, as there is no > > > > free space in the RPMsg buffer, which is 256 entries AFAIU. Resulting > > > > from the Linux driver not emptying the RX queue. > > > > > > Thanks for the details. Your understanding is correct. > > > > > Could you please elaborate on these virtqueues a bit? Especially why no > > > > messages drop should happen because of these virtqueues? > > > > > > AFAIK, as a transport layer based on virtqueue, rpmsg is reliable once a > > > message has been successfully enqueued. The observed "drop" here appears > > > to be on the R5 side, where the application discards messages when no > > > entry buffer is available. > > > > Correct. > > > > > In the long run, while improving the Linux side is recommended, > > > > Yes, please. > > > > > it could > > > also be helpful for the R5 side to implement strategies such as an > > > application-level buffer and retry mechanisms. > > > > We already did this. We've added an additional buffer mechanism to the > > R5, which improved this "message drop situation" a bit. Still it did not > > fix it for all our high message rate situations - still resulting in > > frame drops on the R5 side (the R5 is a bit resource restricted). > > > > Improving the responsiveness on the Linux side seems to be the best way > > for us to deal with this problem. > > > > I agree to this. However, Just want to understand and cover full picture > here. > > On R5 side, I am assuming open-amp library is used for the RPMsg > communication. > > rpmsg_send() API will end up here: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/rpmsg/rpmsg_virtio.c#L384 > > Here, if the new buffer is not available, then R5 is supposed to wait for > 1ms before sending a new message. After 1ms, R5 will try to get buffer > again, and this continues for 15 seconds. This is the default mechanism. > > This mechanism is used in your case correctly ? > > Alternatively you can register platform specific wait mechanism via this > callback: https://github.com/OpenAMP/open-amp/blob/be5770f30516505c1a4d35efcffff9fb547f7dcf/lib/include/openamp/rpmsg_virtio.h#L42 > > Few questions for further understanding: > > 1) As per your use case, 4k per second data transfer rate must be maintained > all the time? And this is achieved with this patch? > > Even after having the high priority queue, if someone wants to achieve 8k > per seconds or 16k per seconds data transfer rate, at some point we will hit > this issue again. > Right, I also think this patch is not the right solution. > The reliable solution would be to keep the data transfer rate reasonable, > and have solid re-try mechanism. > > I am okay to take this patch in after addressing comments below but, please > make sure all above things are r5 side is working as well. Tanmay is correct on all front. > > Thanks, > Tanmay > > > > Thanks, > > Stefan > > > > > > > > > > > > > Thanks, > > > > Stefan > > > > > > > > > Thanks, > > > > > Tanmay > > > > > > > > > > > workqueue which is now used instead of the default system workqueue. > > > > > > With this change we don't experience these message drops any more. > > > > > > > > > > > > Signed-off-by: Stefan Roese > > > > > > Cc: Tanmay Shah > > > > > > Cc: Mathieu Poirier > > > > > > --- > > > > > > v3: > > > > > > - Call cancel_work_sync() before freeing ipi (suggested > > > > > > by Zhongqiu Han) > > > > > > > > > > > > v2: > > > > > > - Also call destroy_workqueue() in > > > > > > zynqmp_r5_cluster_exit() (suggested by Zhongqiu Han) > > > > > > - Correct call seq to avoid UAF (suggested by Zhongqiu Han) > > > > > > > > > > > >   drivers/remoteproc/xlnx_r5_remoteproc.c | 23 ++++++++++++++++++++++- > > > > > >   1 file changed, 22 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > b/drivers/ remoteproc/xlnx_r5_remoteproc.c > > > > > > index feca6de68da28..308328b0b489f 100644 > > > > > > --- a/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > +++ b/drivers/remoteproc/xlnx_r5_remoteproc.c > > > > > > @@ -16,6 +16,7 @@ > > > > > >   #include > > > > > >   #include > > > > > >   #include > > > > > > +#include > > > > > >   #include "remoteproc_internal.h" > > > > > > @@ -116,6 +117,7 @@ struct zynqmp_r5_cluster { > > > > > >       enum  zynqmp_r5_cluster_mode mode; > > > > > >       int core_count; > > > > > >       struct zynqmp_r5_core **r5_cores; > > > > > > +    struct workqueue_struct *workqueue; > > > > > >   }; > > > > > >   /** > > > > > > @@ -174,10 +176,18 @@ static void > > > > > > handle_event_notified(struct work_struct *work) > > > > > >   static void zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) > > > > > >   { > > > > > >       struct zynqmp_ipi_message *ipi_msg, *buf_msg; > > > > > > +    struct zynqmp_r5_cluster *cluster; > > > > > >       struct mbox_info *ipi; > > > > > > +    struct device *dev; > > > > > >       size_t len; > > > > > >       ipi = container_of(cl, struct mbox_info, mbox_cl); > > > > > > +    dev = ipi->r5_core->dev; > > > > > > +    cluster = dev_get_drvdata(dev->parent); > > > > > > +    if (!cluster) { > > > > > > +        dev_err(dev->parent, "Invalid driver data\n"); > > > > > > +        return; > > > > > > +    } > > > > > >       /* copy data from ipi buffer to r5_core */ > > > > > >       ipi_msg = (struct zynqmp_ipi_message *)msg; > > > > > > @@ -195,7 +205,7 @@ static void > > > > > > zynqmp_r5_mb_rx_cb(struct mbox_client *cl, void *msg) > > > > > >       if (mbox_send_message(ipi->rx_chan, NULL) < 0) > > > > > >           dev_err(cl->dev, "ack failed to mbox rx_chan\n"); > > > > > > -    schedule_work(&ipi->mbox_work); > > > > > > +    queue_work(cluster->workqueue, &ipi->mbox_work); > > > > > >   } > > > > > >   /** > > > > > > @@ -1154,6 +1164,7 @@ static void zynqmp_r5_cluster_exit(void *data) > > > > > >       for (i = 0; i < cluster->core_count; i++) { > > > > > >           r5_core = cluster->r5_cores[i]; > > > > > > +        cancel_work_sync(&r5_core->ipi->mbox_work); > > I see merge-conflict on top of the for-next branch. Please rebase the patch > on top of the for-next branch: https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git/log/?h=for-next > > > > > > > >           zynqmp_r5_free_mbox(r5_core->ipi); > > > > > >           of_reserved_mem_device_release(r5_core->dev); > > > > > >           put_device(r5_core->dev); > > > > > > @@ -1162,6 +1173,7 @@ static void zynqmp_r5_cluster_exit(void *data) > > > > > >       } > > > > > >       kfree(cluster->r5_cores); > > > > > > +    destroy_workqueue(cluster->workqueue); > > > > > >       kfree(cluster); > > > > > >       platform_set_drvdata(pdev, NULL); > > > > > >   } > > > > > > @@ -1194,11 +1206,20 @@ static int > > > > > > zynqmp_r5_remoteproc_probe(struct platform_device *pdev) > > > > > >           return ret; > > > > > >       } > > > > > > +    cluster->workqueue = alloc_workqueue(dev_name(dev), > > > > > > +                         WQ_UNBOUND | WQ_HIGHPRI, 0); > > > > > > +    if (!cluster->workqueue) { > > > > > > +        dev_err_probe(dev, -ENOMEM, "cannot create workqueue\n"); > > > > > > +        kfree(cluster); > > > > > > +        return -ENOMEM; > > > > > > +    } > > > > > > + > > Workqueue will be unused if mbox properties are not mentioned in the > device-tree. So, we need to allocate workqueue only if IPI is setup for at > least one core. I think following logic should work: > > Make decision if workqueue is needed or not, if zynqmp_r5_setup_mbox() > function is passing for atleast one core. If zynqmp_r5_setup_mbox() is > success, then set a flag to allocate workqueue, and then later right before > calling zynqmp_r5_core_init() allocate the workqueue for the cluster. > > Remoteproc can be used only to load() and start() stop() fw, and RPMsg can > be optional. > > Also, before calling destroy_workqueue make sure to have NULL check and > destroy only if it was allocated. > > Thanks, > Tanmay > > > > > > >       /* wire in so each core can be cleaned up at driver remove */ > > > > > >       platform_set_drvdata(pdev, cluster); > > > > > >       ret = zynqmp_r5_cluster_init(cluster); > > > > > >       if (ret) { > > > > > > +        destroy_workqueue(cluster->workqueue); > > > > > >           kfree(cluster); > > > > > >           platform_set_drvdata(pdev, NULL); > > > > > >           dev_err_probe(dev, ret, "Invalid r5f subsystem > > > > > > device tree\n"); > > > > > > > > > > > > > > > > > >