Date: Thu, 17 Dec 2020 10:21:20 -0800
From: rishabhb@codeaurora.org
To: Alex Elder
Cc: Bjorn Andersson, linux-remoteproc@vger.kernel.org,
    linux-kernel@vger.kernel.org, tsoni@codeaurora.org,
    psodagud@codeaurora.org, sidgup@codeaurora.org
Subject: Re: [PATCH] remoteproc: Create a separate workqueue for recovery tasks
References: <1607806087-27244-1-git-send-email-rishabhb@codeaurora.org>
Message-ID: <87c3f902b94bc243fc28e0ce79303dd4@codeaurora.org>

On 2020-12-17 08:12, Alex Elder wrote:
> On 12/15/20 4:55 PM, Bjorn Andersson wrote:
>> On Sat 12 Dec 14:48 CST 2020, Rishabh Bhatnagar wrote:
>>
>>> Create an unbound high priority workqueue for recovery tasks.
>
> I have been looking at a different issue that is caused by
> crash notification.
>
> What happened was that the modem crashed while the AP was
> in system suspend (or possibly even resuming) state. And
> there is no guarantee that the system will have called a
> driver's ->resume callback when the crash notification is
> delivered.
>
> In my case (in the IPA driver), handling a modem crash
> cannot be done while the driver is suspended; i.e. the
> activities in its ->resume callback must be completed
> before we can recover from the crash.
>
> For this reason I might like to change the way the
> crash notification is handled, but what I'd rather see
> is to have the work queue not run until user space
> is unfrozen, which would guarantee that all drivers
> that have registered for a crash notification will
> be resumed when the notification arrives.
>
> I'm not sure how that interacts with what you are
> looking for here. I think the workqueue could still
> be unbound, but its work would be delayed longer before
> any notification (and recovery) started.
>
> -Alex
>

In that case, would adding the WQ_FREEZABLE flag help? Work items on a
freezable workqueue are drained during the freeze phase of system
suspend, and no new work item starts executing until the workqueue is
thawed on resume, after the device ->resume callbacks have run.

>
>> This simply repeats $subject
>>
>>> Recovery time is an important parameter for a subsystem and there
>>> might be situations where multiple subsystems crash around the same
>>> time. Scheduling into an unbound workqueue increases parallelization
>>> and avoids time impact.
>>
>> You should be able to write this more succinctly. The important part is
>> that you want an unbound work queue to allow recovery to happen in
>> parallel - which naturally implies that you care about recovery
>> latency.
>>
>>> Also creating a high priority workqueue
>>> will utilize separate worker threads with higher nice values than
>>> normal ones.
>>>
>>
>> This doesn't describe why you need the higher priority.
>>
>>
>> I believe, and certainly with the in-line coredump, that we're running
>> our recovery work for way too long to be queued on the system_wq. As
>> such the content of the patch looks good!
>>
>> Regards,
>> Bjorn
>>
>>> Signed-off-by: Rishabh Bhatnagar
>>> ---
>>>  drivers/remoteproc/remoteproc_core.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>> index 46c2937..8fd8166 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -48,6 +48,8 @@ static DEFINE_MUTEX(rproc_list_mutex);
>>> static LIST_HEAD(rproc_list);
>>> static struct notifier_block rproc_panic_nb;
>>> +static struct workqueue_struct *rproc_wq;
>>> +
>>> typedef int (*rproc_handle_resource_t)(struct rproc *rproc,
>>> void *, int offset, int avail);
>>> @@ -2475,7 +2477,7 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
>>> rproc->name, rproc_crash_to_string(type));
>>> /* create a new task to handle the error */
>>> - schedule_work(&rproc->crash_handler);
>>> + queue_work(rproc_wq, &rproc->crash_handler);
>>> }
>>> EXPORT_SYMBOL(rproc_report_crash);
>>> @@ -2520,6 +2522,10 @@ static void __exit rproc_exit_panic(void)
>>> static int __init remoteproc_init(void)
>>> {
>>> + rproc_wq = alloc_workqueue("rproc_wq", WQ_UNBOUND | WQ_HIGHPRI, 0);
>>> + if (!rproc_wq)
>>> + return -ENOMEM;
>>> +
>>> rproc_init_sysfs();
>>> rproc_init_debugfs();
>>> rproc_init_cdev();
>>> @@ -2536,6 +2542,7 @@ static void __exit remoteproc_exit(void)
>>> rproc_exit_panic();
>>> rproc_exit_debugfs();
>>> rproc_exit_sysfs();
>>> + destroy_workqueue(rproc_wq);
>>> }
>>> module_exit(remoteproc_exit);
>>> --
>>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>>> a Linux Foundation Collaborative Project
>>>