Date: Tue, 19 Jul 2022 09:25:56 +0100
Message-ID: <87o7xlzey3.wl-maz@kernel.org>
From: Marc Zyngier
To: Neeraj Upadhyay
Subject: Re: [PATCH v3] srcu: Reduce blocking agressiveness of expedited grace periods further
In-Reply-To: <20220701031545.9868-1-quic_neeraju@quicinc.com>
References: <20220701031545.9868-1-quic_neeraju@quicinc.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi folks,

On Fri, 01 Jul 2022 04:15:45 +0100, Neeraj Upadhyay wrote:
>
> Commit 640a7d37c3f4 ("srcu: Block less aggressively for expedited
> grace periods") highlights a problem where aggressively blocking
> SRCU expedited grace periods, as was introduced in commit
> 282d8998e997 ("srcu: Prevent expedited GPs and blocking readers
> from consuming CPU"), introduces ~2 minutes delay to the overall
> ~3.5 minutes boot time, when starting VMs with "-bios QEMU_EFI.fd"
> cmdline on qemu, which results in very high rate of
> memslots add/remove, which causes > ~6000 synchronize_srcu() calls for
> kvm->srcu SRCU instance.
>
> Below table captures the experiments done by Zhangfei Gao and Shameer
> to measure the boottime impact with various values of non-sleeping
> per phase counts, with HZ_250 and preemption enabled:
>
> +──────────────────────────+────────────────+
> | SRCU_MAX_NODELAY_PHASE   | Boot time (s)  |
> +──────────────────────────+────────────────+
> | 100                      | 30.053         |
> | 150                      | 25.151         |
> | 200                      | 20.704         |
> | 250                      | 15.748         |
> | 500                      | 11.401         |
> | 1000                     | 11.443         |
> | 10000                    | 11.258         |
> | 1000000                  | 11.154         |
> +──────────────────────────+────────────────+
>
> Analysis on the experiment results showed improved boot time
> with non blocking delays close to one jiffy duration. This
> was also seen when number of per-phase iterations were scaled
> to one jiffy.
>
> So, this change scales per-grace-period phase number of non-sleeping
> polls, such that, non-sleeping polls are done for one jiffy. In addition
> to this, srcu_get_delay() call in srcu_gp_end(), which is used to calculate
> the delay used for scheduling callbacks, is replaced with the check for
> expedited grace period. This is done, to schedule cbs for completed expedited
> grace periods immediately, which results in improved boot time seen in
> experiments. Testing done by Marc and Zhangfei confirms that this change recovers
> most of the performance degradation in boottime; for CONFIG_HZ_250 configuration,
> boottime improves from 3m50s to 41s on Marc's setup; and from 2m40s to ~9.7s
> on Zhangfei's setup.
>
> In addition to the changes to default per phase delays, this change
> adds 3 new kernel parameters - srcutree.srcu_max_nodelay,
> srcutree.srcu_max_nodelay_phase, srcutree.srcu_retry_check_delay.
> This allows users to configure the srcu grace period scanning delays,
> depending on their system configuration requirements.
>
> Signed-off-by: Neeraj Upadhyay
> Tested-by: Marc Zyngier
> Tested-by: Zhangfei Gao

Is there any chance for this fix to make it into 5.19? The regression
is significant enough on low-end systems, and I'd rather see it
addressed.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.