Date: Thu, 9 Feb 2023 19:09:53 +0000
From: Conor Dooley
To: Andrew Jones
Cc: linux-riscv@lists.infradead.org, kvm-riscv@lists.infradead.org, devicetree@vger.kernel.org, Anup Patel, Palmer Dabbelt,
 Paul Walmsley, Krzysztof Kozlowski, Atish Patra, Heiko Stuebner, Jisheng Zhang, Rob Herring, Albert Ou, Conor Dooley
Subject: Re: [PATCH v4 6/8] RISC-V: Use Zicboz in clear_page when available
References: <20230209152628.129914-1-ajones@ventanamicro.com> <20230209152628.129914-7-ajones@ventanamicro.com>
In-Reply-To: <20230209152628.129914-7-ajones@ventanamicro.com>

On Thu, Feb 09, 2023 at 04:26:26PM +0100, Andrew Jones wrote:
> Using memset() to zero a 4K page takes 563 total instructions, where
> 20 are branches. clear_page(), with Zicboz and a 64 byte block size,
> takes 169 total instructions, where 4 are branches and 33 are nops.
> Even though the block size is a variable, thanks to alternatives, we
> can still implement a Duff device without having to do any preliminary
> calculations. This is achieved by taking advantage of 'vendor_id'
> being used as application-specific data for alternatives, enabling us
> to stop patching / unrolling when 4K bytes have been zeroed (we would
> loop and continue after 4K if the page size would be larger)
>
> For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
> only loop a few times and larger block sizes to not loop at all. Since
> cbo.zero doesn't take an offset, we also need an 'add' after each
> instruction, making the loop body 112 to 160 bytes. Hopefully this
> is small enough to not cause icache misses.
>
> Signed-off-by: Andrew Jones
> Acked-by: Conor Dooley

> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 74736b4f0624..42246bbfa532 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -280,6 +280,17 @@ void __init riscv_fill_hwcap(void)
>  #ifdef CONFIG_RISCV_ALTERNATIVE
>  static bool riscv_cpufeature_application_check(u32 feature, u16 data)
>  {
> +	switch (feature) {
> +	case RISCV_ISA_EXT_ZICBOZ:
> +		/*
> +		 * Zicboz alternative applications provide the maximum

I like the comment, rather than this being some wizardry.
I find the word "applications" a little unclear, though. Perhaps, iff
this series needs a respin, this would work better as "Users of the
Zicboz alternative provide..." (or s/Users/Callers/)?

> +		 * supported block size order, or zero when it doesn't
> +		 * matter. If the current block size exceeds the maximum,
> +		 * then the alternative cannot be applied.
> +		 */
> +		return data == 0 || riscv_cboz_block_size <= (1U << data);
> +	}
> +
>  	return data == 0;
>  }
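
For anyone reading along, here is a minimal userspace sketch of how I read the Zicboz case of the quoted check. It is not the kernel code: cboz_block_size is a hypothetical stand-in for riscv_cboz_block_size, and zicboz_alternative_applies() and zicboz_example_checks() are names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's riscv_cboz_block_size variable. */
static unsigned int cboz_block_size = 64;

/*
 * Mirrors the Zicboz case of the quoted check: 'data' is the maximum
 * supported block size order (log2 of the size in bytes), or zero when
 * the user of the alternative does not care about the block size.
 */
static bool zicboz_alternative_applies(uint16_t data)
{
	return data == 0 || cboz_block_size <= (1U << data);
}

/* Illustrative only: a few orders checked against a 64 byte block size. */
void zicboz_example_checks(void)
{
	cboz_block_size = 64;
	assert(zicboz_alternative_applies(0));  /* zero: size doesn't matter */
	assert(zicboz_alternative_applies(6));  /* 64 <= (1 << 6) */
	assert(!zicboz_alternative_applies(5)); /* 64 > (1 << 5), not applied */
}
```

Under that reading, an order of 6 accepts block sizes up to 64 bytes, and a larger block size leaves that particular alternative unpatched.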