From: Stefan Roese
To: u-boot@lists.denx.de
Cc: Wolfgang Denk, Rasmus Villemoes, sjg@chromium.org, trini@konsulko.com
Subject: [PATCH v4 0/3] arm64: Add optimized memset/memcpy/memmove functions
Date: Thu, 12 Aug 2021 16:47:48 +0200
Message-Id: <20210812144751.2563707-1-sr@denx.de>

On an NXP LX2160 based platform it has been noticed
that the currently implemented memset/memcpy functions for aarch64 are
suboptimal. Especially the memset() for clearing the NXP MC firmware
memory is very expensive (time-wise).

This patchset adds the optimized functions ported from this repository:
https://github.com/ARM-software/optimized-routines

As the optimized memset function makes use of the dc opcode, which needs
the caches to be enabled, an additional check is added and a simple
memset version is used when the caches are disabled.

Please note that checkpatch.pl complains about some issues with this
imported file:
arch/arm/lib/asmdefs.h

Since it's imported, I explicitly did not make any changes here, to make
potential future syncing easier.

Here are some numbers to show the speed improvements:

Current original version:
-------------------------
memset() 32 Bytes, 16M times: time: 0.446 seconds
memset() 16MiB, 256 times:    time: 1.076 seconds
memcpy() 512MiB:              time: 0.224 seconds

New optimized version:
----------------------
memset() 32 Bytes, 16M times: time: 0.287 seconds
memset() 16MiB, 256 times:    time: 0.292 seconds
memcpy() 512MiB:              time: 0.222 seconds

Summary:
The optimized memcpy performs nearly identically to the original one.
But the optimized memset is much faster, for both small and big sizes:
a factor of ~1.6 for small sizes and ~3.7 for big sizes.

Note:
These measurements were done on the NXP LX2160ARDB board.
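
For illustration only, the cache-dependent fallback described above could
be sketched in plain C roughly as follows. This is not the code from the
patches: the sketch_* names and the sketch_dcache_on flag are hypothetical
stand-ins, and the "fast" variant simply calls the C library memset() in
place of the optimized routine that relies on the dc opcode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag standing in for a real "are the caches on?" query. */
bool sketch_dcache_on;

/* Plain byte-wise fill: slow, but needs no cache maintenance opcode. */
static void *sketch_memset_simple(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
}

/* Stand-in for the optimized routine; here it just calls libc memset(). */
static void *sketch_memset_fast(void *dst, int c, size_t n)
{
	return memset(dst, c, n);
}

/* Pick the safe variant unless the caches are known to be enabled. */
void *sketch_memset(void *dst, int c, size_t n)
{
	if (!sketch_dcache_on)
		return sketch_memset_simple(dst, c, n);
	return sketch_memset_fast(dst, c, n);
}
```

The point of the sketch is only the control flow: the check sits in front
of the optimized path, so callers keep using a single entry point before
and after the caches are enabled.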

Thanks,
Stefan

Changes in v4:
- Use macros instead of register names, following the optimized code
- Add zero size check

Changes in v3:
- Add memmove alias, as this function also handles it optimized
- Add memmove as well

Changes in v2:
- Add file names and locations and git commit ID from imported files
  to the commit message
- New patch

Stefan Roese (3):
  arm64: arch/arm/lib: Add optimized memset/memcpy/memmove functions
  arm64: memset-arm64: Use simple memset when cache is disabled
  arm64: Kconfig: Enable usage of optimized memset/memcpy/memmove

 arch/arm/Kconfig              |  38 +++++-
 arch/arm/include/asm/string.h |   4 +
 arch/arm/lib/Makefile         |   5 +
 arch/arm/lib/asmdefs.h        |  98 ++++++++++++++
 arch/arm/lib/memcpy-arm64.S   | 242 ++++++++++++++++++++++++++++++++++
 arch/arm/lib/memset-arm64.S   | 148 +++++++++++++++++++++
 6 files changed, 529 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm/lib/asmdefs.h
 create mode 100644 arch/arm/lib/memcpy-arm64.S
 create mode 100644 arch/arm/lib/memset-arm64.S

-- 
2.32.0