Date: Wed, 28 Feb 2024 19:06:16 -0400
From: Jason Gunthorpe
To: Catalin Marinas
Cc: Alexander Gordeev, Andrew Morton, Christian Borntraeger, Borislav Petkov, Dave Hansen, "David S. Miller", Eric Dumazet, Gerald Schaefer, Vasily Gorbik, Heiko Carstens, "H. Peter Anvin", Justin Stitt, Jakub Kicinski, Leon Romanovsky, linux-rdma@vger.kernel.org, linux-s390@vger.kernel.org, llvm@lists.linux.dev, Ingo Molnar, Bill Wendling, Nathan Chancellor, Nick Desaulniers, netdev@vger.kernel.org, Paolo Abeni, Salil Mehta, Jijie Shao, Sven Schnelle, Thomas Gleixner, x86@kernel.org, Yisen Zhuang, Arnd Bergmann, Leon Romanovsky, linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Mark Rutland, Michael Guralnik, patches@lists.linux.dev, Niklas Schnelle, Will Deacon
Subject: Re: [PATCH 4/6] arm64/io: Provide a WC friendly __iowriteXX_copy()
Message-ID: <20240228230616.GS13330@nvidia.com>
References: <0-v1-38290193eace+5-mlx5_arm_wc_jgg@nvidia.com> <4-v1-38290193eace+5-mlx5_arm_wc_jgg@nvidia.com>
On Tue, Feb 27, 2024 at 10:37:18AM +0000, Catalin Marinas wrote:
> On Tue, Feb 20, 2024 at 09:17:08PM -0400, Jason Gunthorpe wrote:
> > +/*
> > + * This generates a memcpy that works on a from/to address which is aligned to
> > + * bits. Count is in terms of the number of bits sized quantities to copy. It
> > + * optimizes to use the STR groupings when possible so that it is WC friendly.
> > + */
> > +#define memcpy_toio_aligned(to, from, count, bits) \
> > +	({ \
> > +		volatile u##bits __iomem *_to = to; \
> > +		const u##bits *_from = from; \
> > +		size_t _count = count; \
> > +		const u##bits *_end_from = _from + ALIGN_DOWN(_count, 8); \
> > +	\
> > +		for (; _from < _end_from; _from += 8, _to += 8) \
> > +			__const_memcpy_toio_aligned##bits(_to, _from, 8); \
> > +		if ((_count % 8) >= 4) { \
> > +			__const_memcpy_toio_aligned##bits(_to, _from, 4); \
> > +			_from += 4; \
> > +			_to += 4; \
> > +		} \
> > +		if ((_count % 4) >= 2) { \
> > +			__const_memcpy_toio_aligned##bits(_to, _from, 2); \
> > +			_from += 2; \
> > +			_to += 2; \
> > +		} \
> > +		if (_count % 2) \
> > +			__const_memcpy_toio_aligned##bits(_to, _from, 1); \
> > +	})
>
> Do we actually need all this if count is not constant? If it's not
> performance critical anywhere, I'd rather copy the generic
> implementation, it's easier to read.

Which generic version? The point is to maximize WC effects with
non-constant values, so I think we do need something like this, i.e.
we can't just fall back to looping over 64-bit stores one at a time.
If we don't use the large block stores, we know we get very poor WC
behavior.

So at least the 8 and 4 constant value sections are needed. At that
point you may as well just do 4 and 2 instead of another loop.
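A minimal userspace sketch of the 8/4/2/1 grouping in memcpy_toio_aligned() for 64-bit elements, to show how a non-constant count is split into a few fixed-size store runs instead of one store per word. The names store_group_hyp() and copy_aligned64_sketch() are hypothetical: store_group_hyp() only stands in for the patch's __const_memcpy_toio_aligned##bits() helper and writes to ordinary memory, so the sketch models the grouping logic, not actual write-combining on arm64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the patch's __const_memcpy_toio_aligned##bits()
 * helper. In the kernel that helper emits a fixed run of STR instructions to
 * an __iomem mapping; here it is a plain volatile store loop into ordinary
 * memory, so it models only the grouping, not real write-combining.
 */
static void store_group_hyp(volatile uint64_t *to, const uint64_t *from,
			    size_t n)
{
	for (size_t i = 0; i < n; i++)
		to[i] = from[i];
}

/*
 * Copy `count` 64-bit words using the same split as memcpy_toio_aligned():
 * as many groups of 8 as fit, then at most one group each of 4, 2 and 1.
 * The size of every group issued is recorded in `groups` (capacity
 * `max_groups`), and the number of groups is returned.
 */
static size_t copy_aligned64_sketch(volatile uint64_t *to,
				    const uint64_t *from, size_t count,
				    size_t *groups, size_t max_groups)
{
	/* Equivalent to ALIGN_DOWN(count, 8) in the quoted macro. */
	const uint64_t *end_from = from + (count - count % 8);
	size_t n = 0;

	/* Worst case: count / 8 full groups plus a 4, a 2 and a 1. */
	assert(max_groups >= count / 8 + 3);

	for (; from < end_from; from += 8, to += 8) {
		store_group_hyp(to, from, 8);
		groups[n++] = 8;
	}
	if ((count % 8) >= 4) {
		store_group_hyp(to, from, 4);
		groups[n++] = 4;
		from += 4;
		to += 4;
	}
	if ((count % 4) >= 2) {
		store_group_hyp(to, from, 2);
		groups[n++] = 2;
		from += 2;
		to += 2;
	}
	if (count % 2) {
		store_group_hyp(to, from, 1);
		groups[n++] = 1;
	}
	return n;
}
```

For example, a 15-word copy is issued as groups of 8, 4, 2 and 1: four store runs instead of fifteen separate 64-bit stores. Whether each run is actually merged into a single write-combined transaction depends on the CPU and interconnect, which this sketch cannot show.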
Most of the users I know about are performance paths; the entire iocopy
infrastructure was introduced as an x86 performance optimization.

Jason