From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6a5e794a-a608-4126-9abe-0d512a57dd67@amd.com>
Date: Thu, 14 May 2026 10:47:32 +0530
User-Agent: Mozilla Thunderbird
Subject: Re: [RFC PATCH 1/1] mm: batch page copies in folio_copy() and folio_mc_copy()
To: "David Hildenbrand (Arm)", Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, x86@kernel.org
Cc: Lorenzo Stoakes, "Liam R . Howlett", Vlastimil Babka, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Dave Hansen, "H . Peter Anvin", Ankur Arora,
 Bharata B Rao, Hrushikesh Salunke, David Rientjes, sandipan.das@amd.com
References: <20260427142036.111940-2-shivankg@amd.com>
 <20260427142036.111940-4-shivankg@amd.com>
 <073e5e2c-7102-4141-b0d7-fa5635f811f5@kernel.org>
Content-Language: en-US
From: "Garg, Shivank"
In-Reply-To: <073e5e2c-7102-4141-b0d7-fa5635f811f5@kernel.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0

On 5/12/2026 3:01 PM, David Hildenbrand (Arm) wrote:
> On 4/27/26 16:20, Shivank Garg wrote:
>> Rewrite folio_copy() and folio_mc_copy() as thin wrappers around new
>> batched helpers copy_highpages() and copy_mc_highpages().
>>
>> The current implementations iterate copy_highpage() (or its #MC-aware
>> variant) per 4 KB page. For a single 2 MB folio that loop runs 512
>> times and pays, per page:
>>
>>   - kmap_local_page() / kunmap_local()
>>   - cond_resched()
>>   - one invocation of the architecture copy_page()/memcpy() primitive
>>
>> The new helpers issue a single copy_mc_to_kernel()/memcpy() over
>> the whole contiguous range when CONFIG_HIGHMEM is off and no
>> architecture overrides (__HAVE_ARCH_COPY_HIGHPAGE) copy_highpage().
>> HIGHMEM and arch overrides keep the existing per-page path.
>>
>> Tested on dual-socket AMD EPYC 9655 (Zen 5) with a CXL.mem node.
>> In-kernel folio_mc_copy() microbenchmark on 2 MB folios, source
>> evicted from cache before each iteration and measured throughput:
>>
>>   direction         baseline GB/s    optimized GB/s    speedup
>>   DRAM0 -> DRAM1    18.65 ± 1.37     38.03 ± 3.21      2.04x
>>   DRAM0 -> CXL      25.46 ± 2.89     39.29 ± 1.17      1.54x
>>   CXL -> DRAM0      20.61 ± 3.95     35.07 ± 0.62      1.70x
>>
>> End-to-end move_pages(2) throughput on anonymous 2 MB mTHP folios,
>> 1 GB migrated per run:
>>
>>   direction         baseline GB/s    optimized GB/s    speedup
>>   DRAM0 -> DRAM1     7.20 ± 0.03      8.01 ± 0.02      1.11x
>>   DRAM0 -> CXL      11.12 ± 0.15     13.07 ± 0.03      1.18x
>>   DRAM1 -> DRAM0     7.21 ± 0.02      7.95 ± 0.02      1.10x
>>   CXL -> DRAM0       9.10 ± 0.05      9.49 ± 0.01      1.04x
>>
>> On AMD EPYC 7713 (Zen 3 / Milan, REP_GOOD without FSRM/ERMS) the
>> folio_copy() bulk path regresses because memcpy() falls through to
>> memcpy_orig (an unrolled movq loop), which is slower than the
>> per-page copy_page() (microcoded rep movsq) it replaces.
>
> Do you know what the reason for that fallback is? Could it be fixed (e.g., when
> we detect page alignment or sth like that?)
>

The fallback is gated on X86_FEATURE_FSRM in arch/x86/lib/memcpy_64.S:

	SYM_TYPED_FUNC_START(__memcpy)
		ALTERNATIVE "jmp memcpy_orig", "", X86_FEATURE_FSRM

		movq %rdi, %rax
		movq %rdx, %rcx
		rep movsb
		RET

AMD Zen 3 does not have FSRM, so it jumps to memcpy_orig (the unrolled
movq loop).

On v7.1.0-rc3, I measured these primitives and the kernel's actual
memcpy() across three CPUs, using a kernel module that vmallocs 16 MB
src/dst buffers and times each primitive for comparison.
Numbers are mean (in GB/s) ± SD% (= SD as a percentage of the mean).

1.) AMD EPYC 7713 (Zen 3), Flags: rep_good only, no ERMS/FSRM:

size   unrolled_movq GB/s±SD%   rep_movsq GB/s±SD%   kernel_memcpy GB/s±SD%
------------------------------------------------------------------------------
16B     0.38± 8.73%              0.41± 0.43%          0.43± 0.31%
32B     0.85± 0.19%              0.80± 8.37%          0.84± 0.07%
64B     1.68± 0.35%              1.60± 0.03%          1.59± 9.37%
128B    3.23± 0.22%              3.04± 0.62%          3.19± 0.03%
256B    5.99± 5.78%              5.62± 4.15%          5.93± 0.42%
512B   10.07± 1.36%             10.49± 2.60%         10.02± 0.21%
1K     14.49± 0.09%             18.19± 0.37%         14.31± 3.48%
2K     17.11± 1.01%             28.04± 2.37%         18.14± 0.56%
4K     18.36± 0.22%             39.15± 0.50%         19.57± 1.14%

- kernel_memcpy tracks unrolled_movq.
- rep_movsq is faster than the unrolled_movq fallback for >= 1 KiB,
  from roughly 1.25x at 1 KiB to 2.1x at 4 KiB.

2.) Intel(R) Xeon(R) Platinum 8362, Flags: rep_good, erms, fsrm:

size   unrolled_movq GB/s±SD%   rep_movsq GB/s±SD%   rep_movsb GB/s±SD%   kernel_memcpy GB/s±SD%
--------------------------------------------------------------------------------------------
16B     0.89± 0.93%              0.64± 0.10%          0.69± 0.57%          0.66± 3.52%
32B     2.08± 2.46%              1.28± 0.15%          1.38± 6.21%          1.33± 4.28%
64B     3.97± 2.26%              2.55± 0.24%          2.83± 0.22%          2.65± 4.48%
128B    7.45± 0.09%              5.00± 2.53%          5.48± 5.04%          5.30± 1.60%
256B   13.24± 0.01%              9.79± 0.57%         10.12± 0.37%          9.81± 0.34%
512B   21.67± 0.03%             17.87± 0.02%         18.43± 0.79%         17.81± 0.25%
1K     27.84± 1.96%             34.54± 1.24%         35.67± 1.88%         34.56± 2.49%
2K     32.67± 2.35%             59.58± 0.01%         65.67± 0.18%         59.35± 1.12%
4K     34.85± 0.64%             95.35± 0.00%         96.64± 0.69%         95.35± 0.00%

- kernel_memcpy is using rep_movsb (FSRM in use).
- At 512 B and below the unrolled movq loop is ~20-50% faster; from
  1 KiB upward FSRM wins.

3.) AMD EPYC 9655 96-Core Processor (Zen 5), Flags: rep_good, erms, fsrm:

size   unrolled_movq GB/s±SD%   rep_movsq GB/s±SD%   rep_movsb GB/s±SD%   kernel_memcpy GB/s±SD%
--------------------------------------------------------------------------------------------
16B     0.53± 0.39%              0.53± 0.21%          0.55± 0.13%          0.53± 0.14%
32B     1.13± 1.49%              1.06± 0.07%          1.09± 0.16%          1.06± 0.09%
64B     2.21± 0.12%              2.13± 0.07%          2.18± 0.14%          2.13± 0.09%
128B    4.25± 0.12%              4.26± 0.10%          4.37± 0.12%          4.31± 0.14%
256B    8.01± 0.19%              8.61± 0.27%          8.61± 0.18%          8.51± 0.10%
512B   14.14± 0.18%             16.80± 0.24%         16.80± 0.23%         16.81± 0.24%
1K     22.93± 0.73%             31.70± 0.48%         32.37± 0.28%         32.02± 0.22%
2K     30.36± 0.27%             53.24± 1.01%         56.58± 0.22%         56.04± 0.22%
4K     35.05± 0.65%             80.25± 0.41%         83.90± 0.20%         76.23± 0.37%

- kernel_memcpy is using rep_movsb (FSRM in use).
- For smaller sizes, unrolled movq is close enough to be within noise.

Regarding the fix, one option is to make memcpy() fall back to rep movsq
instead of the unrolled movq loop when FSRM is absent. The data shows the
benefit on Zen 3. On the Intel part, however, unrolled movq is faster for
smaller sizes, and I'm not sure whether adding this complexity to memcpy()
is welcome. Happy to work on this if it is helpful.

Another option is to leave memcpy() untouched for this series and add a
new copy_pages() helper that the folio copy path can use. It would use
ALTERNATIVE_2 to pick rep movsb on ERMS/FSRM and rep movsq on REP_GOOD,
with the per-page copy_page() loop as the final fallback.

Thanks,
Shivank
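
The selection order described for the second option can be illustrated with a minimal userspace sketch. This is not kernel code and not the proposed patch: the feature bits, pick_copy_strategy() and copy_pages_sketch() are hypothetical names, a 4 KiB page size is assumed, and plain memcpy() stands in for the rep movsb / rep movsq sequences that ALTERNATIVE_2 would patch in at boot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB base pages */

/* Hypothetical feature bits standing in for the x86 CPU feature flags. */
enum {
	FEAT_REP_GOOD = 1u << 0,
	FEAT_ERMS     = 1u << 1,
	FEAT_FSRM     = 1u << 2,
};

enum copy_strategy {
	COPY_REP_MOVSB,	/* bulk copy, chosen on ERMS or FSRM */
	COPY_REP_MOVSQ,	/* bulk copy, chosen on REP_GOOD only */
	COPY_PER_PAGE,	/* final fallback: one copy per page */
};

/* Mirror the priority order: ERMS/FSRM first, then REP_GOOD, then per-page. */
static enum copy_strategy pick_copy_strategy(unsigned int features)
{
	if (features & (FEAT_ERMS | FEAT_FSRM))
		return COPY_REP_MOVSB;
	if (features & FEAT_REP_GOOD)
		return COPY_REP_MOVSQ;
	return COPY_PER_PAGE;
}

/* Copy nr_pages contiguous pages from src to dst using the chosen strategy. */
static void copy_pages_sketch(void *dst, const void *src, size_t nr_pages,
			      unsigned int features)
{
	size_t i;

	assert(dst && src);

	switch (pick_copy_strategy(features)) {
	case COPY_REP_MOVSB:
	case COPY_REP_MOVSQ:
		/* Bulk path: a single copy over the whole contiguous range. */
		memcpy(dst, src, nr_pages * SKETCH_PAGE_SIZE);
		break;
	case COPY_PER_PAGE:
		/* Fallback path: copy one page at a time. */
		for (i = 0; i < nr_pages; i++)
			memcpy((char *)dst + i * SKETCH_PAGE_SIZE,
			       (const char *)src + i * SKETCH_PAGE_SIZE,
			       SKETCH_PAGE_SIZE);
		break;
	}
}
```

In this sketch a Zen 3-like flag set (FEAT_REP_GOOD only) selects COPY_REP_MOVSQ, while the Intel and Zen 5 flag sets above select COPY_REP_MOVSB. In the kernel the choice would be made once by instruction patching rather than by a runtime branch on every call.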