Date: Wed, 10 Dec 2025 12:16:17 -0800
From: Ben Niu
To: Mark Rutland
Subject: Withdraw [PATCH] tracing: Enable kprobe tracing for Arm64 asm functions
References: <20251027181749.240466-1-benniu@meta.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
On Mon, Nov 17, 2025 at 10:34:22AM +0000, Mark Rutland wrote:
> On Thu, Oct 30, 2025 at 11:07:51AM -0700, Ben Niu wrote:
> > On Thu, Oct 30, 2025 at 12:35:25PM +0000, Mark Rutland wrote:
> > > Is there something specific you want to trace, but cannot currently
> > > trace (on arm64)?
> >
> > For some reason, we only saw the Arm64 Linux asm functions
> > __arch_copy_to_user and __arch_copy_from_user being hot in our workloads,
> > not their counterpart asm functions on x86, so we are trying to understand
> > and improve the performance of those Arm64 asm functions.
>
> Are you sure that's not an artifact of those being out-of-line on arm64,
> but inline on x86? On x86, the out-of-line forms are only used when the
> CPU doesn't have FSRM, and when the CPU *does* have FSRM, the logic gets
> inlined. See raw_copy_from_user(), raw_copy_to_user(), and
> copy_user_generic() in arch/x86/include/asm/uaccess_64.h.

On x86, INLINE_COPY_TO_USER is not defined in the latest Linux kernel or in our
internal branch, so _copy_to_user is always defined as an extern (out-of-line)
function, which in turn inlines copy_user_generic. copy_user_generic executes
an FSRM `rep movsb` if the CPU supports it (our case); otherwise, it calls
rep_movs_alternative, which issues plain movs to copy memory.
FSRM is vectorized internally by the CPU, and Arm's FEAT_MOPS seems similar to
what FSRM does, but Neoverse V2 does not support FEAT_MOPS, so the kernel falls
back to `sttr` instructions to copy memory.

> Have you checked that inlining is not skewing your results, and
> artificially making those look hotter on arm64 by virtue of centralizing
> samples to the same IP/PC range?

As mentioned above, _copy_to_user is not inlined on x86.

> Can you share any information on those workloads? e.g. which callchains
> were hot?

Please reach out to James Greenhalgh and Chris Goodyer at Arm for more details
about those workloads, which I can't share in a public channel.

Using bpftrace, we found that most __arch_copy_to_user calls are for sizes in
the 2K-4K range. I wrote a simple microbenchmark to test and compare the
performance of copy_to_user on both x64 and arm64; see
https://github.com/mcfi/benchmark/tree/main/ku_copy

The data collected are below. You can see that the Arm64 kernel/user copy
speed dips once the size hits 2K-16K.

Routine         Size    Arm64 bytes/s / x64 bytes/s
copy_to_user         8  203.4%
copy_from_user       8  248.5%
copy_to_user        16  194.6%
copy_from_user      16  240.6%
copy_to_user        32  195.0%
copy_from_user      32  236.5%
copy_to_user        64  193.2%
copy_from_user      64  240.4%
copy_to_user       128  211.0%
copy_from_user     128  273.0%
copy_to_user       256  185.3%
copy_from_user     256  229.3%
copy_to_user       512  152.9%
copy_from_user     512  182.4%
copy_to_user      1024  121.0%
copy_from_user    1024  141.9%
copy_to_user      2048   95.1%
copy_from_user    2048  108.7%
copy_to_user      4096   78.9%
copy_from_user    4096   85.7%
copy_to_user      8192   69.7%
copy_from_user    8192   73.5%
copy_to_user     16384   65.9%
copy_from_user   16384   68.8%
copy_to_user     32768  117.3%
copy_from_user   32768  118.6%
copy_to_user     65536  115.5%
copy_from_user   65536  117.4%
copy_to_user    131072  114.6%
copy_from_user  131072  113.3%
copy_to_user    262144  114.4%
copy_from_user  262144  109.3%

I sent out a patch for faster __arch_copy_from_user/__arch_copy_to_user.
Please see message ID 20251018052237.1368504-2-benniu@meta.com for more
details.

> Mark.

In addition, I'm withdrawing this patch because we found a workaround to trace
__arch_copy_to_user/__arch_copy_from_user through watchpoints. Message ID
aTjpzfqTEGPcird5@meta.com should have all the details.

Thank you for reviewing this patch and all your helpful comments.

Ben