From: "Havens, Austin"
To: Mark Rutland
CC: "catalin.marinas@arm.com", "will@kernel.org", "michal.simek@amd.com", "Suresh, Siddarth", "Lui, Vincent", "linux-arm-kernel@lists.infradead.org"
Subject: Re: Slowdown copying data between kernel versions 4.19 and 5.15
Date: Thu, 29 Jun 2023 19:33:39 +0000

Hi Mark,

Thanks for the reply.

On Thursday, June 29, 2023 7:74 AM Mark Rutland wrote:
> On Wed, Jun 28, 2023 at 09:38:14PM +0000, Havens, Austin wrote:
>> >After some investigation I am guessing the issue is either in the iovector
>> >iteration changes (around
>> >https://elixir.bootlin.com/linux/v5.15/source/lib/iov_iter.c#L922 ) or the
>> >lower level changes in arch/arm64/lib/copy_from_user.S, but I am pretty out
>> >of my depth so it is just speculation.
>>
>> After comparing the disassembly of __arch_copy_from_user on both kernels and
>> going through commit logs, I figured out the slowdown was mostly due to
>> the changes from commit c703d80130b1c9d6783f4cbb9516fd5fe4a750d, specifically
>> the changes to uao_ldp.
>
> For the benefit of others, that's commit:
>
>   fc703d80130b1c9d ("arm64: uaccess: split user/kernel routine")

Sorry for the copy-paste error.

>>
>> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
>> index 2c26ca5b7bb0..2b5454fa0f24 100644
>> --- a/arch/arm64/include/asm/asm-uaccess.h
>> +++ b/arch/arm64/include/asm/asm-uaccess.h
>> @@ -59,62 +59,32 @@ alternative_else_nop_endif
>>  #endif
>>
>>  /*
>> - * Generate the assembly for UAO alternatives with exception table entries.
>> + * Generate the assembly for LDTR/STTR with exception table entries.
>>   * This is complicated as there is no post-increment or pair versions of the
>>   * unprivileged instructions, and USER() only works for single instructions.
>>   */
>> -#ifdef CONFIG_ARM64_UAO
>>         .macro uao_ldp l, reg1, reg2, addr, post_inc
>> -               alternative_if_not ARM64_HAS_UAO
>> -8888:                  ldp     \reg1, \reg2, [\addr], \post_inc;
>> -8889:                  nop;
>> -                       nop;
>> -               alternative_else
>> -                       ldtr    \reg1, [\addr];
>> -                       ldtr    \reg2, [\addr, #8];
>> -                       add     \addr, \addr, \post_inc;
>> -               alternative_endif
>> +8888:          ldtr    \reg1, [\addr];
>> +8889:          ldtr    \reg2, [\addr, #8];
>> +               add     \addr, \addr, \post_inc;
>>
>>                 _asm_extable    8888b,\l;
>>                 _asm_extable    8889b,\l;
>>         .endm
>>
>> I could not directly revert the changes to test since more names changed in
>> other commits than I cared to figure out, but I hacked out that change, and
>> saw that the performance of the test program was basically back to normal.
>>
>> diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
>> index ccedf548dac9..2ddf7eba46fd 100644
>> --- a/arch/arm64/include/asm/asm-uaccess.h
>> +++ b/arch/arm64/include/asm/asm-uaccess.h
>> @@ -64,9 +64,9 @@ alternative_else_nop_endif
>>   * unprivileged instructions, and USER() only works for single instructions.
>>   */
>>         .macro user_ldp l, reg1, reg2, addr, post_inc
>> -8888:          ldtr    \reg1, [\addr];
>> -8889:          ldtr    \reg2, [\addr, #8];
>> -               add     \addr, \addr, \post_inc;
>> +8888:          ldp     \reg1, \reg2, [\addr], \post_inc;
>> +8889:          nop;
>> +               nop;
>
> As Catalin noted, we can't make that change generally as it'd be broken for any
> system with PAN, and in general we *really* want to use LDTR/STTR for user
> accesses to catch any misuse with kernel pointers.

I was afraid that would be the case. It puts us in a bit of a tough spot, but
at least we know we should be looking for workarounds instead of a fix.

>> Profiling with the hacked __arch_copy_from_user
>> root@ahraptor:/tmp# perf stat -einstructions -ecycles -e ld_dep_stall -e read_alloc -e dTLB-load-misses /mnt/usrroot/test_copy
>>
>>  Performance counter stats for '/mnt/usrroot/test_copy':
>>
>>           11822342      instructions              #    0.23  insn per cycle
>>           50689594      cycles
>>           37627922      ld_dep_stall
>>              17933      read_alloc
>>               3421      dTLB-load-misses
>>
>>        0.043440253 seconds time elapsed
>>
>>        0.004382000 seconds user
>>        0.039442000 seconds sys
>>
>> Unfortunately the hack crashes in other cases so it is not a viable solution
>> for us. Also, on our actual workload there is still a small difference in
>> performance remaining that I have not tracked down yet (I am guessing it has
>> to do with the dTLB-load-misses remaining higher).
>>
>> Note, I think that the slowdown is only noticeable in cases like ours where
>> the data being copied from is not in cache (for us, because the FPGA writes
>> it).
>
> When you say "is not in cache", what exactly do you mean? If this were just the
> latency of filling a cache I wouldn't expect the size of the first access to
> make a difference, so I'm assuming the source buffer is not mapped with
> cacheable memory attributes, which we generally assume.
>
> Which memory attributes are the source and destination buffers mapped with? Is
> that Normal-WB, Normal-NC, or Device? How exactly has that memory been mapped?
>
> I'm assuming this is with some out-of-tree driver; if that's in a public tree
> could you please provide a pointer to it?
>
> Thanks,
> Mark.

I am actually not 100% clear on how the memory gets mapped. Currently we call
ioremap in our driver, so I think that should map it as iomem.
When I removed the ioremap() call or used /dev/mem instead, nothing changed, and
looking at things now, I think that is because the original mapping comes from
drivers/of/of_reserved_mem.c, IIRC. I mostly followed this wiki when setting
things up:
https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841683/Linux+Reserved+Memory

I think the relevant parts are from the dts (note that we define two regions,
because some usages also need to be accessed by other CPUs on the SoC, which
have address space restrictions):

    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        iq_capture: fpga_mem@1 {
            compatible = "shared-dma-pool";
            no-map;
            reg = <0x0 0x70000000 0x0 0x10000000>;
        };
        big_iq_capture: fpga_mem@2 {
            compatible = "shared-dma-pool";
            no-map;
            reg = <0x8 0x0 0x0 0x80000000>;
        };
    };

    anritsu-databuffer@0 {
        compatible = "anritsu,databuffer";
        memory-region = <&iq_capture>;
        device-name = "databuffer-device";
    };

    anritsu-databuffer@1 {
        compatible = "anritsu,databuffer";
        memory-region = <&big_iq_capture>;
        device-name = "capturebuffer-device";
    };

The databuffer driver is something we made and generally build out of tree, but
I put it in tree on our GitHub if you want to look at it. I have not actually
tried to build it in-tree yet, so I could have made some mistakes with the
Makefile or something. Here is a link to where the ioremap is:
https://github.com/Anritsu/linux-xlnx/blob/intree_databuffer_driver/drivers/char/databuffer_driver.c#L242

Despite doing my best to read the documentation, I was never really sure if I
got the memory mapping right for our use case.

If you are interested in context, the use case is in spectrum analyzers:
https://www.anritsu.com/en-us/test-measurement/products/ms2090a
The feature is IQ capture, which, if you are not familiar with spectrum
analyzers, is basically trying to take the data from a high-speed ADC and
store it as fast as possible.
Since the FPGA that writes the data is clocked to the ADC, the rate we can
stream out without losing any data depends on how fast we can copy the data from
memory to either the network or a file, which is why this performance is
important to us. I think we should probably be using scatter/gather for this,
but I could not convince the FPGA engineers to implement it (and it sounded hard,
so I did not try very hard to convince them).

Thanks for the help,
Austin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel