Message-Id: <6145cabf-d016-4dba-b5d2-0fb793352058@app.fastmail.com>
In-Reply-To: <7F566E60-C371-449B-992B-0C435AD6016B@gmail.com>
References: <20230616085038.4121892-1-rppt@kernel.org> <20230616085038.4121892-3-rppt@kernel.org> <20230618080027.GA52412@kernel.org> <7F566E60-C371-449B-992B-0C435AD6016B@gmail.com>
Date: Tue, 20 Jun 2023 10:24:29 -0700
From: "Andy Lutomirski"
To: "Nadav Amit", "Song Liu"
Cc: "Mike Rapoport", "Mark Rutland", "Kees Cook", "Linux Kernel Mailing List", "Andrew Morton", "Catalin Marinas", "Christophe Leroy", "David S. Miller", "Dinh Nguyen", "Heiko Carstens", "Helge Deller", "Huacai Chen", "Kent Overstreet", "Luis Chamberlain", "Michael Ellerman", "Naveen N. Rao", "Palmer Dabbelt", "Puranjay Mohan", "Rick P Edgecombe", "Russell King (Oracle)", "Steven Rostedt", "Thomas Bogendoerfer", "Thomas Gleixner", "Will Deacon", bpf, linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org, linux-mm, linux-modules@vger.kernel.org, linux-parisc@vger.kernel.org, linux-riscv@lists.infradead.org, linux-s390, linux-trace-kernel@vger.kernel.org, linuxppc-dev, loongarch@lists.linux.dev, netdev@vger.kernel.org, sparclinux@vger.kernel.org, "the arch/x86 maintainers"
Subject: Re: [PATCH v2 02/12] mm: introduce execmem_text_alloc() and jit_text_alloc()
X-Mailing-List: linux-trace-kernel@vger.kernel.org

On Mon, Jun 19, 2023, at 1:18 PM, Nadav Amit wrote:
>> On Jun 19, 2023, at 10:09 AM, Andy Lutomirski wrote:
>>
>> But jit_text_alloc() can't do this, because the order of operations doesn't match. With jit_text_alloc(), the executable mapping shows up before the text is populated, so there is no atomic change from not-there to populated-and-executable. Which means that there is an opportunity for CPUs, speculatively or otherwise, to start filling various caches with intermediate states of the text, which means that various architectures (even x86!) may need serialization.
>>
>> For eBPF- and module- like use cases, where JITting/code gen is quite coarse-grained, perhaps something vaguely like:
>>
>> jit_text_alloc() -> returns a handle and an executable virtual address, but does *not* map it there
>> jit_text_write() -> write to that handle
>> jit_text_map() -> map it and synchronize if needed (no sync needed on x86, I think)
>
> Andy, would you mind explaining why you think a sync is not needed? I
> mean I have a “feeling” that perhaps TSO can guarantee something based
> on the order of write and page-table update. Is that the argument?
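A rough userspace analogue of the quoted jit_text_alloc() / jit_text_write() / jit_text_map() split, offered only as a sketch: the three function names come from the proposal above, but struct jit_handle, the signatures, and the mmap()/mprotect() implementation are assumptions, not a kernel API. The final address stays PROT_NONE until the text is complete, so it never becomes executable while partially written. Userspace cannot install a populated RX PTE in one step the way jit_text_map() would, so here the range is briefly RW (never RWX) during the copy. A real JIT on an architecture without coherent instruction fetch would also call __builtin___clear_cache() before jumping to the text; this sketch never executes it.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handle, loosely modelled on the quoted proposal. */
struct jit_handle {
    void *exec_addr;        /* final address; PROT_NONE until jit_text_map() */
    unsigned char *staging; /* private RW pages that receive the text */
    size_t size;            /* rounded up to whole pages */
};

/* Reserve an address for the text without making it readable or executable. */
static int jit_text_alloc(struct jit_handle *h, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p;

    h->staging = NULL;
    h->size = (len + page - 1) & ~(page - 1);
    h->exec_addr = mmap(NULL, h->size, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (h->exec_addr == MAP_FAILED)
        return -1;
    /* Separate writable pages; nothing ever executes from these. */
    p = mmap(NULL, h->size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        munmap(h->exec_addr, h->size);
        return -1;
    }
    h->staging = p;
    return 0;
}

/* Write text through the handle, never through exec_addr. */
static int jit_text_write(struct jit_handle *h, size_t off,
                          const void *src, size_t len)
{
    if (h->staging == NULL || off > h->size || len > h->size - off)
        return -1;
    memcpy(h->staging + off, src, len);
    return 0;
}

/* Expose the finished text at exec_addr as read+exec. */
static int jit_text_map(struct jit_handle *h)
{
    if (h->staging == NULL)
        return -1;
    /* The range is RW (not executable) only while the copy runs. */
    if (mprotect(h->exec_addr, h->size, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(h->exec_addr, h->staging, h->size);
    /* Flip to RX only after the copy is complete. */
    if (mprotect(h->exec_addr, h->size, PROT_READ | PROT_EXEC) != 0)
        return -1;
    munmap(h->staging, h->size);
    h->staging = NULL;
    return 0;
}
```

Typical use would be one jit_text_alloc(), any number of jit_text_write() calls, then jit_text_map() before h.exec_addr is handed to anyone who might execute it; after mapping, further writes through the handle are refused.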
Sorry, when I say "no sync" I mean no cross-CPU synchronization. I'm assuming the underlying sequence of events is:

allocate physical pages (jit_text_alloc)

write to them (with MOV, memcpy, whatever), via the direct map or via a temporary mm

do an appropriate *local* barrier (which, on x86, is probably implied by TSO, as the subsequent pagetable change is at least a release; also, any previous temporary mm stuff would have done MOV CR3 afterwards, which is a full "serializing" barrier)

optionally zap the direct map via IPI, assuming the pages are direct-mapped (but this could be avoided with a smart enough allocator and temporary_mm above)

install the final RX PTE (jit_text_map), which does a MOV or maybe a LOCK CMPXCHG16B. Note that the virtual address in question was not readable or executable before this, and all CPUs have serialized since the last time it was executable.

either jump to the new text locally, or:

1. Do a store-release to tell other CPUs that the text is mapped
2. Other CPU does a load-acquire to detect that the text is mapped and jumps to the text

This is all approximately the same thing that plain old mmap(..., PROT_EXEC, ...) does.

>
> On this regard, one thing that I clearly do not understand is why
> *today* it is ok for users of bpf_arch_text_copy() not to call
> text_poke_sync(). Am I missing something?

I cannot explain this, because I suspect the current code is wrong. But it's only wrong across CPUs, because bpf_arch_text_copy() goes through text_poke_copy(), which calls unuse_temporary_mm(), which is serializing. And it's plausible that most eBPF use cases don't actually cause the loaded program to get used on a different CPU without first serializing on the CPU that ends up using it. (Context switches and interrupts are serializing.)

FRED could make interrupts non-serializing. I sincerely hope that FRED doesn't cause this all to fall apart.

--Andy
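The last two steps of the sequence in the reply above (a store-release by the writer, a load-acquire by the other CPU before it uses the text) are the standard message-passing pattern. A minimal C11 sketch follows, assuming an ordinary static buffer stands in for the mapped text and a hypothetical published_text pointer plays the role of "the text is mapped". It demonstrates data ordering only: C11 atomics say nothing about instruction fetch, so the reader copies out a byte instead of jumping to the text, and the cross-modifying-code and serialization concerns discussed above are out of scope.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the text pages (ordinary data memory in this sketch). */
static unsigned char text_pages[64];

/* Hypothetical "text is mapped" flag: NULL until published. */
static _Atomic(const unsigned char *) published_text;

/* Writer: populate the text, then publish it with a store-release. */
static void publish_text(const unsigned char *src, size_t len)
{
    if (len > sizeof text_pages)
        len = sizeof text_pages;
    memcpy(text_pages, src, len);
    /* Step 1: every write above is visible to a thread whose
     * load-acquire observes this store. */
    atomic_store_explicit(&published_text, text_pages, memory_order_release);
}

/* Reader: step 2, load-acquire until the text is published. */
static void *consumer(void *arg)
{
    const unsigned char *t;

    while ((t = atomic_load_explicit(&published_text,
                                     memory_order_acquire)) == NULL)
        ; /* spin */
    /* Instead of jumping to the text, record its first byte. */
    *(unsigned char *)arg = t[0];
    return NULL;
}

/* Runs one reader thread against this (writer) thread and returns the
 * byte the reader saw, or -1 if the thread could not be created. */
static int run_publication_demo(void)
{
    static const unsigned char text[] = {0x90, 0xC3}; /* sample bytes */
    unsigned char seen = 0;
    pthread_t reader;

    if (pthread_create(&reader, NULL, consumer, &seen) != 0)
        return -1;
    publish_text(text, sizeof text);
    pthread_join(reader, NULL);
    return seen;
}
```

Because the reader's acquire load pairs with the writer's release store, the reader can never observe the published pointer and still see stale contents in text_pages.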