Date: Fri, 17 May 2024 18:25:42 +0100
From: Catalin Marinas
To: Yang Shi
Cc: peterx@redhat.com, will@kernel.org, scott@os.amperecomputing.com,
    cl@gentwo.org, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions
References: <20240507223558.3039562-1-yang@os.amperecomputing.com>
    <6066e0da-f00a-40fd-a5e2-d4d78786c227@os.amperecomputing.com>
    <570c686c-6aa1-43f0-ba31-3597a329e037@os.amperecomputing.com>
In-Reply-To: <570c686c-6aa1-43f0-ba31-3597a329e037@os.amperecomputing.com>

On Fri, May 17, 2024 at 09:30:23AM -0700, Yang Shi wrote:
> On 5/14/24 3:39 AM, Catalin Marinas wrote:
> > It would be good to understand why openjdk is doing this instead of a
> > plain write. Is it because it may be racing with some other threads
> > already using the heap? That would be a valid pattern.
>
> Yes, you are right. I think I quoted the JVM justification in earlier email,
> anyway they said "permit use of memory concurrently with pretouch".

Ah, sorry, I missed that. This seems like a valid reason.

> > A point Will raised was on potential ABI changes introduced by this
> > patch. The ESR_EL1 reported to user remains the same as per the hardware
> > spec (read-only), so from a SIGSEGV we may have some slight behaviour
> > changes:
> >
> > 1. PTE invalid:
> >
> >    a) vma is VM_READ && !VM_WRITE permission - SIGSEGV reported with
> >       ESR_EL1.WnR == 0 in sigcontext with your patch. Without this
> >       patch, the PTE is mapped as PTE_RDONLY first and a subsequent
> >       fault will report SIGSEGV with ESR_EL1.WnR == 1.
>
> I think I can do something like the below conceptually:
>
> if is_el0_atomic_instr && !is_write_abort
>     force_write = true
>
> if VM_READ && !VM_WRITE && force_write == true

Nit: write implies read, so you only need to check !write.

>     vm_flags = VM_READ
>     mm_flags &= ~FAULT_FLAG_WRITE
>
> Then we just fallback to read fault. The following write fault will trigger
> SIGSEGV with consistent ABI.

I think this should work. So instead of reporting the write fault
directly in case of a read-only vma, we let the core code handle the
read fault first and then we retry the atomic instruction.

> >    b) vma is !VM_READ && !VM_WRITE permission - SIGSEGV reported with
> >       ESR_EL1.WnR == 0, so no change from current behaviour, unless we
> >       fix the patch for (1.a) to fake the WnR bit which would change the
> >       current expectations.
> >
> > 2. PTE valid with PTE_RDONLY - we get a normal writeable fault in
> >    hardware, no need to fix ESR_EL1 up.
> >
> > The patch would have to address (1) above but faking the ESR_EL1.WnR bit
> > based on the vma flags looks a bit fragile.
>
> I think we don't need to fake the ESR_EL1.WnR bit with the fallback.

I agree, with your approach above we don't need to fake WnR.

> > Similarly, we have userfaultfd that reports the fault to user. I think
> > in scenario (1) the kernel will report UFFD_PAGEFAULT_FLAG_WRITE with
> > your patch but no UFFD_PAGEFAULT_FLAG_WP. Without this patch, there are
> > indeed two faults, with the second having both UFFD_PAGEFAULT_FLAG_WP
> > and UFFD_PAGEFAULT_FLAG_WRITE set.
>
> I don't quite get what the problem is. IIUC, uffd just needs a signal from
> kernel to tell this area will be written. It seems not break the semantic.
> Added Peter Xu in this loop, who is the uffd developer. He may shed some
> light.

Not really familiar with uffd but just looking at the code, if a handler
is registered for both MODE_MISSING and MODE_WP, currently the atomic
instruction signals a user fault without UFFD_PAGEFAULT_FLAG_WRITE (the
do_anonymous_page() path). If the page is mapped by the uffd handler as
the zero page, a restart of the instruction would signal
UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP (the do_wp_page()
path).

With your patch, we get the equivalent of UFFD_PAGEFAULT_FLAG_WRITE on
the first attempt, just like having a STR instruction instead of
separate LDR + STR (as the atomics behave from a fault perspective).
However, I don't think that's a problem, the uffd handler should cope
with an STR anyway, so it's not some unexpected combination of flags.

-- 
Catalin
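
For reference, the fallback discussed above can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the actual arm64 fault handler: the flag values are made up, and `struct fault_decision`, `decide_fault()` and `check_examples()` are hypothetical names introduced here. Only `VM_READ`, `VM_WRITE`, `FAULT_FLAG_WRITE`, `is_el0_atomic_instr`, `is_write_abort` and `force_write` come from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's flag bits (values are made up). */
#define VM_READ          0x1u
#define VM_WRITE         0x2u
#define FAULT_FLAG_WRITE 0x1u

/* Hypothetical container for what the fault path hands to the core mm code. */
struct fault_decision {
	unsigned int vm_flags;  /* permission the vma must grant */
	unsigned int mm_flags;  /* FAULT_FLAG_* passed to the core handler */
};

/*
 * vma_flags:           permissions of the faulting vma
 * is_el0_atomic_instr: faulting instruction is a userspace atomic RMW
 * is_write_abort:      hardware reported the abort as a write (WnR == 1)
 */
struct fault_decision decide_fault(unsigned int vma_flags,
				   bool is_el0_atomic_instr,
				   bool is_write_abort)
{
	struct fault_decision d = { .vm_flags = VM_READ, .mm_flags = 0 };
	bool force_write = is_el0_atomic_instr && !is_write_abort;

	/* A real write abort or a forced one is handled as a write fault. */
	if (is_write_abort || force_write) {
		d.vm_flags = VM_WRITE;
		d.mm_flags |= FAULT_FLAG_WRITE;
	}

	/*
	 * Fallback: a forced write on a non-writable vma is downgraded to a
	 * plain read fault. Write implies read, so !VM_WRITE is enough.
	 */
	if (force_write && !(vma_flags & VM_WRITE)) {
		d.vm_flags = VM_READ;
		d.mm_flags &= ~FAULT_FLAG_WRITE;
	}
	return d;
}

/* Usage example: the cases from the thread, checked with assert(). */
void check_examples(void)
{
	struct fault_decision d;

	/* Writable vma: the atomic is promoted to a write fault. */
	d = decide_fault(VM_READ | VM_WRITE, true, false);
	assert(d.vm_flags == VM_WRITE && (d.mm_flags & FAULT_FLAG_WRITE));

	/* (1.a) read-only vma: fall back to a plain read fault. */
	d = decide_fault(VM_READ, true, false);
	assert(d.vm_flags == VM_READ && !(d.mm_flags & FAULT_FLAG_WRITE));

	/* (1.b) no permissions: still treated as a read fault. */
	d = decide_fault(0, true, false);
	assert(d.vm_flags == VM_READ && !(d.mm_flags & FAULT_FLAG_WRITE));

	/* Ordinary load on a writable vma: untouched read fault. */
	d = decide_fault(VM_READ | VM_WRITE, false, false);
	assert(d.vm_flags == VM_READ && d.mm_flags == 0);
}
```

In this model the read-only case (1.a) ends up on the same path as an ordinary read fault, which is why ESR_EL1.WnR would not need to be faked: the retried atomic instruction takes the hardware write fault itself.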