From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id A40FCC04FFE
	for <linux-arm-kernel@archiver.kernel.org>; Fri, 17 May 2024 17:26:03 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:
	Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description:
	Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=49wvlhw0tFT1b0eUET14ud32S3AzvoJFrAesmEL+EhI=; b=1yPBlTDyDMn10e
	Ius6/0urwkyIy051rIqXgpwZmoqssug9ej65hveJ39WF3ygpnI4V0PbUdkKRvKFuREvYhZO+eaeRi
	OVrmn4XXcsnl1oJV/oRMt2qicA4CP8vGmehSe+jt9yHhymsZ/YSuuZHVVEHl6krpkYBR/C8viObHT
	K6aJXkkjueKXaRoIh9StLcgk4F5DSOGc5qyF+nTBN7p9OtPQN6s+A0UdyB+H+aKCWWb8DGg2FXg9V
	p/sqeb+cY32J1gedVXmF2JAiOX+CcUSNr6m+7pfw7JLhpqvobbWpDr/VmV5V53ygiAzQEGZannr8u
	ZTU0Vqjy9LaifhrmLq3Q==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux))
	id 1s81L5-00000008Wfg-3aQd;
	Fri, 17 May 2024 17:25:51 +0000
Received: from dfw.source.kernel.org ([139.178.84.217])
	by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux))
	id 1s81L3-00000008Wf6-0HlS
	for linux-arm-kernel@lists.infradead.org;
	Fri, 17 May 2024 17:25:50 +0000
Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58])
	by dfw.source.kernel.org (Postfix) with ESMTP id D774161DE7;
	Fri, 17 May 2024 17:25:46 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 24867C2BD10;
	Fri, 17 May 2024 17:25:44 +0000 (UTC)
Date: Fri, 17 May 2024 18:25:42 +0100
From: Catalin Marinas <catalin.marinas@arm.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: peterx@redhat.com, will@kernel.org, scott@os.amperecomputing.com,
	cl@gentwo.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions
Message-ID: <ZkeTFiF_OOy80stO@arm.com>
References: <20240507223558.3039562-1-yang@os.amperecomputing.com>
 <Zj4O8q9-bliXE435@arm.com>
 <6066e0da-f00a-40fd-a5e2-d4d78786c227@os.amperecomputing.com>
 <ZkM_WXxEQo51mrK5@arm.com>
 <570c686c-6aa1-43f0-ba31-3597a329e037@os.amperecomputing.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <570c686c-6aa1-43f0-ba31-3597a329e037@os.amperecomputing.com>
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20240517_102549_225052_2D03841C 
X-CRM114-Status: GOOD (  35.80  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Fri, May 17, 2024 at 09:30:23AM -0700, Yang Shi wrote:
> On 5/14/24 3:39 AM, Catalin Marinas wrote:
> > It would be good to understand why openjdk is doing this instead of a
> > plain write. Is it because it may be racing with some other threads
> > already using the heap? That would be a valid pattern.
>
> Yes, you are right. I think I quoted the JVM justification in earlier email,
> anyway they said "permit use of memory concurrently with pretouch".

Ah, sorry, I missed that. This seems like a valid reason.
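
For anyone following along, the pattern can be sketched in user space
roughly as below. This is not the OpenJDK code: pretouch(),
PAGE_SIZE_ASSUMED and the use of C11 atomics are my own assumptions for
illustration. The point is that an atomic add of zero writes back
whatever is already there, so it cannot clobber data that another
thread has stored in the meantime, which a plain store could.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative page size; real code would query sysconf(_SC_PAGESIZE). */
#define PAGE_SIZE_ASSUMED 4096u

/*
 * Touch one word per page with an atomic add of zero and return the
 * number of pages touched. The stored value equals the loaded value,
 * so concurrent users of the memory never see their data change.
 */
static size_t pretouch(void *base, size_t len)
{
	size_t touched = 0;

	for (size_t off = 0; off + sizeof(atomic_int) <= len;
	     off += PAGE_SIZE_ASSUMED) {
		atomic_int *p = (atomic_int *)((char *)base + off);

		atomic_fetch_add_explicit(p, 0, memory_order_relaxed);
		touched++;
	}
	return touched;
}
```

Whether the compiler emits a real RMW instruction for the add of zero
depends on the toolchain and target, so treat this as a model of the
semantics rather than of the generated code.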

> > A point Will raised was on potential ABI changes introduced by this
> > patch. The ESR_EL1 reported to user remains the same as per the hardware
> > spec (read-only), so from a SIGSEGV we may have some slight behaviour
> > changes:
> >
> > 1. PTE invalid:
> >
> >     a) vma is VM_READ && !VM_WRITE permission - SIGSEGV reported with
> >        ESR_EL1.WnR == 0 in sigcontext with your patch. Without this
> >        patch, the PTE is mapped as PTE_RDONLY first and a subsequent
> >        fault will report SIGSEGV with ESR_EL1.WnR == 1.
>
> I think I can do something like the below conceptually:
>
> if is_el0_atomic_instr && !is_write_abort
>     force_write = true
>
> if VM_READ && !VM_WRITE && force_write == true

Nit: write implies read, so you only need to check !write.

>     vm_flags = VM_READ
>     mm_flags ~= FAULT_FLAG_WRITE
>
> Then we just fallback to read fault. The following write fault will trigger
> SIGSEGV with consistent ABI.

I think this should work. So instead of reporting the write fault
directly in case of a read-only vma, we let the core code handle the
read fault first and then retry the atomic instruction.
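
To make sure we mean the same thing, here is a rough user-space model
of that decision. It is not kernel code: plan_fault(), struct
fault_plan and the SK_* constants are made up for the sketch, and their
values are not the kernel's. It only encodes the rule discussed above,
with the vma check reduced to !write.

```c
#include <stdbool.h>

/* Illustrative stand-ins; not the kernel's VM_* or FAULT_FLAG_* values. */
#define SK_VM_READ          0x1u
#define SK_VM_WRITE         0x2u
#define SK_FAULT_FLAG_WRITE 0x1u

struct fault_plan {
	unsigned int vm_flags;	/* access the vma must permit */
	unsigned int mm_flags;	/* flags handed to the core fault code */
};

static struct fault_plan plan_fault(bool el0_atomic, bool write_abort,
				    unsigned int vma_flags)
{
	struct fault_plan plan = { SK_VM_READ, 0 };
	/* Atomic RMW reported by hardware as a read: treat it as a write. */
	bool force_write = el0_atomic && !write_abort;

	if (write_abort || force_write) {
		plan.vm_flags = SK_VM_WRITE;
		plan.mm_flags |= SK_FAULT_FLAG_WRITE;
	}

	/*
	 * Forced write on a vma without write permission: fall back to a
	 * read fault so that any SIGSEGV comes from a genuine fault.
	 */
	if (force_write && !(vma_flags & SK_VM_WRITE)) {
		plan.vm_flags = SK_VM_READ;
		plan.mm_flags &= ~SK_FAULT_FLAG_WRITE;
	}
	return plan;
}
```

Note the flag is cleared with "&= ~", which is what I take the "~=" in
the pseudocode above to mean. A genuine write abort is never
downgraded, so case (2) is unaffected.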

> >     b) vma is !VM_READ && !VM_WRITE permission - SIGSEGV reported with
> >        ESR_EL1.WnR == 0, so no change from current behaviour, unless we
> >        fix the patch for (1.a) to fake the WnR bit which would change the
> >        current expectations.
> >
> > 2. PTE valid with PTE_RDONLY - we get a normal writeable fault in
> >     hardware, no need to fix ESR_EL1 up.
> >
> > The patch would have to address (1) above but faking the ESR_EL1.WnR bit
> > based on the vma flags looks a bit fragile.
>
> I think we don't need to fake the ESR_EL1.WnR bit with the fallback.

I agree, with your approach above we don't need to fake WnR.

> > Similarly, we have userfaultfd that reports the fault to user. I think
> > in scenario (1) the kernel will report UFFD_PAGEFAULT_FLAG_WRITE with
> > your patch but no UFFD_PAGEFAULT_FLAG_WP. Without this patch, there are
> > indeed two faults, with the second having both UFFD_PAGEFAULT_FLAG_WP
> > and UFFD_PAGEFAULT_FLAG_WRITE set.
>
> I don't quite get what the problem is. IIUC, uffd just needs a signal from
> kernel to tell this area will be written. It seems not break the semantic.
> Added Peter Xu in this loop, who is the uffd developer. He may shed some
> light.

Not really familiar with uffd but just looking at the code, if a handler
is registered for both MODE_MISSING and MODE_WP, currently the atomic
instruction signals a user fault without UFFD_PAGEFAULT_FLAG_WRITE (the
do_anonymous_page() path). If the page is mapped by the uffd handler as
the zero page, a restart of the instruction would signal
UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP (the do_wp_page()
path).

With your patch, we get the equivalent of UFFD_PAGEFAULT_FLAG_WRITE on
the first attempt, just like having a STR instruction instead of
separate LDR + STR (as the atomics behave from a fault perspective).

However, I don't think that's a problem, the uffd handler should cope
with an STR anyway, so it's not some unexpected combination of flags.
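
Put as a small table in code, my reading of the flags the handler would
see for an atomic RMW is below. Again only a model based on the code
paths mentioned above: uffd_flags_for_atomic(), enum pte_state and the
SK_UFFD_* bits are hypothetical names with illustrative values, not the
uapi definitions.

```c
#include <stdbool.h>

/* Local stand-ins for the UFFD_PAGEFAULT_FLAG_* bits. */
#define SK_UFFD_FLAG_WRITE 0x1u
#define SK_UFFD_FLAG_WP    0x2u

enum pte_state {
	PTE_MISSING,		/* nothing mapped yet */
	PTE_ZERO_PAGE_WP,	/* handler mapped the zero page, write-protected */
};

/* Flags reported to the uffd handler when an atomic RMW faults. */
static unsigned int uffd_flags_for_atomic(enum pte_state pte, bool patched)
{
	/* Valid read-only PTE: hardware write fault, do_wp_page() path. */
	if (pte == PTE_ZERO_PAGE_WP)
		return SK_UFFD_FLAG_WRITE | SK_UFFD_FLAG_WP;

	/*
	 * Missing page, do_anonymous_page() path: reported as a read
	 * today, as a write once the fault is forced to be a write.
	 */
	return patched ? SK_UFFD_FLAG_WRITE : 0;
}
```

The only cell that changes with the patch is the missing-page one, and
its new value is what a plain STR already produces.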

-- 
Catalin

