From mboxrd@z Thu Jan  1 00:00:00 1970
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 20 Oct 2021 23:44:15 +0100
Subject: [Cluster-devel] [PATCH v8 00/17] gfs2: Fix mmap + page fault
	deadlocks
In-Reply-To: <CAHk-=wgv=KPZBJGnx_O5-7hhST8CL9BN4wJwtVuycjhv_1MmvQ@mail.gmail.com>
References: <20211019134204.3382645-1-agruenba@redhat.com>
	<CAHk-=wh0_3y5s7-G74U0Pcjm7Y_yHB608NYrQSvgogVNBxsWSQ@mail.gmail.com>
	<YXBFqD9WVuU8awIv@arm.com>
	<CAHk-=wgv=KPZBJGnx_O5-7hhST8CL9BN4wJwtVuycjhv_1MmvQ@mail.gmail.com>
Message-ID: <YXCbv5gdfEEtAYo8@arm.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Wed, Oct 20, 2021 at 10:11:19AM -1000, Linus Torvalds wrote:
> On Wed, Oct 20, 2021 at 6:37 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > The atomic "add zero" trick isn't that simple for MTE since the arm64
> > atomic or exclusive instructions run with kernel privileges and
> > therefore with the kernel tag checking mode.
> 
> Are there any instructions that are useful for "probe_user_write()"
> kind of thing?

If it's on a user address, the only single-instruction that works with
MTE is STTR (as in put_user()) but that's destructive. Other "add zero"
constructs require some potentially expensive system register accesses
just to set the tag checking mode of the current task.

A probe_user_write() on the kernel linear address involves reading the
tag from memory and comparing it with the tag in the user pointer. In
addition, it needs to take into account the current task's tag checking
mode and the vma vm_flags. We should have most of the information in the
gup code.

> Although at least for MTE, I think the solution was to do a regular
> read, and that checks the tag, and then we could use the gup machinery
> for the writability checks.

Yes, for MTE this should work. For CHERI I think an "add zero" would
do the trick (it should have atomics that work on capabilities
directly). However, with MTE doing both get_user() every 16 bytes and
gup can get pretty expensive. The problematic code is
fault_in_safe_writable() in this series.

I can give this 16-byte probing in gup a try (on top of -next) but IMHO
we unnecessarily overload the fault_in_*() logic with something the
kernel cannot fix up. The only reason we do it is so that we get an
error code and bail out of a loop but the uaccess routines could be
extended to report the fault type instead. It looks like we pretty much
duplicate the uaccess in the fault_in_*() functions (four accesses per
cache line).

-- 
Catalin


From mboxrd@z Thu Jan  1 00:00:00 1970
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Wed, 20 Oct 2021 22:44:15 +0000
Subject: Re: [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks
Message-Id: <YXCbv5gdfEEtAYo8@arm.com>
List-Id: <kvm-ppc.vger.kernel.org>
References: <20211019134204.3382645-1-agruenba@redhat.com>
 <CAHk-=wh0_3y5s7-G74U0Pcjm7Y_yHB608NYrQSvgogVNBxsWSQ@mail.gmail.com>
 <YXBFqD9WVuU8awIv@arm.com>
 <CAHk-=wgv=KPZBJGnx_O5-7hhST8CL9BN4wJwtVuycjhv_1MmvQ@mail.gmail.com>
In-Reply-To: <CAHk-=wgv=KPZBJGnx_O5-7hhST8CL9BN4wJwtVuycjhv_1MmvQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>, Paul Mackerras <paulus@ozlabs.org>, Alexander Viro <viro@zeniv.linux.org.uk>, Christoph Hellwig <hch@infradead.org>, "Darrick J. Wong" <djwong@kernel.org>, Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@infradead.org>, cluster-devel <cluster-devel@redhat.com>, linux-fsdevel <linux-fsdevel@vger.kernel.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org, linux-btrfs <linux-btrfs@vger.kernel.org>

On Wed, Oct 20, 2021 at 10:11:19AM -1000, Linus Torvalds wrote:
> On Wed, Oct 20, 2021 at 6:37 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > The atomic "add zero" trick isn't that simple for MTE since the arm64
> > atomic or exclusive instructions run with kernel privileges and
> > therefore with the kernel tag checking mode.
> 
> Are there any instructions that are useful for "probe_user_write()"
> kind of thing?

If it's on a user address, the only single-instruction that works with
MTE is STTR (as in put_user()) but that's destructive. Other "add zero"
constructs require some potentially expensive system register accesses
just to set the tag checking mode of the current task.

A probe_user_write() on the kernel linear address involves reading the
tag from memory and comparing it with the tag in the user pointer. In
addition, it needs to take into account the current task's tag checking
mode and the vma vm_flags. We should have most of the information in the
gup code.

> Although at least for MTE, I think the solution was to do a regular
> read, and that checks the tag, and then we could use the gup machinery
> for the writability checks.

Yes, for MTE this should work. For CHERI I think an "add zero" would
do the trick (it should have atomics that work on capabilities
directly). However, with MTE doing both get_user() every 16 bytes and
gup can get pretty expensive. The problematic code is
fault_in_safe_writable() in this series.

I can give this 16-byte probing in gup a try (on top of -next) but IMHO
we unnecessarily overload the fault_in_*() logic with something the
kernel cannot fix up. The only reason we do it is so that we get an
error code and bail out of a loop but the uaccess routines could be
extended to report the fault type instead. It looks like we pretty much
duplicate the uaccess in the fault_in_*() functions (four accesses per
cache line).

-- 
Catalin

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-btrfs-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id A3D1FC433F5
	for <linux-btrfs@archiver.kernel.org>; Wed, 20 Oct 2021 22:44:22 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 8E6D66121E
	for <linux-btrfs@archiver.kernel.org>; Wed, 20 Oct 2021 22:44:22 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231162AbhJTWqg (ORCPT <rfc822;linux-btrfs@archiver.kernel.org>);
        Wed, 20 Oct 2021 18:46:36 -0400
Received: from mail.kernel.org ([198.145.29.99]:39478 "EHLO mail.kernel.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S229771AbhJTWqf (ORCPT <rfc822;linux-btrfs@vger.kernel.org>);
        Wed, 20 Oct 2021 18:46:35 -0400
Received: by mail.kernel.org (Postfix) with ESMTPSA id 5CBE26115B;
        Wed, 20 Oct 2021 22:44:18 +0000 (UTC)
Date:   Wed, 20 Oct 2021 23:44:15 +0100
From:   Catalin Marinas <catalin.marinas@arm.com>
To:     Linus Torvalds <torvalds@linux-foundation.org>
Cc:     Andreas Gruenbacher <agruenba@redhat.com>,
        Paul Mackerras <paulus@ozlabs.org>,
        Alexander Viro <viro@zeniv.linux.org.uk>,
        Christoph Hellwig <hch@infradead.org>,
        "Darrick J. Wong" <djwong@kernel.org>, Jan Kara <jack@suse.cz>,
        Matthew Wilcox <willy@infradead.org>,
        cluster-devel <cluster-devel@redhat.com>,
        linux-fsdevel <linux-fsdevel@vger.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org,
        linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks
Message-ID: <YXCbv5gdfEEtAYo8@arm.com>
References: <20211019134204.3382645-1-agruenba@redhat.com>
 <CAHk-=wh0_3y5s7-G74U0Pcjm7Y_yHB608NYrQSvgogVNBxsWSQ@mail.gmail.com>
 <YXBFqD9WVuU8awIv@arm.com>
 <CAHk-=wgv=KPZBJGnx_O5-7hhST8CL9BN4wJwtVuycjhv_1MmvQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAHk-=wgv=KPZBJGnx_O5-7hhST8CL9BN4wJwtVuycjhv_1MmvQ@mail.gmail.com>
Precedence: bulk
List-ID: <linux-btrfs.vger.kernel.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Wed, Oct 20, 2021 at 10:11:19AM -1000, Linus Torvalds wrote:
> On Wed, Oct 20, 2021 at 6:37 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > The atomic "add zero" trick isn't that simple for MTE since the arm64
> > atomic or exclusive instructions run with kernel privileges and
> > therefore with the kernel tag checking mode.
> 
> Are there any instructions that are useful for "probe_user_write()"
> kind of thing?

If it's on a user address, the only single-instruction that works with
MTE is STTR (as in put_user()) but that's destructive. Other "add zero"
constructs require some potentially expensive system register accesses
just to set the tag checking mode of the current task.

A probe_user_write() on the kernel linear address involves reading the
tag from memory and comparing it with the tag in the user pointer. In
addition, it needs to take into account the current task's tag checking
mode and the vma vm_flags. We should have most of the information in the
gup code.

> Although at least for MTE, I think the solution was to do a regular
> read, and that checks the tag, and then we could use the gup machinery
> for the writability checks.

Yes, for MTE this should work. For CHERI I think an "add zero" would
do the trick (it should have atomics that work on capabilities
directly). However, with MTE doing both get_user() every 16 bytes and
gup can get pretty expensive. The problematic code is
fault_in_safe_writable() in this series.

I can give this 16-byte probing in gup a try (on top of -next) but IMHO
we unnecessarily overload the fault_in_*() logic with something the
kernel cannot fix up. The only reason we do it is so that we get an
error code and bail out of a loop but the uaccess routines could be
extended to report the fault type instead. It looks like we pretty much
duplicate the uaccess in the fault_in_*() functions (four accesses per
cache line).

-- 
Catalin

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=sznG=PI=oss.oracle.com=ocfs2-devel-bounces@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id A9AAAC433EF
	for <ocfs2-devel@archiver.kernel.org>; Wed, 20 Oct 2021 22:50:46 +0000 (UTC)
Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com [205.220.165.32])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id 637AA6128E
	for <ocfs2-devel@archiver.kernel.org>; Wed, 20 Oct 2021 22:50:46 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 637AA6128E
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=oss.oracle.com
Received: from pps.filterd (m0246617.ppops.net [127.0.0.1])
	by mx0b-00069f02.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 19KLq1EY019159;
	Wed, 20 Oct 2021 22:50:46 GMT
Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79])
	by mx0b-00069f02.pphosted.com with ESMTP id 3btqyphjej-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
	Wed, 20 Oct 2021 22:50:45 +0000
Received: from pps.filterd (userp3020.oracle.com [127.0.0.1])
	by userp3020.oracle.com (8.16.1.2/8.16.1.2) with SMTP id 19KMjaBM037572;
	Wed, 20 Oct 2021 22:50:44 GMT
Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2])
	by userp3020.oracle.com with ESMTP id 3br8guxafv-1
	(version=TLSv1 cipher=AES256-SHA bits=256 verify=NO);
	Wed, 20 Oct 2021 22:50:43 +0000
Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com)
	by oss.oracle.com with esmtp (Exim 4.63)
	(envelope-from <ocfs2-devel-bounces@oss.oracle.com>)
	id 1mdKJs-00083Y-UC; Wed, 20 Oct 2021 15:44:24 -0700
Received: from userp3020.oracle.com ([156.151.31.79])
	by oss.oracle.com with esmtp (Exim 4.63)
	(envelope-from <cmarinas@kernel.org>) id 1mdKJr-00083C-RG
	for ocfs2-devel@oss.oracle.com; Wed, 20 Oct 2021 15:44:23 -0700
Received: from pps.filterd (userp3020.oracle.com [127.0.0.1])
	by userp3020.oracle.com (8.16.1.2/8.16.1.2) with SMTP id 19KMZbFP006099
	for <ocfs2-devel@oss.oracle.com>; Wed, 20 Oct 2021 22:44:23 GMT
Received: from mx0b-00069f01.pphosted.com (mx0b-00069f01.pphosted.com
	[205.220.177.26]) by userp3020.oracle.com with ESMTP id 3br8gux4rd-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK)
	for <ocfs2-devel@oss.oracle.com>; Wed, 20 Oct 2021 22:44:23 +0000
Received: from pps.filterd (m0246579.ppops.net [127.0.0.1])
	by mx0b-00069f01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id
	19KGt1YQ015369
	for <ocfs2-devel@oss.oracle.com>; Wed, 20 Oct 2021 22:44:22 GMT
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by mx0b-00069f01.pphosted.com with ESMTP id 3btbqddfha-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO)
	for <ocfs2-devel@oss.oracle.com>; Wed, 20 Oct 2021 22:44:22 +0000
Received: by mail.kernel.org (Postfix) with ESMTPSA id 5CBE26115B;
	Wed, 20 Oct 2021 22:44:18 +0000 (UTC)
Date: Wed, 20 Oct 2021 23:44:15 +0100
From: Catalin Marinas <catalin.marinas@arm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Message-ID: <YXCbv5gdfEEtAYo8@arm.com>
References: <20211019134204.3382645-1-agruenba@redhat.com>
	<CAHk-=wh0_3y5s7-G74U0Pcjm7Y_yHB608NYrQSvgogVNBxsWSQ@mail.gmail.com>
	<YXBFqD9WVuU8awIv@arm.com>
	<CAHk-=wgv=KPZBJGnx_O5-7hhST8CL9BN4wJwtVuycjhv_1MmvQ@mail.gmail.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <CAHk-=wgv=KPZBJGnx_O5-7hhST8CL9BN4wJwtVuycjhv_1MmvQ@mail.gmail.com>
X-Source-IP: 198.145.29.99
X-ServerName: mail.kernel.org
X-Proofpoint-SPF-Result: pass
X-Proofpoint-SPF-Record: v=spf1 mx include:_spf.kernel.org ~all
X-Proofpoint-Virus-Version: vendor=nai engine=6300 definitions=10143
	signatures=668683
X-Proofpoint-Spam-Reason: safe
X-Spam: OrgSafeList
X-SpamRule: orgsafelist
Cc: kvm-ppc@vger.kernel.org, Christoph Hellwig <hch@infradead.org>,
        cluster-devel <cluster-devel@redhat.com>, Jan Kara <jack@suse.cz>,
        Andreas Gruenbacher <agruenba@redhat.com>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Paul Mackerras <paulus@ozlabs.org>,
        Alexander Viro <viro@zeniv.linux.org.uk>,
        linux-fsdevel <linux-fsdevel@vger.kernel.org>,
        linux-btrfs <linux-btrfs@vger.kernel.org>, ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [PATCH v8 00/17] gfs2: Fix mmap + page fault
	deadlocks
X-BeenThere: ocfs2-devel@oss.oracle.com
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: <ocfs2-devel.oss.oracle.com>
List-Unsubscribe: <https://oss.oracle.com/mailman/listinfo/ocfs2-devel>,
	<mailto:ocfs2-devel-request@oss.oracle.com?subject=unsubscribe>
List-Archive: <http://oss.oracle.com/pipermail/ocfs2-devel>
List-Post: <mailto:ocfs2-devel@oss.oracle.com>
List-Help: <mailto:ocfs2-devel-request@oss.oracle.com?subject=help>
List-Subscribe: <https://oss.oracle.com/mailman/listinfo/ocfs2-devel>,
	<mailto:ocfs2-devel-request@oss.oracle.com?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: ocfs2-devel-bounces@oss.oracle.com
Errors-To: ocfs2-devel-bounces@oss.oracle.com
X-Proofpoint-Virus-Version: vendor=nai engine=6300 definitions=10143 signatures=668683
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 mlxscore=0 adultscore=0
 spamscore=0 phishscore=0 bulkscore=0 suspectscore=0 malwarescore=0
 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109230001
 definitions=main-2110200129
X-Proofpoint-ORIG-GUID: dM1mYDhP7kXPzW1qSxw7aJfwfeRye76J
X-Proofpoint-GUID: dM1mYDhP7kXPzW1qSxw7aJfwfeRye76J

On Wed, Oct 20, 2021 at 10:11:19AM -1000, Linus Torvalds wrote:
> On Wed, Oct 20, 2021 at 6:37 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > The atomic "add zero" trick isn't that simple for MTE since the arm64
> > atomic or exclusive instructions run with kernel privileges and
> > therefore with the kernel tag checking mode.
> 
> Are there any instructions that are useful for "probe_user_write()"
> kind of thing?

If it's on a user address, the only single-instruction that works with
MTE is STTR (as in put_user()) but that's destructive. Other "add zero"
constructs require some potentially expensive system register accesses
just to set the tag checking mode of the current task.

A probe_user_write() on the kernel linear address involves reading the
tag from memory and comparing it with the tag in the user pointer. In
addition, it needs to take into account the current task's tag checking
mode and the vma vm_flags. We should have most of the information in the
gup code.

> Although at least for MTE, I think the solution was to do a regular
> read, and that checks the tag, and then we could use the gup machinery
> for the writability checks.

Yes, for MTE this should work. For CHERI I think an "add zero" would
do the trick (it should have atomics that work on capabilities
directly). However, with MTE doing both get_user() every 16 bytes and
gup can get pretty expensive. The problematic code is
fault_in_safe_writable() in this series.

I can give this 16-byte probing in gup a try (on top of -next) but IMHO
we unnecessarily overload the fault_in_*() logic with something the
kernel cannot fix up. The only reason we do it is so that we get an
error code and bail out of a loop but the uaccess routines could be
extended to report the fault type instead. It looks like we pretty much
duplicate the uaccess in the fault_in_*() functions (four accesses per
cache line).

-- 
Catalin

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel