From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 80023C433FE for ; Wed, 27 Oct 2021 19:13:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 67EB760C40 for ; Wed, 27 Oct 2021 19:13:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240601AbhJ0TPd (ORCPT ); Wed, 27 Oct 2021 15:15:33 -0400 Received: from mail.kernel.org ([198.145.29.99]:40882 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240552AbhJ0TPc (ORCPT ); Wed, 27 Oct 2021 15:15:32 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 8EAE960EB4; Wed, 27 Oct 2021 19:13:04 +0000 (UTC) Date: Wed, 27 Oct 2021 20:13:01 +0100 From: Catalin Marinas To: Linus Torvalds Cc: Andreas Gruenbacher , Paul Mackerras , Alexander Viro , Christoph Hellwig , "Darrick J. Wong" , Jan Kara , Matthew Wilcox , cluster-devel , linux-fsdevel , Linux Kernel Mailing List , ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org, linux-btrfs Subject: Re: [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Tue, Oct 26, 2021 at 11:50:04AM -0700, Linus Torvalds wrote: > On Tue, Oct 26, 2021 at 11:24 AM Catalin Marinas > wrote: > > While more intrusive, I'd rather change copy_page_from_iter_atomic() > > etc. to take a pointer where to write back an error code. [...] > That said, the fact that these sub-page faults are always > non-recoverable might be a hint to a solution to the problem: maybe we > could extend the existing return code with actual negative error > numbers. > > Because for _most_ cases of "copy_to/from_user()" and friends by far, > the only thing we look for is "zero for success". > > We could extend the "number of bytes _not_ copied" semantics to say > "negative means fatal", and because there are fairly few places that > actually look at non-zero values, we could have a coccinelle script > that actually marks those places. As you already replied, there are some odd places where the returned uncopied of bytes is used. Also for some valid cases like copy_mount_options(), it's likely that it will fall back to byte-at-a-time with MTE since it's a good chance it would hit a fault in a 4K page (not a fast path though). I'd have to go through all the cases and check whether the return value is meaningful. The iter_iov.c functions and their callers also seem to make use of the bytes copied in case they need to call iov_iter_revert() (though I suppose the iov_iter_iovec_advance() would skip the update in case of an error). As an alternative, you mentioned earlier that a per-thread fault status was not feasible on x86 due to races. Was this only for the hw poison case? I think the uaccess is slightly different. We can add a current->non_recoverable_uaccess variable cleared on pagefault_disable(), only set by uaccess faults and checked by the fs code before re-attempting the fault_in(). An interrupt shouldn't do a uaccess (well, if it does a _nofault one, we can detect in_interrupt() in the MTE exception handler). Last time I looked at io_uring it was running in a separate kernel thread, not sure whether this was changed. I don't see what else would be racing with such current->non_recoverable_uaccess variable. If that's doable, I think it's the least intrusive approach. -- Catalin