From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D03BBC433EF for ; Thu, 25 Nov 2021 20:46:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1357211AbhKYUtP (ORCPT ); Thu, 25 Nov 2021 15:49:15 -0500 Received: from mail.kernel.org ([198.145.29.99]:48676 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242104AbhKYUrO (ORCPT ); Thu, 25 Nov 2021 15:47:14 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 0D2BD6101B; Thu, 25 Nov 2021 20:44:00 +0000 (UTC) Date: Thu, 25 Nov 2021 20:43:57 +0000 From: Catalin Marinas To: Linus Torvalds Cc: Matthew Wilcox , Josef Bacik , David Sterba , Andreas Gruenbacher , Al Viro , Andrew Morton , Will Deacon , linux-fsdevel , Linux Kernel Mailing List , Linux ARM , linux-btrfs Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults Message-ID: References: <20211124192024.2408218-1-catalin.marinas@arm.com> <20211124192024.2408218-4-catalin.marinas@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On Thu, Nov 25, 2021 at 10:13:25AM -0800, Linus Torvalds wrote: > On Thu, Nov 25, 2021 at 3:10 AM Catalin Marinas wrote: > > For this specific btrfs case, if we want go with tuning the offset based > > on the fault address, we'd need copy_to_user_nofault() (or a new > > function) to be exact. > > I really don't see why you harp on the exactness. I guess because I always thought we either fix fault_in_writable() to probe the whole range (this series) or we change the loops to take the copy_to_user() returned value into account when re-faulting. > I really believe that the fix is to make the read/write probing just > be more aggressive. > > Make the read/write probing require that AT LEAST bytes be > readable/writable at the beginning, where 'n' is 'min(len,ALIGN)', and > ALIGN is whatever size that copy_from/to_user_xyz() might require just > because it might do multi-byte accesses. > > In fact, make ALIGN be perhaps something reasonable like 512 bytes or > whatever, and then you know you can handle the btrfs "copy a whole > structure and reset if that fails" case too. IIUC what you are suggesting, we still need changes to the btrfs loop similar to willy's but that should work fine together with a slightly more aggressive fault_in_writable(). A probing of at least sizeof(struct btrfs_ioctl_search_key) should suffice without any loop changes and 512 would cover it but it doesn't look generic enough. We could pass a 'probe_prefix' argument to fault_in_exact_writeable() to only probe this and btrfs would just specify the above sizeof(). > Don't require that the fundamental copying routines (and whatever > fixup the code might need) be some kind of byte-precise - it's the > error case that should instead be made stricter. > > If the user gave you a range that triggered a pointer color mismatch, > then returning an error is fine, rather than say "we'll do as much as > we can and waste time and effort on being byte-exact too". Yes, I'm fine with not copying anything at all, I just want to avoid the infinite loop. I think we are down to three approaches: 1. Probe the whole range in fault_in_*() for sub-page faults, no need to worry about copy_*_user() failures. 2. Get fault_in_*() to probe a sufficient prefix to cover the uaccess inexactness. In addition, change the btrfs loop to fault-in from where the copy_to_user() failed (willy's suggestion combined with the more aggressive probing in fault_in_*()). 3. Implement fault_in_exact_writeable(uaddr, size, probe_prefix) where loops that do some rewind would pass an appropriate value. (1) is this series, (2) requires changing the loop logic, (3) looks pretty simple. I'm happy to attempt either (2) or (3) with a preference for the latter. -- Catalin From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9E82BC433EF for ; Thu, 25 Nov 2021 20:45:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=LXkif8SyQENyqM15jp0+k6p+yEFlMfyj3ObYDa8BBuA=; b=5D48aDL/gCFqH3 yrc4xvrU6WW2DKm25x6sWfc6F1xb19cgdDmLCKH2zYE5C1SM0/vgviFqtUMpFVFLTHAmB81pvMIFf aZFge0effN7/oFFdoyzBFK7vrmFQrpq//1zfO85MJsi3RHJ9KMEtdiliY+b3LNK/+huNqHqJj01FR HjOuSwOvDIN8fIHx3cu/PZh4y9v7LauxYFc1RvA8ZMzQDe37OA3KbNeAO9sf4m5OdJtS7LqQKF4DO UGp6Bv+qp9eJKblEKsV1NdD4dYfU4DpnuVasJVKTBw83tjUEdQYaLzAxVWzHTaiaiMSgat5iJuPa3 +VtDpWqILG8/h5SDeZkw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mqLbJ-008cNC-LJ; Thu, 25 Nov 2021 20:44:13 +0000 Received: from mail.kernel.org ([198.145.29.99]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mqLbG-008cMd-Bt for linux-arm-kernel@lists.infradead.org; Thu, 25 Nov 2021 20:44:11 +0000 Received: by mail.kernel.org (Postfix) with ESMTPSA id 0D2BD6101B; Thu, 25 Nov 2021 20:44:00 +0000 (UTC) Date: Thu, 25 Nov 2021 20:43:57 +0000 From: Catalin Marinas To: Linus Torvalds Cc: Matthew Wilcox , Josef Bacik , David Sterba , Andreas Gruenbacher , Al Viro , Andrew Morton , Will Deacon , linux-fsdevel , Linux Kernel Mailing List , Linux ARM , linux-btrfs Subject: Re: [PATCH 3/3] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults Message-ID: References: <20211124192024.2408218-1-catalin.marinas@arm.com> <20211124192024.2408218-4-catalin.marinas@arm.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20211125_124410_490631_03214E0B X-CRM114-Status: GOOD ( 25.62 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Thu, Nov 25, 2021 at 10:13:25AM -0800, Linus Torvalds wrote: > On Thu, Nov 25, 2021 at 3:10 AM Catalin Marinas wrote: > > For this specific btrfs case, if we want go with tuning the offset based > > on the fault address, we'd need copy_to_user_nofault() (or a new > > function) to be exact. > > I really don't see why you harp on the exactness. I guess because I always thought we either fix fault_in_writable() to probe the whole range (this series) or we change the loops to take the copy_to_user() returned value into account when re-faulting. > I really believe that the fix is to make the read/write probing just > be more aggressive. > > Make the read/write probing require that AT LEAST bytes be > readable/writable at the beginning, where 'n' is 'min(len,ALIGN)', and > ALIGN is whatever size that copy_from/to_user_xyz() might require just > because it might do multi-byte accesses. > > In fact, make ALIGN be perhaps something reasonable like 512 bytes or > whatever, and then you know you can handle the btrfs "copy a whole > structure and reset if that fails" case too. IIUC what you are suggesting, we still need changes to the btrfs loop similar to willy's but that should work fine together with a slightly more aggressive fault_in_writable(). A probing of at least sizeof(struct btrfs_ioctl_search_key) should suffice without any loop changes and 512 would cover it but it doesn't look generic enough. We could pass a 'probe_prefix' argument to fault_in_exact_writeable() to only probe this and btrfs would just specify the above sizeof(). > Don't require that the fundamental copying routines (and whatever > fixup the code might need) be some kind of byte-precise - it's the > error case that should instead be made stricter. > > If the user gave you a range that triggered a pointer color mismatch, > then returning an error is fine, rather than say "we'll do as much as > we can and waste time and effort on being byte-exact too". Yes, I'm fine with not copying anything at all, I just want to avoid the infinite loop. I think we are down to three approaches: 1. Probe the whole range in fault_in_*() for sub-page faults, no need to worry about copy_*_user() failures. 2. Get fault_in_*() to probe a sufficient prefix to cover the uaccess inexactness. In addition, change the btrfs loop to fault-in from where the copy_to_user() failed (willy's suggestion combined with the more aggressive probing in fault_in_*()). 3. Implement fault_in_exact_writeable(uaddr, size, probe_prefix) where loops that do some rewind would pass an appropriate value. (1) is this series, (2) requires changing the loop logic, (3) looks pretty simple. I'm happy to attempt either (2) or (3) with a preference for the latter. -- Catalin _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel