From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from zeniv.linux.org.uk (zeniv.linux.org.uk [62.89.141.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9C712502BE; Wed, 6 Aug 2025 01:54:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.89.141.173 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754445295; cv=none; b=cJnkn0FVcTNLbUZT7yRUfnK7wgvDHBfysWbim7Y8Ee95RlTWUlwDm7tsoEIqlanPtdM7S/3uj7jlj/RNyNnWQgLd9ZR1N08F+AMXQWFJBzpT8JzGJ9+dAuV3jXb7DIrVGMlwIDNRkDiUkOxlK73OrdvYY1OM2ToG00qojrXq+6Q= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754445295; c=relaxed/simple; bh=EPVzJPUmvWwbmQcqVM3T4ytLQLa5WfF2jfCt0acpElE=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=ScF3ieRZY51BFVox2gqoaErtbbJ+hbtcVk1LUzvGDfeKIJJkHySzkXk0N+f7yjC6XD9XGSqIwLr99ozBTM7K/zkxmJA8gbMRsfweZdqSPTrYWzjxcRwDdpMwhOjLBq4urGzLD2mtYWABbiSIKBplDCoIKMQXBtJQSbUiuc7LcaE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=zeniv.linux.org.uk; spf=none smtp.mailfrom=ftp.linux.org.uk; dkim=pass (2048-bit key) header.d=linux.org.uk header.i=@linux.org.uk header.b=L744vBhq; arc=none smtp.client-ip=62.89.141.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=zeniv.linux.org.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ftp.linux.org.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linux.org.uk header.i=@linux.org.uk header.b="L744vBhq" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=linux.org.uk; s=zeniv-20220401; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=y7hjcTlBLlHEkIsVkJW9LcLYrfSDMpCRStdSI9oCyBM=; b=L744vBhqaFVKh14WsOoVpPCpHk Um4YnOsK0YP+Eb69qKtI76/vtSGMMUY9nBQN+YPT2U/WQKvvIi9T5refIu2ThAdMCbGYCVKXAh4Px Ff3n6Ie5XtTSolnLiIOZxqsv9/tCStxb7vgQhIS6e7u0kLJuMXhvNAphammMluL9+CUpN363VNH2r FWz8IoX1+jU3hNtXCvsyTh4NoQK4O9HErhAJflMa8WjFM1vatAJK7N34S105oExp6vsbipc8OvjZe LjvTUXfo9FQV0lrue+LCVbeaokOsgrhFKDPdrJ+7PY+Rc80OQuvbSnpCRJWdJe4icmLQZzJwkMbSB jF12ROjg==; Received: from viro by zeniv.linux.org.uk with local (Exim 4.98.2 #2 (Red Hat Linux)) id 1ujTMf-00000007Aqw-3DD8; Wed, 06 Aug 2025 01:54:49 +0000 Date: Wed, 6 Aug 2025 02:54:49 +0100 From: Al Viro To: Christian Brauner Cc: Thomas =?iso-8859-1?Q?Wei=DFschuh?= , Jan Kara , Sargun Dhillon , Kees Cook , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: Re: [PATCH v2] fs: always return zero on success from replace_fd() Message-ID: <20250806015449.GA1638962@ZenIV> References: <20250804-fix-receive_fd_replace-v2-1-ecb28c7b9129@linutronix.de> <20250804-rundum-anwalt-10c3b9c11f8e@brauner> <20250804155229.GY222315@ZenIV> <20250805-beleidigen-klugheit-c19b1657674a@brauner> <20250805153457.GB222315@ZenIV> <20250805195003.GC222315@ZenIV> Precedence: bulk X-Mailing-List: stable@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250805195003.GC222315@ZenIV> Sender: Al Viro On Tue, Aug 05, 2025 at 08:50:03PM +0100, Al Viro wrote: > On Tue, Aug 05, 2025 at 04:34:57PM +0100, Al Viro wrote: > > > They do no allow to express things like "foo() consumes lock X". > > >From time to time, we *do* need that, and when that happens guards > > become a menace. > > > > Another case is > > lock > > if (lock-dependent condition) > > some work > > unlock > > work that can't be under that lock > > else > > some other work > > unlock > > more work that can't be under that lock > > > > Fairly common, especially when that's a spinlock and "can't be under that > > lock" includes blocking operations. Can't be expressed with guards, not > > without a massage that often ends up with bloody awful results. > > FWIW, I'm looking through the raw data I've got during ->d_lock audit. > Except for a few functions (all static in fs/dcache.c), all scopes > terminate in the same function where they begin. ... and for ->file_lock we have the following: expand_fdtable(): drops and regains expand_files(): either nothing or drops and regains do_dup2(): drops everything else is neutral. 20 functions touching that stuff total. Convertible to guard() or scoped_guard(): put_unused_fd(), fd_install(), close_fd() (scoped_guard only), __range_cloexec(), file_close_fd(), set_close_on_exec(), iterate_fd(), fcntl_setlk() and fcntl_setlk64() (scoped_guard only), seq_show() - 10 out of 20. Anything that uses expand_fdtable() is in the best case an abuse of guard/scoped_guard; in the worst, it's actively confusing, since there's *not* one continuous scope there. expand_files() is in the same boat. That covers alloc_fd(), replace_fd() and ksys_dup3(). That's 5 out of remaining 10. BTW, trying to eliminate gotos from alloc_fd() is not fun either. dup_fd(): spin_lock(&oldf->file_lock); ... while (unlikely(open_files > new_fdt->max_fds)) { spin_unlock(&oldf->file_lock); ... (blocking, possibly return on failure here) spin_lock(&oldf->file_lock); ... } ... spin_unlock(&oldf->file_lock); ... No way to do that with guard() - not unless you mix it with explicit unlock/lock in the middle of scope, and even that will be bitch to deal with due to failure exit after allocation failure. We'd need to do this: spin_unlock(&oldf->file_lock); if (new_fdt != &newf->fdtab) __free_fdtable(new_fdt); new_fdt = alloc_fdtable(open_files); spin_lock(&oldf->file_lock); if (IS_ERR(new_fdt)) { kmem_cache_free(files_cachep, newf); return ERR_CAST(new_fdt); } all of that under guard(spinlock)(&oldf->file_lock). IMO that would be too confusing and brittle. __range_close(): spin_lock(&files->file_lock); ... for (; fd <= max_fd; fd++) { file = file_close_fd_locked(files, fd); if (file) { spin_unlock(&files->file_lock); filp_close(file, files); cond_resched(); spin_lock(&files->file_lock); } else if (need_resched()) { spin_unlock(&files->file_lock); cond_resched(); spin_lock(&files->file_lock); } } spin_unlock(&files->file_lock); Not a good fit for guard(), for the same reasons. do_close_on_exec(): ... spin_lock(&files->file_lock); for (i = 0; ; i++) { .... for ( ; set ; fd++, set >>= 1) { .... spin_unlock(&files->file_lock); filp_close(file, files); cond_resched(); spin_lock(&files->file_lock); } } spin_unlock(&files->file_lock); Same story. io_close(): might be convertible to scoped_guard; won't be pretty, AFAICS - that -EAGAIN case in the middle makes it very clumsy. do_dup2(): well... we could lift filp_close() into the callers, but that ends up with fairly unpleasant boilerplate in the callers, and one of those callers is a fairly hot syscall. And that's the remaining 5. For some locks scoped_guard() is a decent fit; for some it really isn't ;-/