From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 681A2C00528 for ; Mon, 24 Jul 2023 17:50:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230376AbjGXRum (ORCPT ); Mon, 24 Jul 2023 13:50:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230022AbjGXRuY (ORCPT ); Mon, 24 Jul 2023 13:50:24 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C1B864C19; Mon, 24 Jul 2023 10:47:27 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6434961360; Mon, 24 Jul 2023 17:46:35 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id CD843C433C8; Mon, 24 Jul 2023 17:46:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1690220794; bh=jy1HYFTjzSQArcOEX/GY5RMYACOnapkGSnMBIFuMSVA=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=AZyI90B0S9SoHJriLWGk76YncnZYcthpOlVsVabOcFAJLgSA/xo37jIca8IR/HBa1 m+HvLMsjwiPjR+tp6nQiXQ8kGKVHa/JEELgkZCWxAJ+NZHNNbTGYzn3ARc/p3d49pM xDgeU8sY7cYA62OXouG2rEgCB12c020nsLhqbIy9t3OnhFIBkHMFJl8obnYVgoN8Eq RVPWPmy5A+AaU5qr09aqdqBhnM7zxGL5eMNTGw9Om2wp6YMuEYUSDB2ypEuYrqrm4r BbEEKXb7ZivJlJ0t59WJ9ejX80asVcOf5nwFIVgYBuJl+KWcDT6FHs5SDf3Si4bxs1 F0r5SGCM7icFw== Date: Mon, 24 Jul 2023 19:46:30 +0200 From: Christian Brauner To: Linus Torvalds Cc: Jens Axboe , Christoph Hellwig , Aleksa Sarai , Al Viro , Seth Forshee , linux-fsdevel@vger.kernel.org, stable@vger.kernel.org Subject: Re: [PATCH] file: always lock position Message-ID: <20230724-eckpunkte-melden-fc35b97d1c11@brauner> References: <20230724-vfs-fdget_pos-v1-1-a4abfd7103f3@kernel.org> <20230724-scheren-absegnen-8c807c760ba1@brauner> <20230724-gebessert-wortwahl-195daecce8f0@brauner> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Mon, Jul 24, 2023 at 10:34:31AM -0700, Linus Torvalds wrote: > On Mon, 24 Jul 2023 at 10:23, Christian Brauner wrote: > > > > This means pidfd_getfd() needs the same treatment as MSG_PEEK for sockets. > > So the reason I think pidfd_getfd() is ok is that it has serialized > with the source of the file descriptor and uses fget_task() -> > __fget_files. > > And that code is nasty and complicated, but it does get_file_rcu() to > increment the file count, and then *after* that it still checks that > yes, the file pointer is still there. > > And that means that anybody who uses fget_task() will only ever get a > ref to a file if that file still existed in the source, and you can > never have a situation where a file comes back to life. > > The reason MSG_PEEK is special is exactly because it can "resurrect" a > file that was closed, and added to the unix SCM garbage collection > list as "only has a ref in the SCM thing", so when we then make it > live again, it needs that very very subtle thing. Oh, eew. > > So pidfd_getfd() is ok in this regard. > > But it *is* an example of how subtle it is to just get a new ref to an > existing file. > > That whole > > if (atomic_read_acquire(&files->count) == 1) { > > in __fget_light() is also worth being aware of. It isn't about the > file count, but it is about "I have exclusive access to the file > table". So you can *not* close a file, or open a file, for another > process from outside. The only thread that is allowed to access or > change the file table (including resizing it), is the thread itself. Yeah, I'm well aware of that because an earlier version of the getdents patchset would've run into exactly that problem because of async offload on the file while the process that registered that async offload would've been free to create another thread and issue additional requests. > > I really hope we don't have cases of that. I don't think we do but it's something to keep in mind with async io interfaces where the caller is free to create other threads after having registered a request. Depending on how file references are done things can get tricky easily. I'm not complaining here but it is worth noting that additions to e.g., io_uring now have to reason through two reference counting mechanisms: the regular system call rules and the fixed file io_uring rules. If conditional locking is involved in one part it might have effects/consequences in the other part as one can mix regular system call requests and fixed file requests on the same struct file.