From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18199CA0FF8 for ; Sat, 2 Sep 2023 03:44:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229901AbjIBDoS (ORCPT ); Fri, 1 Sep 2023 23:44:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53268 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229764AbjIBDoR (ORCPT ); Fri, 1 Sep 2023 23:44:17 -0400 Received: from zeniv.linux.org.uk (zeniv.linux.org.uk [IPv6:2a03:a000:7:0:5054:ff:fe1c:15ff]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D85D10F9; Fri, 1 Sep 2023 20:44:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=linux.org.uk; s=zeniv-20220401; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=TdcWiGkGWr+rD1ctZwNIbhg9pqLP9aJ0pJ+X3QdH5dA=; b=ftV/XKbOKQGkLA3auixsF08xzH oufphC6h59y/YuB+p4zlLX/RdPyKSu1OJLjy0aRck84oBbY2bJTtbCIuodLbeOazuvZpts8v2JZCy cBjDMAGdQMUDZyyZw8plGhtZKFKVm+vqDhaYcWlfhKELBPVHWLN8jlsSckQBgzq6ES/2ejHgpjRYx Z0whfqn+uo0dxqCv+fXdN4Ey7JPun/ScHB7zi8wmYnBWhJP3cwi/zvkRd4a9ZkS/pSYPpSJ1kQ+3+ KfGEC0oe2KH9Yz3AdY6ER9+9k8imbR15Usmvrb2IqROxg+Hs9sQETySC4K5mDjvQ3kCmdbGE8E37r VTi+tFUA==; Received: from viro by zeniv.linux.org.uk with local (Exim 4.96 #2 (Red Hat Linux)) id 1qcHYD-002jnC-2K; Sat, 02 Sep 2023 03:43:57 +0000 Date: Sat, 2 Sep 2023 04:43:57 +0100 From: Al Viro To: Linus Torvalds Cc: Christian Brauner , Mateusz Guzik , Jens Axboe , Christoph Hellwig , Aleksa Sarai , Seth Forshee , linux-fsdevel@vger.kernel.org, stable@vger.kernel.org Subject: Re: [PATCH] file: always lock position Message-ID: <20230902034357.GH3390869@ZenIV> References: <20230724-vfs-fdget_pos-v1-1-a4abfd7103f3@kernel.org> <20230724-pyjama-papier-9e4cdf5359cb@brauner> <20230803095311.ijpvhx3fyrbkasul@f> <20230803-libellen-klebrig-0a9e19dfa7dd@brauner> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: Al Viro Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Thu, Aug 03, 2023 at 11:35:53AM -0700, Linus Torvalds wrote: > And yes, those exist. See at least 'adfs_dir_ops', and > 'ceph_dir_fops'. They may be broken, but people actually did do things > like that historically, maybe there's a reason adfs and ceph allow it. Huh? adfs is a red herring - there is a method called 'read' in struct adfs_dir_ops, but it has nothing to do with read(2) or anything of that sort; it is *used* by adfs_iterate() (via adfs_read_inode()), but the only file_operations that sucker has is const struct file_operations adfs_dir_operations = { .read = generic_read_dir, .llseek = generic_file_llseek, .iterate_shared = adfs_iterate, .fsync = generic_file_fsync, }; and generic_read_dir() is "fail with -EISDIR". ceph one is real, and it's... not nice. They do have a genuine ->read() on directories, which acts as if it had been a regular file contents generated by sprintf() (at the first read(2)). Format: "entries: %20lld\n" " files: %20lld\n" " subdirs: %20lld\n" "rentries: %20lld\n" " rfiles: %20lld\n" " rsubdirs: %20lld\n" "rbytes: %20lld\n" "rctime: %10lld.%09ld\n", and yes, it does go stale. Just close and reopen... File position is an offset in that string. Shared with position used by getdents()... It gets better - if you open a directory, then fork and have both child and parent read from it, well... if (!dfi->dir_info) { dfi->dir_info = kmalloc(bufsize, GFP_KERNEL); if (!dfi->dir_info) return -ENOMEM; No locking whatsoever. No priveleges needed either... Frankly, my preference would be to remove that kludge, but short of that we would need at least some exclusion for those allocations and if we do that, we might just as well have ceph_readdir() fail if we'd ever done ceph_read_dir() on the same struct file and vice versa. They are really not mixible. As far as I can tell, ceph is the only place in mainline where we have ->iterate_shared along with non-failing ->read. Nothing mixes it with ->read_iter, ->write or ->write_iter. lseek() is there, of course, and that's enough to kill the "special fdget variant for directory syscalls" kind of approach. But read() and write() on directories are very much not something people legitimately do.