Date: Thu, 26 Oct 2023 14:17:34 +0200
From: Jan Kara
To: Amir Goldstein
Cc: Jan Kara, Christian Brauner, Chris Mason, Josef Bacik, David Sterba, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/3] fanotify support for btrfs sub-volumes
Message-ID: <20231026121734.o4k7djftwdnectq4@quack3>
References: <20231025135048.36153-1-amir73il@gmail.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Wed 25-10-23 21:02:45, Amir Goldstein wrote:
> On Wed, Oct 25, 2023 at 8:17 PM Amir Goldstein wrote:
> >
> > On Wed, Oct 25, 2023 at 4:50 PM Amir Goldstein wrote:
> > >
> > > Jan,
> > >
> > > This patch set implements your suggestion [1] for handling fanotify
> > > events for filesystems with non-uniform f_fsid.
> > >
> > > With these changes, events report the fsid as it would be reported
> > > by statfs(2) on the same object, i.e. the sub-volume's fsid for an inode
> > > in sub-volume.
> > >
> > > This creates a small challenge to a watching program, which needs to map
> > > from fsid in event to a stored mount_fd to use with open_by_handle_at(2).
> > > Luckily, for btrfs, fsid[0] is uniform and fsid[1] is per sub-volume.
> > >
> > > I have adapted fsnotifywatch tool [2] to be able to watch btrfs sb.
> > > The adapted tool detects the special case of btrfs (a bit hacky) and
> > > indexes the mount_fd to be used for open_by_handle_at(2) by fsid[0].
> > >
> > > Note that this hackery is not needed when the tool is watching a
> > > single filesystem (no need for mount_fd lookup table), because btrfs
> > > correctly decodes file handles from any sub-volume with mount_fd from
> > > any other sub-volume.
> >
> > Jan,
> >
> > Now that I've implemented the userspace part of btrfs sb watch,
> > I realize that if userspace has to be aware of the fsid oddity of btrfs
> > anyway, maybe reporting the accurate fsid of the object in event is
> > not that important at all.
> >
> > Facts:
> > 1. file_handle is unique across all sub-volumes and can be resolved
> >    from any fd on any sub-volume
> > 2. fsid[0] can be compared to match an event to a btrfs sb, where any
> >    fd can be used to resolve file_handle
> > 3. userspace needs to be aware of this fsid[0] fact if it watches more
> >    than a single sb and userspace need not care about the value of
> >    fsid in event at all when watching a single sb
> > 4. even though fanotify never allowed setting sb mark on a path inside
> >    btrfs sub-volume, it always reported events on inodes in sub-volumes
> >    to btrfs sb watch - those events always carried the "wrong" fsid (i.e.
> >    the btrfs root volume fsid)
> > 5. we already agreed that setting up inode marks on inodes inside
> >    sub-volume should be a no-brainer
> >
> > If we allow reporting either sub-vol fsid or root-vol fsid (exactly as
> > we do for inodes in sub-vol in current upstream),
>
> Another way to put it is that fsid in event describes the object
> that was used to setup the mark not the target object.
>
> If an event is received via an inode/sb/mount mark, the fsid
> would always describe the fsid of the inode that was used to setup
> the mark and that is always the fsid that userspace would query
> statfs(2) at the time of calling the fanotify_mark(2) call.
>
> Maybe it is non trivial to document, but for a library that returns
> an opaque "watch descriptor", the "watch descriptor" can always
> be deduced from the event.
>
> Does this make sense?

Yes, it makes sense if we always report events with the fsid of the
object used for placing the mark. For filesystems with homogeneous fsid
there's no difference; for btrfs it looks like the least surprising
choice and works well for inode marks as well. The only catch is in the
internal fsnotify implementation AFAICT: if we have multiple marks for
the same btrfs superblock, each mark on a different subvolume, then we
should be reporting one event with different fsids for different marks.
So we need to cache the fsid in the mark and not in the connector. But
that should be easy to do.

								Honza
-- 
Jan Kara
SUSE Labs, CR
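
For illustration only, the "cache the fsid in the mark and not in the
connector" point can be modelled with a toy userspace sketch. This is
Python, not kernel code; the class and function names (Mark, Connector,
add_sb_mark, report) and the fsid values are hypothetical. It only shows
that one event on a shared superblock can be reported with a different
fsid per mark when each mark remembers the fsid it was set up with.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# An fsid is modelled as a pair; for btrfs, fsid[0] is uniform across the
# filesystem and fsid[1] differs per sub-volume (as stated in the thread).
Fsid = Tuple[int, int]


@dataclass
class Mark:
    """Hypothetical stand-in for a mark: caches the fsid seen at setup time."""
    group: str
    fsid: Fsid


@dataclass
class Connector:
    """Hypothetical stand-in for the per-object connector shared by marks.

    It deliberately holds no fsid of its own, since marks attached to the
    same superblock may have been set up through different sub-volumes.
    """
    marks: List[Mark] = field(default_factory=list)


def add_sb_mark(conn: Connector, group: str, path_fsid: Fsid) -> Mark:
    """Attach a mark, remembering the fsid of the path used to set it up."""
    mark = Mark(group=group, fsid=path_fsid)
    conn.marks.append(mark)
    return mark


def report(conn: Connector, file_handle: bytes):
    """Deliver one event to every mark, each tagged with that mark's fsid."""
    return [(m.group, m.fsid, file_handle) for m in conn.marks]
```

With two marks placed via different sub-volumes of the same superblock,
report() yields one entry per mark, each carrying that mark's own fsid,
while fsid[0] stays the same in both.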