From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: Dave Chinner <david@fromorbit.com>
Cc: dhowells@redhat.com, viro@zeniv.linux.org.uk, tytso@mit.edu,
khazhy@google.com, adilger.kernel@dilger.ca,
linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
kernel@collabora.com
Subject: Re: [PATCH 4/8] vfs: Add superblock notifications
Date: Fri, 11 Dec 2020 17:55:32 -0300 [thread overview]
Message-ID: <87ft4cukor.fsf@collabora.com> (raw)
In-Reply-To: <20201210220914.GG4170059@dread.disaster.area> (Dave Chinner's message of "Fri, 11 Dec 2020 09:09:14 +1100")
Dave,
Thanks a lot for the very detailed review.
> On Mon, Dec 07, 2020 at 09:31:13PM -0300, Gabriel Krisman Bertazi wrote:
>> From: David Howells <dhowells@redhat.com>
>>
>> Add a superblock event notification facility whereby notifications about
>> superblock events, such as I/O errors (EIO), quota limits being hit
>> (EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
>> process asynchronously. Note that this does not cover vfsmount topology
>> changes. watch_mount() is used for that.
>
> watch_mount() is not in the upstream tree, nor is it defined in this
> patch set.
That is my mistake, not the author's. I picked this from a longer series that has
a watch_mount implementation, but didn't include it. Only the commit message
is bad, not the patch absence.
>> Records are of the following format:
>>
>> struct superblock_notification {
>> struct watch_notification watch;
>> __u64 sb_id;
>> } *n;
>>
>> Where:
>>
>> n->watch.type will be WATCH_TYPE_SB_NOTIFY.
>>
>> n->watch.subtype will indicate the type of event, such as
>> NOTIFY_SUPERBLOCK_READONLY.
>>
>> n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
>> record.
>>
>> n->watch.info & WATCH_INFO_ID will be the fifth argument to
>> watch_sb(), shifted.
>>
>> n->watch.info & NOTIFY_SUPERBLOCK_IS_NOW_RO will be used for
>> NOTIFY_SUPERBLOCK_READONLY, being set if the superblock becomes
>> R/O, and being cleared otherwise.
>>
>> n->sb_id will be the ID of the superblock, as can be retrieved with
>> the fsinfo() syscall, as part of the fsinfo_sb_notifications
>> attribute in the watch_id field.
>>
>> Note that it is permissible for event records to be of variable length -
>> or, at least, the length may be dependent on the subtype. Note also that
>> the queue can be shared between multiple notifications of various types.
>
> /me puts on his "We really, really, REALLY suck at APIs" hat.
>
> This adds a new syscall that has a complex structure associated with
> in. This needs a full man page specification written for it
> describing the parameters, the protocol structures, behaviour, etc
> before we can really review this. It really also needs full test
> infrastructure for every aspect of the syscall written from the man
> page (not the implementation) for fstests so that we end up with a
> consistent implementation for every filesystem that implements these
> watches.
I see. I was thinking the other way around, getting a design accepted
by you all before writing down documentation, but that makes a lot of
sense. In fact, I'm taking a step back and writing a text proposal,
without patches, such that we can agree on the main points before I
start coding.
> Other things:
>
> - Scoping: inode/block related information is not "superblock"
> information. What about errors in non-inode related objects?
The previous RFC separated inode error notifications from other types,
but my idea was to have different notifications types for each object.
> - offets into files/devices/objects need to be in bytes, not blocks
> - errors can span multiple contiguous blocks, so the notification
> needs to report the -byte range- the error corresponds to.
> - superblocks can have multiple block devices under them with
> individual address ranges. Hence we need {object,dev,offset,len}
> to uniquely identify where an error occurred in a filesystem.
> - userspace face structures need padding and flags/version/size
> information so we can tell what shape the structure being passed
> is. It is guaranteed that we will want to expand the structure
> definitions in future, maybe even deprecate some...
> - syscall has no flags field.
> - syscall is of "at" type (relative path via dfd) so probably shoudl
> be called "watch..._at()"
will do all the above.
>
> Fundamentally, though, I'm struggling to understand what the
> difference between watch_mount() and watch_sb() is going to be.
> "superblock" watches seem like the wrong abstraction for a path
> based watch interface. Superblocks can be shared across multiple
> disjoint paths, subvolumes and even filesystems.
As far as I understand the original patchset, watch_mount was designed
to monitor mountpoint operations (mount, umount,.. ) in a sub-tree,
while watch_sb monitors filesystem operations and errors. I'm not
working with watch_mount, my current interest is in having a
notifications mechanism for filesystem errors, which seemed to fit
nicely with the watch_sb patchset for watch_queue.
> The path based user API is really asking to watch a mount, not a
> superblock. We don't otherwise expose superblocks to userspace at
> all, so this seems like the API is somewhat exposing internal kernel
> implementation behind mounts. However, there -is- a watch_mount()
> syscall floating around somewhere, so it makes me wonder exactly why
> we need a second syscall and interface protocol to expose
> essentially the same path-based watch information to userspace.
I think these are indeed different syscalls, but maybe a bit misnamed.
If not by path, how could we uniquely identify an entire filesystem?
Maybe pointing to a block device that has a valid filesystem and in the
case of fs spawning through multiple devices, consider all of them? But
that would not work for some misc filesystems, like tmpfs.
> Without having that syscall the same patchset, or a reference to
> that patchset (and man page documenting the interface), I have no
> idea what it does or why it is different or why it can't be used for
> these error notifications....
As a short summary, My goal is an error reporting mechanism for ext4
(preferably that can also be used by other filesystems) that allows a
userspace application to monitor errors on the filesystem without losing
information and without having to parse a convoluted dmesg. The
watch_queue API seem to expose exactly the infrastructure for this kind
of thing. As I said, I'm gonna send a proposal with more details
because, I'd really like to have something that can be used by several
filesystems.
--
Gabriel Krisman Bertazi
next prev parent reply other threads:[~2020-12-11 21:44 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-08 0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
2020-12-08 0:31 ` [PATCH 1/8] watch_queue: Make watch_sizeof() check record size Gabriel Krisman Bertazi
2020-12-08 0:31 ` [PATCH 2/8] security: Add hooks to rule on setting a watch for superblock Gabriel Krisman Bertazi
2020-12-08 0:31 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification Gabriel Krisman Bertazi
2020-12-08 0:31 ` [PATCH 4/8] vfs: Add superblock notifications Gabriel Krisman Bertazi
2020-12-08 0:56 ` Darrick J. Wong
2020-12-10 22:09 ` Dave Chinner
2020-12-11 20:55 ` Gabriel Krisman Bertazi [this message]
2020-12-18 1:06 ` Dave Chinner
2021-01-05 19:52 ` Gabriel Krisman Bertazi
2020-12-08 0:31 ` [PATCH 5/8] vfs: Include origin of the SB error notification Gabriel Krisman Bertazi
2020-12-08 0:51 ` Darrick J. Wong
2020-12-08 0:55 ` Gabriel Krisman Bertazi
2020-12-08 12:42 ` David Howells
2020-12-08 0:31 ` [PATCH 6/8] fs: Add more superblock error subtypes Gabriel Krisman Bertazi
2020-12-08 0:31 ` [PATCH 7/8] ext4: Implement SB error notification through watch_sb Gabriel Krisman Bertazi
2020-12-08 0:31 ` [PATCH 8/8] samples: watch_queue: Add sample of SB notifications Gabriel Krisman Bertazi
2020-12-08 12:51 ` [PATCH 5/8] vfs: Include origin of the SB error notification David Howells
2020-12-08 12:58 ` Gabriel Krisman Bertazi
2020-12-08 18:41 ` Darrick J. Wong
2020-12-08 19:29 ` Gabriel Krisman Bertazi
2020-12-09 3:24 ` Darrick J. Wong
2020-12-09 13:06 ` Gabriel Krisman Bertazi
2020-12-11 22:35 ` Darrick J. Wong
2020-12-08 12:57 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification David Howells
2020-12-08 12:59 ` [PATCH 0/8] Superblock Notifications David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ft4cukor.fsf@collabora.com \
--to=krisman@collabora.com \
--cc=adilger.kernel@dilger.ca \
--cc=david@fromorbit.com \
--cc=dhowells@redhat.com \
--cc=kernel@collabora.com \
--cc=khazhy@google.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).