All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: Dave Chinner <david@fromorbit.com>
Cc: dhowells@redhat.com, viro@zeniv.linux.org.uk, tytso@mit.edu,
	khazhy@google.com, adilger.kernel@dilger.ca,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	kernel@collabora.com
Subject: Re: [PATCH 4/8] vfs: Add superblock notifications
Date: Fri, 11 Dec 2020 17:55:32 -0300	[thread overview]
Message-ID: <87ft4cukor.fsf@collabora.com> (raw)
In-Reply-To: <20201210220914.GG4170059@dread.disaster.area> (Dave Chinner's message of "Fri, 11 Dec 2020 09:09:14 +1100")



Dave,

Thanks a lot for the very detailed review.

> On Mon, Dec 07, 2020 at 09:31:13PM -0300, Gabriel Krisman Bertazi wrote:
>> From: David Howells <dhowells@redhat.com>
>> 
>> Add a superblock event notification facility whereby notifications about
>> superblock events, such as I/O errors (EIO), quota limits being hit
>> (EDQUOT) and running out of space (ENOSPC) can be reported to a monitoring
>> process asynchronously.  Note that this does not cover vfsmount topology
>> changes.  watch_mount() is used for that.
>
> watch_mount() is not in the upstream tree, nor is it defined in this
> patch set.


That is my mistake, not the author's.  I picked this from a longer series that has
a watch_mount implementation, but didn't include it.  Only the commit message
is bad, not the patch absence.

>> Records are of the following format:
>> 
>> 	struct superblock_notification {
>> 		struct watch_notification watch;
>> 		__u64	sb_id;
>> 	} *n;
>> 
>> Where:
>> 
>> 	n->watch.type will be WATCH_TYPE_SB_NOTIFY.
>> 
>> 	n->watch.subtype will indicate the type of event, such as
>> 	NOTIFY_SUPERBLOCK_READONLY.
>> 
>> 	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
>> 	record.
>> 
>> 	n->watch.info & WATCH_INFO_ID will be the fifth argument to
>> 	watch_sb(), shifted.
>> 
>> 	n->watch.info & NOTIFY_SUPERBLOCK_IS_NOW_RO will be used for
>> 	NOTIFY_SUPERBLOCK_READONLY, being set if the superblock becomes
>> 	R/O, and being cleared otherwise.
>> 
>> 	n->sb_id will be the ID of the superblock, as can be retrieved with
>> 	the fsinfo() syscall, as part of the fsinfo_sb_notifications
>> 	attribute in the watch_id field.
>> 
>> Note that it is permissible for event records to be of variable length -
>> or, at least, the length may be dependent on the subtype.  Note also that
>> the queue can be shared between multiple notifications of various types.
>
> /me puts on his "We really, really, REALLY suck at APIs" hat.
>
> This adds a new syscall that has a complex structure associated with
> in. This needs a full man page specification written for it
> describing the parameters, the protocol structures, behaviour, etc
> before we can really review this. It really also needs full test
> infrastructure for every aspect of the syscall written from the man
> page (not the implementation) for fstests so that we end up with a
> consistent implementation for every filesystem that implements these
> watches.

I see.  I was thinking the other way around, getting a design accepted
by you all before writing down documentation, but that makes a lot of
sense. In fact, I'm taking a step back and writing a text proposal,
without patches, such that we can agree on the main points before I
start coding.

> Other things:
>
> - Scoping: inode/block related information is not "superblock"
>   information. What about errors in non-inode related objects?

The previous RFC separated inode error notifications from other types,
but my idea was to have different notifications types for each object.


> - offets into files/devices/objects need to be in bytes, not blocks
> - errors can span multiple contiguous blocks, so the notification
>   needs to report the -byte range- the error corresponds to.
> - superblocks can have multiple block devices under them with
>   individual address ranges. Hence we need {object,dev,offset,len}
>   to uniquely identify where an error occurred in a filesystem.
> - userspace face structures need padding and flags/version/size
>   information so we can tell what shape the structure being passed
>   is. It is guaranteed that we will want to expand the structure
>   definitions in future, maybe even deprecate some...
> - syscall has no flags field.
> - syscall is of "at" type (relative path via dfd) so probably shoudl
>   be called "watch..._at()"

will do all the above.

>
> Fundamentally, though, I'm struggling to understand what the
> difference between watch_mount() and watch_sb() is going to be.
> "superblock" watches seem like the wrong abstraction for a path
> based watch interface. Superblocks can be shared across multiple
> disjoint paths, subvolumes and even filesystems.

As far as I understand the original patchset, watch_mount was designed
to monitor mountpoint operations (mount, umount,.. ) in a sub-tree,
while watch_sb monitors filesystem operations and errors.  I'm not
working with watch_mount, my current interest is in having a
notifications mechanism for filesystem errors, which seemed to fit
nicely with the watch_sb patchset for watch_queue.

> The path based user API is really asking to watch a mount, not a
> superblock. We don't otherwise expose superblocks to userspace at
> all, so this seems like the API is somewhat exposing internal kernel
> implementation behind mounts. However, there -is- a watch_mount()
> syscall floating around somewhere, so it makes me wonder exactly why
> we need a second syscall and interface protocol to expose
> essentially the same path-based watch information to userspace.

I think these are indeed different syscalls, but maybe a bit misnamed.

If not by path, how could we uniquely identify an entire filesystem?
Maybe pointing to a block device that has a valid filesystem and in the
case of fs spawning through multiple devices, consider all of them?  But
that would not work for some misc filesystems, like tmpfs.

> Without having that syscall the same patchset, or a reference to
> that patchset (and man page documenting the interface), I have no
> idea what it does or why it is different or why it can't be used for
> these error notifications....

As a short summary, My goal is an error reporting mechanism for ext4
(preferably that can also be used by other filesystems) that allows a
userspace application to monitor errors on the filesystem without losing
information and without having to parse a convoluted dmesg.  The
watch_queue API seem to expose exactly the infrastructure for this kind
of thing.  As I said, I'm gonna send a proposal with more details
because, I'd really like to have something that can be used by several
filesystems.

-- 
Gabriel Krisman Bertazi

  reply	other threads:[~2020-12-11 21:44 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-08  0:31 [PATCH 0/8] Superblock Notifications Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 1/8] watch_queue: Make watch_sizeof() check record size Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 2/8] security: Add hooks to rule on setting a watch for superblock Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 3/8] watch_queue: Support a text field at the end of the notification Gabriel Krisman Bertazi
2020-12-08 12:57   ` David Howells
2020-12-08  0:31 ` [PATCH 4/8] vfs: Add superblock notifications Gabriel Krisman Bertazi
2020-12-08  0:56   ` Darrick J. Wong
2020-12-10 22:09   ` Dave Chinner
2020-12-11 20:55     ` Gabriel Krisman Bertazi [this message]
2020-12-18  1:06       ` Dave Chinner
2021-01-05 19:52         ` Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 5/8] vfs: Include origin of the SB error notification Gabriel Krisman Bertazi
2020-12-08  0:51   ` Darrick J. Wong
2020-12-08  0:55     ` Gabriel Krisman Bertazi
2020-12-08 12:42       ` David Howells
2020-12-08 12:51   ` David Howells
2020-12-08 12:58     ` Gabriel Krisman Bertazi
2020-12-08 18:41       ` Darrick J. Wong
2020-12-08 19:29         ` Gabriel Krisman Bertazi
2020-12-09  3:24           ` Darrick J. Wong
2020-12-09 13:06             ` Gabriel Krisman Bertazi
2020-12-11 22:35               ` Darrick J. Wong
2020-12-08  0:31 ` [PATCH 6/8] fs: Add more superblock error subtypes Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 7/8] ext4: Implement SB error notification through watch_sb Gabriel Krisman Bertazi
2020-12-08  0:31 ` [PATCH 8/8] samples: watch_queue: Add sample of SB notifications Gabriel Krisman Bertazi
2020-12-08 12:59 ` [PATCH 0/8] Superblock Notifications David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ft4cukor.fsf@collabora.com \
    --to=krisman@collabora.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=kernel@collabora.com \
    --cc=khazhy@google.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.