From: "Darrick J. Wong" <djwong@kernel.org>
To: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>,
Gabriel Krisman Bertazi <krisman@collabora.com>,
Jan Kara <jack@suse.com>, Linux API <linux-api@vger.kernel.org>,
Ext4 <linux-ext4@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Khazhismel Kumykov <khazhy@google.com>,
David Howells <dhowells@redhat.com>,
Dave Chinner <david@fromorbit.com>, Theodore Tso <tytso@mit.edu>,
Matthew Bobrowski <repnop@google.com>,
kernel@collabora.com
Subject: Re: [PATCH v6 18/21] fanotify: Emit generic error info type for error event
Date: Wed, 18 Aug 2021 20:58:49 -0700 [thread overview]
Message-ID: <20210819035849.GA12586@magnolia> (raw)
In-Reply-To: <20210818095818.GA28119@quack2.suse.cz>
On Wed, Aug 18, 2021 at 11:58:18AM +0200, Jan Kara wrote:
> On Wed 18-08-21 06:24:26, Amir Goldstein wrote:
> > [...]
> >
> > > > Just keep in mind that the current scheme pre-allocates the single event slot
> > > > on fanotify_mark() time and (I think) we agreed to pre-allocate
> > > > sizeof(fsnotify_error_event) + MAX_HDNALE_SZ.
> > > > If filesystems would want to store some variable length fs specific info,
> > > > a future implementation will have to take that into account.
> > >
> > > <nod> I /think/ for the fs and AG metadata we could preallocate these,
> > > so long as fsnotify doesn't free them out from under us.
> >
> > fs won't get notified when the event is freed, so fsnotify must
> > take ownership on the data structure.
> > I was thinking more along the lines of limiting maximum size for fs
> > specific info and pre-allocating that size for the event.
>
> Agreed. If there's a sensible upperbound than preallocating this inside
> fsnotify is likely the least problematic solution.
>
> > > For inodes...
> > > there are many more of those, so they'd have to be allocated
> > > dynamically.
> >
> > The current scheme is that the size of the queue for error events
> > is one and the single slot is pre-allocated.
> > The reason for pre-allocate is that the assumption is that fsnotify_error()
> > could be called from contexts where memory allocation would be
> > inconvenient.
> > Therefore, we can store the encoded file handle of the first erroneous
> > inode, but we do not store any more events until user read this
> > one event.
>
> Right. OTOH I can imagine allowing GFP_NOFS allocations in the error
> context. At least for ext4 it would be workable (after all ext4 manages to
> lock & modify superblock in its error handlers, GFP_NOFS allocation isn't
> harder). But then if events are dynamically allocated there's still the
> inconvenient question what are you going to do if you need to report fs
> error and you hit ENOMEM. Just not sending the notification may have nasty
> consequences and in the world of containerization and virtualization
> tightly packed machines where ENOMEM happens aren't that unlikely. It is
> just difficult to make assumptions about filesystems overall so we decided
> to be better safe and preallocate the event.
>
> Or, we could leave the allocation troubles for the filesystem and
> fsnotify_sb_error() would be passed already allocated event (this way
> attaching of fs-specific blobs to the event is handled as well) which it
> would just queue. Plus we'd need to provide some helper to fill in generic
> part of the event...
>
> The disadvantage is that if there are filesystems / callsites needing
> preallocated events, it would be painful for them. OTOH current two users -
> ext4 & xfs - can handle allocation in the error path AFAIU.
>
> Thinking about this some more, maybe we could have event preallocated (like
> a "rescue event"). Normally we would dynamically allocate (or get passed
> from fs) the event and only if the allocation fails, we would queue the
> rescue event to indicate to listeners that something bad happened, there
> was error but we could not fully report it.
Yes.
> But then, even if we'd go for dynamic event allocation by default, we need
> to efficiently merge events since some fs failures (e.g. resulting in
> journal abort in ext4) lead to basically all operations with the filesystem
> to fail and that could easily swamp the notification system with useless
> events.
Hm. Going out on a limb, I would guess that the majority of fs error
flood events happen if the storage fails catastrophically. Assuming
that a catastrophic failure will quickly take the filesystem offline, I
would say that for XFS we should probably send one last "and then we
died" event and stop reporting after that.
> Current system with preallocated event nicely handles this
> situation, it is questionable how to extend it for online fsck usecase
> where we need to queue more than one event (but even there probably needs
> to be some sensible upper-bound). I'll think about it...
At least for XFS, I was figuring that xfs_scrub errors wouldn't be
reported via fsnotify since the repair tool is already running anyway.
> > > Hmm. For handling accumulated errors, can we still access the
> > > fanotify_event_info_* object once we've handed it to fanotify? If the
> > > user hasn't picked up the event yet, it might be acceptable to set more
> > > bits in the type mask and bump the error count. In other words, every
> > > time userspace actually reads the event, it'll get the latest error
> > > state. I /think/ that's where the design of this patchset is going,
> > > right?
> >
> > Sort of.
> > fsnotify does have a concept of "merging" new event with an event
> > already in queue.
> >
> > With most fsnotify events, merge only happens if the info related
> > to the new event (e.g. sb,inode) is the same as that off the queued
> > event and the "merge" is only in the event mask
> > (e.g. FS_OPEN|FS_CLOSE).
> >
> > However, the current scheme for "merge" of an FS_ERROR event is only
> > bumping err_count, even if the new reported error or inode do not
> > match the error/inode in the queued event.
> >
> > If we define error event subtypes (e.g. FS_ERROR_WRITEBACK,
> > FS_ERROR_METADATA), then the error event could contain
> > a field for subtype mask and user could read the subtype mask
> > along with the accumulated error count, but this cannot be
> > done by providing the filesystem access to modify an internal
> > fsnotify event, so those have to be generic UAPI defined subtypes.
> >
> > If you think that would be useful, then we may want to consider
> > reserving the subtype mask field in fanotify_event_info_error in
> > advance.
>
> It depends on what exactly Darrick has in mind but I suspect we'd need a
> fs-specific merge helper that would look at fs-specific blobs in the event
> and decide whether events can be merged or not, possibly also handling the
> merge by updating the blob.
Yes. If the filesystem itself were allowed to manage the lifespan of
the fsnotify error event object then this would be trivial -- we'll own
the object, keep it updated as needed, and fsnotify can copy the
contents to userspace whenever convenient.
(This might be a naïve view of fsnotify...)
> From the POV of fsnotify that would probably
> mean merge callback in the event itself. But I guess this needs more
> details from Darrick and maybe we don't need to decide this at this moment
> since nobody is close to the point of having code needing to pass fs-blobs
> with events.
<nod> We ... probably don't need to decide this now.
--D
>
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
next prev parent reply other threads:[~2021-08-19 3:58 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-12 21:39 [PATCH v6 00/21] File system wide monitoring Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 01/21] fsnotify: Don't insert unmergeable events in hashtable Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 02/21] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 03/21] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 04/21] fsnotify: Reserve mark flag bits for backends Gabriel Krisman Bertazi
2021-08-13 7:28 ` Amir Goldstein
2021-08-16 13:15 ` Jan Kara
2021-08-23 14:36 ` Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 05/21] fanotify: Split superblock marks out to a new cache Gabriel Krisman Bertazi
2021-08-16 13:18 ` Jan Kara
2021-08-12 21:39 ` [PATCH v6 06/21] inotify: Don't force FS_IN_IGNORED Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 07/21] fsnotify: Add helper to detect overflow_event Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 08/21] fsnotify: Add wrapper around fsnotify_add_event Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 09/21] fsnotify: Allow events reported with an empty inode Gabriel Krisman Bertazi
2021-08-13 7:58 ` Amir Goldstein
2021-08-25 18:40 ` Gabriel Krisman Bertazi
2021-08-25 19:45 ` Amir Goldstein
2021-08-25 21:50 ` Gabriel Krisman Bertazi
2021-08-26 10:44 ` Amir Goldstein
2021-08-27 2:26 ` Paul Moore
2021-08-27 9:36 ` audit watch and kernfs Amir Goldstein
2021-08-27 10:22 ` Al Viro
2021-08-12 21:39 ` [PATCH v6 10/21] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
2021-08-13 7:48 ` Amir Goldstein
2021-08-16 13:23 ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 11/21] fanotify: Allow file handle encoding for unhashed events Gabriel Krisman Bertazi
2021-08-13 7:59 ` Amir Goldstein
2021-08-12 21:40 ` [PATCH v6 12/21] fanotify: Encode invalid file handle when no inode is provided Gabriel Krisman Bertazi
2021-08-13 8:27 ` Amir Goldstein
2021-08-16 14:06 ` Jan Kara
2021-08-16 15:54 ` Amir Goldstein
2021-08-16 16:11 ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 13/21] fanotify: Require fid_mode for any non-fd event Gabriel Krisman Bertazi
2021-08-13 8:28 ` Amir Goldstein
2021-08-12 21:40 ` [PATCH v6 14/21] fanotify: Reserve UAPI bits for FAN_FS_ERROR Gabriel Krisman Bertazi
2021-08-13 8:29 ` Amir Goldstein
2021-08-12 21:40 ` [PATCH v6 15/21] fanotify: Preallocate per superblock mark error event Gabriel Krisman Bertazi
2021-08-13 8:40 ` Amir Goldstein
2021-08-16 15:57 ` Jan Kara
2021-08-27 18:18 ` Gabriel Krisman Bertazi
2021-09-02 21:24 ` Gabriel Krisman Bertazi
2021-09-03 4:16 ` Amir Goldstein
2021-09-15 10:31 ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 16/21] fanotify: Handle FAN_FS_ERROR events Gabriel Krisman Bertazi
2021-08-13 9:35 ` Amir Goldstein
2021-08-12 21:40 ` [PATCH v6 17/21] fanotify: Report fid info for file related file system errors Gabriel Krisman Bertazi
2021-08-13 9:00 ` Amir Goldstein
2021-08-13 9:03 ` Amir Goldstein
2021-08-16 16:18 ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 18/21] fanotify: Emit generic error info type for error event Gabriel Krisman Bertazi
2021-08-13 8:47 ` Amir Goldstein
2021-08-16 16:23 ` Jan Kara
2021-08-16 21:41 ` Darrick J. Wong
2021-08-17 9:05 ` Jan Kara
2021-08-17 10:08 ` Amir Goldstein
2021-08-18 0:16 ` Darrick J. Wong
2021-08-18 3:24 ` Amir Goldstein
2021-08-18 9:58 ` Jan Kara
2021-08-19 3:58 ` Darrick J. Wong [this message]
2021-08-18 0:10 ` Darrick J. Wong
2021-08-24 16:53 ` Gabriel Krisman Bertazi
2021-08-25 4:09 ` Darrick J. Wong
2021-08-12 21:40 ` [PATCH v6 19/21] ext4: Send notifications on error Gabriel Krisman Bertazi
2021-08-16 16:26 ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 20/21] samples: Add fs error monitoring example Gabriel Krisman Bertazi
2021-08-18 13:02 ` Jan Kara
2021-08-23 14:49 ` Gabriel Krisman Bertazi
2021-08-12 21:40 ` [PATCH v6 21/21] docs: Document the FAN_FS_ERROR event Gabriel Krisman Bertazi
2021-08-16 16:40 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210819035849.GA12586@magnolia \
--to=djwong@kernel.org \
--cc=amir73il@gmail.com \
--cc=david@fromorbit.com \
--cc=dhowells@redhat.com \
--cc=jack@suse.com \
--cc=jack@suse.cz \
--cc=kernel@collabora.com \
--cc=khazhy@google.com \
--cc=krisman@collabora.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=repnop@google.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).