All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Jan Kara <jack@suse.cz>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Gabriel Krisman Bertazi <krisman@collabora.com>,
	Jan Kara <jack@suse.com>, Linux API <linux-api@vger.kernel.org>,
	Ext4 <linux-ext4@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Khazhismel Kumykov <khazhy@google.com>,
	David Howells <dhowells@redhat.com>,
	Dave Chinner <david@fromorbit.com>, Theodore Tso <tytso@mit.edu>,
	Matthew Bobrowski <repnop@google.com>,
	kernel@collabora.com
Subject: Re: [PATCH v6 18/21] fanotify: Emit generic error info type for error event
Date: Wed, 18 Aug 2021 20:58:49 -0700	[thread overview]
Message-ID: <20210819035849.GA12586@magnolia> (raw)
In-Reply-To: <20210818095818.GA28119@quack2.suse.cz>

On Wed, Aug 18, 2021 at 11:58:18AM +0200, Jan Kara wrote:
> On Wed 18-08-21 06:24:26, Amir Goldstein wrote:
> > [...]
> > 
> > > > Just keep in mind that the current scheme pre-allocates the single event slot
> > > > on fanotify_mark() time and (I think) we agreed to pre-allocate
> > > > sizeof(fsnotify_error_event) + MAX_HDNALE_SZ.
> > > > If filesystems would want to store some variable length fs specific info,
> > > > a future implementation will have to take that into account.
> > >
> > > <nod> I /think/ for the fs and AG metadata we could preallocate these,
> > > so long as fsnotify doesn't free them out from under us.
> > 
> > fs won't get notified when the event is freed, so fsnotify must
> > take ownership on the data structure.
> > I was thinking more along the lines of limiting maximum size for fs
> > specific info and pre-allocating that size for the event.
> 
> Agreed. If there's a sensible upperbound than preallocating this inside
> fsnotify is likely the least problematic solution.
> 
> > > For inodes...
> > > there are many more of those, so they'd have to be allocated
> > > dynamically.
> > 
> > The current scheme is that the size of the queue for error events
> > is one and the single slot is pre-allocated.
> > The reason for pre-allocate is that the assumption is that fsnotify_error()
> > could be called from contexts where memory allocation would be
> > inconvenient.
> > Therefore, we can store the encoded file handle of the first erroneous
> > inode, but we do not store any more events until user read this
> > one event.
> 
> Right. OTOH I can imagine allowing GFP_NOFS allocations in the error
> context. At least for ext4 it would be workable (after all ext4 manages to
> lock & modify superblock in its error handlers, GFP_NOFS allocation isn't
> harder). But then if events are dynamically allocated there's still the
> inconvenient question what are you going to do if you need to report fs
> error and you hit ENOMEM. Just not sending the notification may have nasty
> consequences and in the world of containerization and virtualization
> tightly packed machines where ENOMEM happens aren't that unlikely. It is
> just difficult to make assumptions about filesystems overall so we decided
> to be better safe and preallocate the event.
> 
> Or, we could leave the allocation troubles for the filesystem and
> fsnotify_sb_error() would be passed already allocated event (this way
> attaching of fs-specific blobs to the event is handled as well) which it
> would just queue. Plus we'd need to provide some helper to fill in generic
> part of the event...
> 
> The disadvantage is that if there are filesystems / callsites needing
> preallocated events, it would be painful for them. OTOH current two users -
> ext4 & xfs - can handle allocation in the error path AFAIU.
> 
> Thinking about this some more, maybe we could have event preallocated (like
> a "rescue event"). Normally we would dynamically allocate (or get passed
> from fs) the event and only if the allocation fails, we would queue the
> rescue event to indicate to listeners that something bad happened, there
> was error but we could not fully report it.

Yes.

> But then, even if we'd go for dynamic event allocation by default, we need
> to efficiently merge events since some fs failures (e.g. resulting in
> journal abort in ext4) lead to basically all operations with the filesystem
> to fail and that could easily swamp the notification system with useless
> events.

Hm.  Going out on a limb, I would guess that the majority of fs error
flood events happen if the storage fails catastrophically.  Assuming
that a catastrophic failure will quickly take the filesystem offline, I
would say that for XFS we should probably send one last "and then we
died" event and stop reporting after that.

> Current system with preallocated event nicely handles this
> situation, it is questionable how to extend it for online fsck usecase
> where we need to queue more than one event (but even there probably needs
> to be some sensible upper-bound). I'll think about it...

At least for XFS, I was figuring that xfs_scrub errors wouldn't be
reported via fsnotify since the repair tool is already running anyway.

> > > Hmm.  For handling accumulated errors, can we still access the
> > > fanotify_event_info_* object once we've handed it to fanotify?  If the
> > > user hasn't picked up the event yet, it might be acceptable to set more
> > > bits in the type mask and bump the error count.  In other words, every
> > > time userspace actually reads the event, it'll get the latest error
> > > state.  I /think/ that's where the design of this patchset is going,
> > > right?
> > 
> > Sort of.
> > fsnotify does have a concept of "merging" new event with an event
> > already in queue.
> > 
> > With most fsnotify events, merge only happens if the info related
> > to the new event (e.g. sb,inode) is the same as that off the queued
> > event and the "merge" is only in the event mask
> > (e.g. FS_OPEN|FS_CLOSE).
> > 
> > However, the current scheme for "merge" of an FS_ERROR event is only
> > bumping err_count, even if the new reported error or inode do not
> > match the error/inode in the queued event.
> > 
> > If we define error event subtypes (e.g. FS_ERROR_WRITEBACK,
> > FS_ERROR_METADATA), then the error event could contain
> > a field for subtype mask and user could read the subtype mask
> > along with the accumulated error count, but this cannot be
> > done by providing the filesystem access to modify an internal
> > fsnotify event, so those have to be generic UAPI defined subtypes.
> > 
> > If you think that would be useful, then we may want to consider
> > reserving the subtype mask field in fanotify_event_info_error in
> > advance.
> 
> It depends on what exactly Darrick has in mind but I suspect we'd need a
> fs-specific merge helper that would look at fs-specific blobs in the event
> and decide whether events can be merged or not, possibly also handling the
> merge by updating the blob.

Yes.  If the filesystem itself were allowed to manage the lifespan of
the fsnotify error event object then this would be trivial -- we'll own
the object, keep it updated as needed, and fsnotify can copy the
contents to userspace whenever convenient.

(This might be a naïve view of fsnotify...)

> From the POV of fsnotify that would probably
> mean merge callback in the event itself. But I guess this needs more
> details from Darrick and maybe we don't need to decide this at this moment
> since nobody is close to the point of having code needing to pass fs-blobs
> with events.

<nod> We ... probably don't need to decide this now.

--D

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

  reply	other threads:[~2021-08-19  3:58 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-12 21:39 [PATCH v6 00/21] File system wide monitoring Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 01/21] fsnotify: Don't insert unmergeable events in hashtable Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 02/21] fanotify: Fold event size calculation to its own function Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 03/21] fanotify: Split fsid check from other fid mode checks Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 04/21] fsnotify: Reserve mark flag bits for backends Gabriel Krisman Bertazi
2021-08-13  7:28   ` Amir Goldstein
2021-08-16 13:15     ` Jan Kara
2021-08-23 14:36       ` Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 05/21] fanotify: Split superblock marks out to a new cache Gabriel Krisman Bertazi
2021-08-16 13:18   ` Jan Kara
2021-08-12 21:39 ` [PATCH v6 06/21] inotify: Don't force FS_IN_IGNORED Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 07/21] fsnotify: Add helper to detect overflow_event Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 08/21] fsnotify: Add wrapper around fsnotify_add_event Gabriel Krisman Bertazi
2021-08-12 21:39 ` [PATCH v6 09/21] fsnotify: Allow events reported with an empty inode Gabriel Krisman Bertazi
2021-08-13  7:58   ` Amir Goldstein
2021-08-25 18:40     ` Gabriel Krisman Bertazi
2021-08-25 19:45       ` Amir Goldstein
2021-08-25 21:50         ` Gabriel Krisman Bertazi
2021-08-26 10:44           ` Amir Goldstein
2021-08-27  2:26             ` Paul Moore
2021-08-27  9:36               ` audit watch and kernfs Amir Goldstein
2021-08-27 10:22                 ` Al Viro
2021-08-12 21:39 ` [PATCH v6 10/21] fsnotify: Support FS_ERROR event type Gabriel Krisman Bertazi
2021-08-13  7:48   ` Amir Goldstein
2021-08-16 13:23   ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 11/21] fanotify: Allow file handle encoding for unhashed events Gabriel Krisman Bertazi
2021-08-13  7:59   ` Amir Goldstein
2021-08-12 21:40 ` [PATCH v6 12/21] fanotify: Encode invalid file handle when no inode is provided Gabriel Krisman Bertazi
2021-08-13  8:27   ` Amir Goldstein
2021-08-16 14:06     ` Jan Kara
2021-08-16 15:54       ` Amir Goldstein
2021-08-16 16:11         ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 13/21] fanotify: Require fid_mode for any non-fd event Gabriel Krisman Bertazi
2021-08-13  8:28   ` Amir Goldstein
2021-08-12 21:40 ` [PATCH v6 14/21] fanotify: Reserve UAPI bits for FAN_FS_ERROR Gabriel Krisman Bertazi
2021-08-13  8:29   ` Amir Goldstein
2021-08-12 21:40 ` [PATCH v6 15/21] fanotify: Preallocate per superblock mark error event Gabriel Krisman Bertazi
2021-08-13  8:40   ` Amir Goldstein
2021-08-16 15:57   ` Jan Kara
2021-08-27 18:18     ` Gabriel Krisman Bertazi
2021-09-02 21:24       ` Gabriel Krisman Bertazi
2021-09-03  4:16         ` Amir Goldstein
2021-09-15 10:31           ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 16/21] fanotify: Handle FAN_FS_ERROR events Gabriel Krisman Bertazi
2021-08-13  9:35   ` Amir Goldstein
2021-08-12 21:40 ` [PATCH v6 17/21] fanotify: Report fid info for file related file system errors Gabriel Krisman Bertazi
2021-08-13  9:00   ` Amir Goldstein
2021-08-13  9:03     ` Amir Goldstein
2021-08-16 16:18   ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 18/21] fanotify: Emit generic error info type for error event Gabriel Krisman Bertazi
2021-08-13  8:47   ` Amir Goldstein
2021-08-16 16:23   ` Jan Kara
2021-08-16 21:41   ` Darrick J. Wong
2021-08-17  9:05     ` Jan Kara
2021-08-17 10:08       ` Amir Goldstein
2021-08-18  0:16         ` Darrick J. Wong
2021-08-18  3:24           ` Amir Goldstein
2021-08-18  9:58             ` Jan Kara
2021-08-19  3:58               ` Darrick J. Wong [this message]
2021-08-18  0:10       ` Darrick J. Wong
2021-08-24 16:53       ` Gabriel Krisman Bertazi
2021-08-25  4:09         ` Darrick J. Wong
2021-08-12 21:40 ` [PATCH v6 19/21] ext4: Send notifications on error Gabriel Krisman Bertazi
2021-08-16 16:26   ` Jan Kara
2021-08-12 21:40 ` [PATCH v6 20/21] samples: Add fs error monitoring example Gabriel Krisman Bertazi
2021-08-18 13:02   ` Jan Kara
2021-08-23 14:49     ` Gabriel Krisman Bertazi
2021-08-12 21:40 ` [PATCH v6 21/21] docs: Document the FAN_FS_ERROR event Gabriel Krisman Bertazi
2021-08-16 16:40   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210819035849.GA12586@magnolia \
    --to=djwong@kernel.org \
    --cc=amir73il@gmail.com \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=kernel@collabora.com \
    --cc=khazhy@google.com \
    --cc=krisman@collabora.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=repnop@google.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.