From: Amir Goldstein <amir73il@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>,
Mo Re Ra <more7.rev@gmail.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Wez Furlong <wez@fb.com>,
Matthew Bobrowski <mbobrowski@mbobrowski.org>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: File monitor problem
Date: Tue, 7 Jan 2020 20:56:20 +0200 [thread overview]
Message-ID: <CAOQ4uxjx_n3f44yu9_2dGxtBGy3WssG0xfZykwjQ+n=Wcii2-w@mail.gmail.com> (raw)
In-Reply-To: <20200107171014.GI25547@quack2.suse.cz>
On Tue, Jan 7, 2020 at 7:10 PM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 24-12-19 05:49:42, Amir Goldstein wrote:
> > > > I can see the need for FAN_DIR_MODIFIED_WITH_NAME
> > > > (stupid name, I know) - generated when something changed with names in a
> > > > particular directory, reported with FID of the directory and the name
> > > > inside that directory involved with the change. Directory watching
> > > > application needs this to keep track of "names to check". Is the name
> > > > useful with any other type of event? _SELF events cannot even sensibly have
> > > > it so no discussion there as you mention below. Then we have OPEN, CLOSE,
> > > > ACCESS, ATTRIB events. Do we have any use for names with those?
> > > >
> > >
> > > The problem is that unlike dir fid, file fid cannot be reliably resolved
> > > to path, that is the reason that I implemented FAN_WITH_NAME
> > > for events "possible on child" (see branch fanotify_name-wip).
>
> Ok, but that seems to be a bit of an abuse, isn't it? Because with parent
> fid + name you may reconstruct the path but you won't be able to reliably
> identify the object where the operation happened? Even worse users can
> mistakenly think that parent fid + name identify the object but that is
> racy... This is exactly the kind of confusion I'd like to avoid with the
> new API.
>
> OTOH I understand that e.g. a file monitor may want to monitor CLOSE_WRITE
> like you mention below just to record directory FID + name as something
> that needs resyncing. So I agree that names in events other than directory
> events are useful as well. And I also agree that for that usecase what you
> propose would be fine.
>
> > > A filesystem monitor typically needs to be notified on name changes and on
> > > data/metadata modifications.
> > >
> > > So maybe add just two new event types:
> > > FAN_DIR_MODIFY
> > > FAN_CHILD_MODIFY
> > >
> > > Both those events are reported with name and allowed only with init flag
> > > FAN_REPORT_FID_NAME.
> > > User cannot filter FAN_DIR_MODIFY by part of create/delete/move.
> > > User cannot filter FAN_CHILD_MODIFY by part of attrib/modify/close_write.
> >
> > Nah, that won't do. I now remember discussing this with out in-house monitor
> > team and they said they needed to filter out FAN_MODIFY because it was too
> > noisy and rely on FAN_CLOSE_WRITE. And other may want open/access as
> > well.
>
> So for open/close/modify/read/attrib I don't see a need to obfuscate the
> event type. They are already abstract enough so I don't see how they could
> be easily misinterpretted. With directory events the potential for
> "optimizations" that are subtly wrong is IMHO much bigger.
>
OK, that simplifies things quite a bit.
> > There is another weird way to obfuscate the event type.
> > I am not sure if users will be less confused about it:
> > Each event type belongs to a group (i.e. self, dirent, poss_on_child)
> > User may set any event type in the mask (e.g. create|delete|open|close)
> > When getting an event from event group A (e.g. create), all event types
> > of that group will be reported (e.g. create|delete).
> >
> > To put it another way:
> > #define FAN_DIR_MODIFY (FAN_CREATE | FAN_MOVE | FAN_DELETE)
> >
> > For example in fanotify_group_event_mask():
> > if (event_with_name) {
> > if (marks_mask & test_mask & FAN_DIR_MODIFY)
> > test_mask |= marks_mask & FAN_DIR_MODIFY
> > ...
> >
> > Did somebody say over-engineering? ;)
> >
> > TBH, I don't see how we can do event type obfuscation
> > that is both usable and not confusing, because the concept is
> > confusing. I understand the reasoning behind it, but I don't think
> > that many users will.
> >
> > I'm hoping that you can prove me wrong and find a way to simplify
> > the API while retaining fair usability.
>
> I was thinking about this. If I understand the problem right, depending on
> the usecase we may need with each event some subset of 'object fid',
> 'directory fid', 'name in directory'. So what if we provided all these
> three things in each event? Events will get somewhat bloated but it may be
> bearable.
>
I agree.
What I like about the fact that users don't need to choose between
'parent fid' and 'object fid' is that it makes some hard questions go away:
1. How are "self" events reported? simple - just with 'object id'
2. How are events on disconnected dentries reported? simple - just
with 'object id'
3. How are events on the root of the watch reported? same answer
Did you write 'directory fid' as opposed to 'parent fid' for a reason?
Was it your intention to imply that events on directories (e.g.
open/close/attrib) are
never reported with 'parent fid' , 'name in directory'?
I see no functional problem with making that distinction between directory and
non-directory, but I have a feeling that 'parent fid', 'name in
directory', 'object id',
regardless of dir/non-dir is going to be easier to document and less confusing
for users to understand, so this is my preference.
> With this information we could reliably reconstruct (some) path (we always
> have directory fid + name), we can reliably identify the object involved in
> the change (we always have object fid). I'd still prefer if we obfuscated
> directory events, without possibility of filtering based of
> CREATE/DELETE/MOVE (i.e., just one FAN_DIR_MODIFY event for this fanotify
> group) - actually I have hard time coming with a usecase where application
> would care about one type of event and not the other one. The other events
> remain as they are. What do you think?
That sounds like a plan.
I have no problem with the FAN_DIR_MODIFY obfuscation.
Will re-work the patches and demo.
Thanks,
Amir.
next prev parent reply other threads:[~2020-01-07 18:56 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-04 10:02 File monitor problem Mo Re Ra
2019-12-04 12:53 ` Amir Goldstein
2019-12-04 14:24 ` Mo Re Ra
2019-12-04 17:34 ` Jan Kara
2019-12-04 18:37 ` Amir Goldstein
2019-12-04 19:02 ` Matthew Wilcox
2019-12-04 20:27 ` Amir Goldstein
2019-12-11 10:06 ` Jan Kara
2019-12-11 13:58 ` Amir Goldstein
2019-12-16 15:00 ` Amir Goldstein
2019-12-19 7:33 ` Amir Goldstein
2019-12-23 18:19 ` Jan Kara
2019-12-23 19:14 ` Amir Goldstein
2019-12-24 3:49 ` Amir Goldstein
2019-12-31 11:53 ` Amir Goldstein
2020-01-07 17:10 ` Jan Kara
2020-01-07 18:56 ` Amir Goldstein [this message]
2020-01-08 9:04 ` Jan Kara
2020-01-08 10:25 ` Amir Goldstein
2020-01-08 12:04 ` Jan Kara
2019-12-07 12:36 ` Mo Re Ra
2019-12-10 16:55 ` Jan Kara
2019-12-10 20:49 ` Amir Goldstein
2019-12-11 22:06 ` Wez Furlong
2019-12-12 5:56 ` Amir Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAOQ4uxjx_n3f44yu9_2dGxtBGy3WssG0xfZykwjQ+n=Wcii2-w@mail.gmail.com' \
--to=amir73il@gmail.com \
--cc=jack@suse.cz \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mbobrowski@mbobrowski.org \
--cc=more7.rev@gmail.com \
--cc=wez@fb.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).