public inbox for linux-kernel@vger.kernel.org
* [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
@ 2008-09-26 21:07 Eric Paris
  2008-09-26 21:34 ` Alan Cox
  2008-09-27  6:05 ` david
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Paris @ 2008-09-26 21:07 UTC (permalink / raw)
  To: linux-kernel, malware-list
  Cc: arjan, bunk, tytso, tvrtko.ursulin, alan, david, hch, andi, viro,
	peterz, Jonathan.Press, riel

The following is a file notification and access system intended to allow
a variety of userspace programs to get information about filesystem
events no matter where or how they happen on a system and use that in
conjunction with the actual on disk data related to that event to
provide additional services such as file change indexing or content
based antivirus scanning.  Minor changes are almost certainly possible
to make this notification and access interface usable for HSMs.  fscking
all notify is generally referred to as fanotify.  The ideas behind this
code are based on talpa the GPL antivirus interface originally pioneered
by Sophos and on the feedback from lkml and malware-list.  This is
however a complete rewrite from scratch, so if you remember talpa 'this
ain't it.'

the most up-to-date (but not always working) patch set can always be found
at http://people.redhat.com/~eparis/fanotify

comments, attacks, criticism, bad names, and really just about anything
can be sent to me but please let's not rehash useless conversations!  I
will send the full patch set to both lists, but I'm not going to cc
everyone individually.

**fanotify-executive-summary**

fanotify has 7 event types and only sends events for S_ISREG() files.
The event types are OPEN, READ, WRITE, CLOSE_WRITE, CLOSE_NOWRITE,
OPEN_ACCESS, and READ_ACCESS.  Events OPEN_ACCESS and READ_ACCESS
require that the listener return some sort of allow/deny/more_time
response as the original process blocks until it gets that response (or
times out.)  Listeners may register a group which will get notifications about
any combination of these events.  Antivirus scanners will likely want
OPEN_ACCESS and READ_ACCESS while file indexers would likely use the
non-ACCESS form of these events.

groups are a construct in which userspace indicates what priority (only
really used for ACCESS type events) and what type of events its
listeners want to hear.  A single group may have unlimited listeners but
each event will only go to ONE listener.  Two groups may register for
the same type of events and one listener in EACH group will get a copy
of the event.

fanotify has 3 main 'special' files per group which a userspace listener
uses to interact with the kernel.

notification file - listeners read a string containing information about
an fs event from this file and a new fd will be created in the listener
context related to this event.

fastpath file - userspace programs may write a string into this file
which will add an fanotify_fastpath to the inode associated with the
given open fd.  A fastpath is merely an in core tag on an inode which
indicates that events for that inode do not need to be sent to the
fanotify listener until the file changes.

access file - some events require userspace permission (possibly open or
read.)  When userspace gets such an event from the notification file it
needs to write a response down the access file so the kernel can
complete the original action.

-----------------
fanotify, long winded:

Everything in a _user.c file is how the user interacts.  They typically
handle a single special file's IO and then call into functions in the
file with the corresponding name without _user.  My layout is like so

fanotify.c     --- all functions called from the main kernel
group.c        --- groups are my implementation of multiple listeners;
                   you can register different groups to get different
                   event types.
notification.c --- implementation surrounding the sending of events to a
                   userspace listener.
fastpath.c     --- implementation surrounding the addition of inode
                   fastpath entries for performance.
access.c       --- implementation surrounding the processing of
                   responses from userspace when the events require
                   a response.

fanotify is a new fscking all notify subsystem.  Much as inotify
provides notification of filesystem activity for some registered subset
of inodes, fanotify provides notification of filesystem activity for ALL
of the system's S_ISREG() files.  fanotify has a smaller number of
notifications than inotify.

GROUPS AND EVENTS:
A "group" registers with fanotify and in doing so indicates what that
group should be called and what type of events it wishes to receive and
in what order it should receive events relative to other groups.  An
"event" is simply a notification about some filesystem action.  The list
of all fanotify events is read, write, open, close,
open_need_access_decision, read_need_access_decision.  Any number of
groups may be created for any subset of event types.  One group may
register to get reads and writes while another may register for opens
and read_need_access_decision.

Any number of userspace listeners may be active in a single group.  Each
group will get ONE copy of any filesystem event.  If there are 10
listeners in a single group and one fanotify event is generated only ONE
of those listeners will get the event.  If more than one group registers
for the same type of event one listener in EACH GROUP will get a copy of
that event.
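
The delivery rule above (one copy per group, handed to exactly one listener
in that group) can be sketched as a small userspace model.  This is only an
illustration: struct model_group, model_dispatch() and the round-robin choice
of listener are assumptions of the sketch, not the kernel code, where
whichever blocked listener wakes first takes the event.

```c
#include <stddef.h>

#define MAX_LISTENERS 16   /* arbitrary bound for this sketch */

/* Hypothetical model of a group; not the kernel's struct fanotify_group. */
struct model_group {
    unsigned int mask;               /* event types the group registered for */
    int num_listeners;               /* must be 1..MAX_LISTENERS here */
    int next_listener;               /* round-robin cursor (an assumption) */
    int delivered[MAX_LISTENERS];    /* events received per listener */
};

/* Hand one event to every interested group; returns the number of copies. */
int model_dispatch(struct model_group *groups, size_t ngroups,
                   unsigned int event_mask)
{
    int copies = 0;

    for (size_t i = 0; i < ngroups; i++) {
        struct model_group *g = &groups[i];

        if (!(g->mask & event_mask))
            continue;                /* group did not ask for this event */
        if (g->num_listeners < 1 || g->num_listeners > MAX_LISTENERS)
            continue;                /* outside what this sketch models */

        /* Exactly ONE listener in the group sees the event. */
        g->delivered[g->next_listener]++;
        g->next_listener = (g->next_listener + 1) % g->num_listeners;
        copies++;
    }
    return copies;
}
```

With two groups registered for the same event type, one call yields two
copies in total, one per group, no matter how many listeners each group has.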

BASIC TERMINOLOGY:

listener process - The fanotify aware process which is receiving events
from the notification special file and possibly writing answers back to
the kernel over the fastpath or access file.

original process - A normal linux process which is doing 'something' on
the filesystem.  For the purposes of this example this process will be
opening a file.

registration file - the file, /security/fanotify/register, used to
create fanotify groups.

notification file - the file, /security/fanotify/[name]/notification,
used for the listener process to get events from the kernel.

fastpath file - the file, /security/fanotify/[name]/fastpath, used to
send fastpath or 'cache' information to the kernel.

access file - the file /security/fanotify/[name]/access, used to send
access decisions back to the kernel if they are required for a given
event.

It all starts when 'something' registers a group.  Registering a group
is as simple as 'echo "open_grp 50 0x10" > /security/fanotify/register'.
open_grp is just the name of the group, 50 is the priority (only
interesting for blocking/access events, will describe later) and 0x10 is
FAN_OPEN. If one wanted open and close you would use 0x1c = (FAN_OPEN |
FAN_CLOSE).  Inside the kernel this creates a new directory called
'open_grp' and the notification, fastpath, and access file inside that
directory.  A struct fanotify_group is allocated and initialized (see
fanotify_register_group()).  The group is added to a kernel global list
called groups.
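
As a rough illustration of the registration string format ("name priority
mask"), here is a userspace sketch of how such a line could be parsed.
struct reg_request and parse_registration() are hypothetical names for this
example; the in-kernel parser in the patch set may well differ.

```c
#include <stdio.h>

#define FAN_OPEN 0x10   /* value taken from the example above */

/* Hypothetical container for one parsed registration line. */
struct reg_request {
    char name[32];
    unsigned int priority;
    unsigned int mask;
};

/* Parse "name priority mask"; returns 0 on success, -1 on malformed input. */
int parse_registration(const char *line, struct reg_request *req)
{
    int prio, mask;

    /* %31s bounds the name; %i accepts decimal and 0x-prefixed hex. */
    if (sscanf(line, "%31s %i %i", req->name, &prio, &mask) != 3)
        return -1;
    if (prio < 0 || mask <= 0)
        return -1;                  /* an empty event mask makes no sense */

    req->priority = (unsigned int)prio;
    req->mask = (unsigned int)mask;
    return 0;
}
```

Feeding it "open_grp 50 0x10" gives name "open_grp", priority 50 and a mask
equal to FAN_OPEN.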

Next the listener process will open (O_RDONLY) the notification file. The
group num_clients is incremented at this time. The listener will then call
read() on that file.  Since the group at this point has no events to send to
userspace the listener process will block on the group->event_waitq.

Now let's say the original process calls open().  Open is going to happen
exactly as before until it gets to the fsnotify code (this is where both
inotify and dnotify hook into the kernel.)  From fsnotify we will call
into the function fanotify() with the mask FAN_OPEN.  We will then walk
the global groups list (which is ordered by priority, low first) looking
for any groups which want to receive notification about FAN_OPEN and we
will find the group 'open_grp' that was registered above. An
fanotify_event is allocated and any data we want the listener process to
get about the original process is added to the fanotify_event.  The
event contains a struct path with the dentry and vfsmount from the open
done by the original process.
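
A minimal sketch of that walk, assuming a singly linked list kept in
ascending priority order.  struct grp, grp_insert() and grp_match() are
made-up names for illustration; the real code uses the kernel's own list
primitives and locking, neither of which is shown here.

```c
#include <stddef.h>

/* Hypothetical stand-in for an entry on the global groups list. */
struct grp {
    const char *name;
    unsigned int priority;
    unsigned int mask;
    struct grp *next;
};

/* Insert while keeping the list ordered by priority, lowest first. */
void grp_insert(struct grp **head, struct grp *g)
{
    while (*head && (*head)->priority <= g->priority)
        head = &(*head)->next;
    g->next = *head;
    *head = g;
}

/* Collect up to max groups interested in event_mask, in priority order. */
size_t grp_match(struct grp *head, unsigned int event_mask,
                 struct grp **out, size_t max)
{
    size_t n = 0;

    for (struct grp *g = head; g && n < max; g = g->next)
        if (g->mask & event_mask)
            out[n++] = g;
    return n;
}
```

Because the list is already sorted at insert time, the event path only has
to test each group's mask as it walks.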

Now we call add_event_to_group_notification() to add the event to the
group->notification_list.  This function has a little bit of magic.
Since an event may be needed in multiple groups' notification_lists I
created a helper structure, a struct fanotify_event_holder.  Each entry
in the group->notification_list points to a unique event_holder which in
turn points to the ref counted event in question.

(assuming 2 groups)
group1->notification_list ==> fanotify_event_holder1 ==> single_fanotify_event
group2->notification_list ==> fanotify_event_holder2 ==> single_fanotify_event

The magic is that since we will always need at least 1 holder I embedded
one fanotify_event_holder inside an fanotify_event.  This means that
when removing an event from the group->notification_list we may need to
free the fanotify_event_holder (if it was allocated separately) or we
may need to just clear it (if it was the embedded holder.)
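
The embedded holder trick can be sketched roughly as follows.  The struct
and function names are hypothetical, and the in_use flag is an assumption of
this sketch: the patch set instead treats the embedded holder as free when
its list entry is empty.

```c
#include <stdbool.h>
#include <stdlib.h>

struct ev;

/* Hypothetical analogue of fanotify_event_holder. */
struct holder {
    struct ev *event;
    bool in_use;                /* sketch-only flag, see the note above */
};

/* Hypothetical analogue of fanotify_event with one holder embedded. */
struct ev {
    unsigned int mask;
    struct holder embedded;
};

/* Use the embedded holder if it is free, otherwise allocate a new one. */
struct holder *holder_get(struct ev *e)
{
    struct holder *h;

    if (!e->embedded.in_use) {
        h = &e->embedded;
    } else {
        h = malloc(sizeof(*h));
        if (!h)
            return NULL;
    }
    h->event = e;
    h->in_use = true;
    return h;
}

/* Clear the embedded holder, or free one that was allocated separately. */
void holder_put(struct holder *h)
{
    struct ev *e = h->event;

    if (h == &e->embedded) {
        h->in_use = false;
        h->event = NULL;
    } else {
        free(h);
    }
}
```

The common case of a single interested group never touches the allocator;
only the second and later groups pay for a separate holder.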

After the event is added to the group->notification_list we wake up the
listener processes.  The original process never blocked and at this
point is returning to userspace with the completed open.

Simultaneously the listener process will now remove the event from the
group->notification_list, see remove_event_from_group_notification().
It then creates a new file and fd and installs them in the listener
process, see fanotify_notification_read().  We will put_event (since this
group is finished with it) and will return the read() call to userspace.

The listener process will get a string that looks like "fd=10 cookie=0
mask=10."  This is telling the listener process that a new fd has been
created, #10.  The cookie (if this notification required an access
decision) was 0 and the mask of the event was 0x10 (FAN_OPEN.)
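
For illustration, a listener could pick that string apart along these lines.
This is a sketch under assumptions: struct fan_msg and parse_notification()
are invented names, the cookie is assumed to be printed in decimal, and the
mask is assumed to be bare hex with no 0x prefix, as in the example string.

```c
#include <stdio.h>

/* Hypothetical parsed form of one notification string. */
struct fan_msg {
    int fd;                  /* new fd installed in the listener */
    unsigned long cookie;    /* only meaningful for access events */
    unsigned int mask;       /* event mask, e.g. 0x10 for FAN_OPEN */
};

/* Parse "fd=N cookie=N mask=HEX"; returns 0 on success, -1 otherwise. */
int parse_notification(const char *s, struct fan_msg *m)
{
    if (sscanf(s, "fd=%d cookie=%lu mask=%x",
               &m->fd, &m->cookie, &m->mask) != 3)
        return -1;
    return 0;
}
```

After a successful parse the listener owns m->fd and must close() it when it
is done scanning or indexing.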

The listener process must call close(10) when it is finished with this
new fd.  But let's assume the listener, for whatever reason, decides it
doesn't want to hear any more of this type of message for this inode.
That means the listener process needs to "create a fastpath" entry.  To
do this the listener process needs to open (or have open) the fastpath
file.  After that all it needs to do is write to that file something
like "10 0x10."  This says 'create a fastpath entry for the inode
associated with my fd #10 for events of type 0x10 (FAN_OPEN).'
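
A small sketch of building that request string in a listener.
format_fastpath_request() is a hypothetical helper; only the "fd mask" layout
of the string comes from the description above.

```c
#include <stdio.h>

/*
 * Build the text written to the fastpath file, e.g. "10 0x10".
 * Returns the string length, or -1 if buf is too small.
 */
int format_fastpath_request(char *buf, size_t len, int fd, unsigned int mask)
{
    int n = snprintf(buf, len, "%d 0x%x", fd, mask);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The listener would then write() those n bytes to the group's fastpath file,
/security/fanotify/[name]/fastpath.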

Inside the kernel (fanotify_fastpath_add()) what happens is that we will
create a new fanotify_fastpath_entry for the group and mask in question
and attach it to the inode.  The next time a process opens the inode in
question we will search the global groups list for a group that matches
the mask and we will look at the inode to see if there is a fastpath
entry for this group and mask.  If there is an entry no event will be
added to the group->event_list.
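
The check can be modelled in a few lines of userspace C.  Everything here
(struct model_inode, the fixed-size array, the integer group id) is an
assumption made for the sketch; the patch set hangs a list of
fanotify_fastpath_entry off the inode instead.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FASTPATH 8   /* arbitrary bound for this sketch */

/* Hypothetical analogue of fanotify_fastpath_entry. */
struct fastpath_entry {
    int group_id;
    unsigned int mask;
};

/* Hypothetical stand-in for the per-inode fastpath list. */
struct model_inode {
    struct fastpath_entry fp[MAX_FASTPATH];
    size_t nfp;
};

/* Tag the inode for (group, mask); returns -1 when the table is full. */
int fastpath_add(struct model_inode *ino, int group_id, unsigned int mask)
{
    if (ino->nfp == MAX_FASTPATH)
        return -1;
    ino->fp[ino->nfp++] = (struct fastpath_entry){ group_id, mask };
    return 0;
}

/* True if this group asked not to hear about this event on this inode. */
bool fastpath_suppresses(const struct model_inode *ino, int group_id,
                         unsigned int event_mask)
{
    for (size_t i = 0; i < ino->nfp; i++)
        if (ino->fp[i].group_id == group_id &&
            (ino->fp[i].mask & event_mask))
            return true;
    return false;
}

/* Drop every entry, as happens when the file is modified. */
void fastpath_flush(struct model_inode *ino)
{
    ino->nfp = 0;
}
```

An entry only silences the group that created it; other groups registered
for the same event still get their copy.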

This is the end of 'a day in the life of fanotify when there are no
access decisions.'

Assuming the event was for FAN_OPEN_ACCESS much of the above is the
same.  The biggest difference is that we call into the fanotify code from
a different place in the kernel (fsnotify is not in a good place to
provide security hooks).  If a group is found that wants this event the
event is added to the group->access_list AND to the group->event_list.
The original process is then blocked for a (now fixed 5 second) timeout
waiting for the event to get a non-zero event->response on the
group->access_waitq.

The listener process will get a notification exactly as above from the
notification file but this time will need to write an answer to the
access file.  The answer is again a simple string indicating the cookie
and the response (allow/deny).  If a response is received from userspace
the event is removed from the group->access_list and the original
process is woken up to continue, either by looking for the next group or
by returning -EPERM.  Userspace may also return FAN_RESETTIMER which
will reset the 5 second timeout.  A badly behaving userspace may hang an
open indefinitely.

If the original process times out waiting for the listener process to
give a response we currently just allow the security access.
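
The wait/answer/timeout behaviour can be sketched as a tiny simulation that
steps through time one second at a time.  The response codes and
wait_for_access() are hypothetical; the text above only establishes that a
non-zero response ends the wait, that FAN_RESETTIMER restarts the 5 second
clock, and that a timeout currently means allow.

```c
#include <errno.h>

#define ACCESS_TIMEOUT_SECS 5   /* the fixed timeout described above */

/* Hypothetical response codes; zero means "no answer yet". */
enum fan_response {
    RESP_NONE = 0,
    RESP_ALLOW,
    RESP_DENY,
    RESP_RESETTIMER
};

/*
 * responses[t] is what the listener wrote during second t.
 * Returns 0 to let the open proceed or -EPERM to refuse it.
 */
int wait_for_access(const enum fan_response *responses, int nsecs)
{
    int remaining = ACCESS_TIMEOUT_SECS;

    for (int t = 0; t < nsecs && remaining > 0; t++) {
        switch (responses[t]) {
        case RESP_ALLOW:
            return 0;
        case RESP_DENY:
            return -EPERM;
        case RESP_RESETTIMER:
            remaining = ACCESS_TIMEOUT_SECS;   /* listener still working */
            break;
        case RESP_NONE:
            remaining--;
            break;
        }
    }
    return 0;   /* timed out: the access is currently allowed */
}
```

A deny that arrives after five silent seconds is too late, while a listener
that keeps sending the reset can hold the open for as long as it likes.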

An interesting part of the code is fastpath cleanup handling.  Any time
fanotify gets a FAN_MODIFY event we clear the fastpath entries for the
associated inode.  This means our notification and access decisions are
NOT race free but the races are small.  This is not perfect security.  A
'problematic' sequence of events would look like this:

process1 calls read on file
listener scans file and finds it safe
process2 writes to file
listener creates a fastpath entry for file

Maybe I should add an explicit clear fastpath request so userspace can
close that race when it gets the write notification on file (if it so
chooses.)  Remember I'm not trying to protect against a rogue process
intelligently and actively attacking the system.  I'm trying to stop a
correctly functioning yum from reading a bad rpm that it downloaded.
I'm trying to stop an NFS server from accepting bad files and then
apache serving them out on the net.

There is also a race between when a process calls mmap() (at that
point we get an event and scan) and when the process actually faults in
a page which may have been changed by a write from another process.  It's
not perfect, but it's damn sure better than what we have now.

Object lifetime:

fanotify_group - exists from registration to unregistration.
Unregistration racing with an open of the associated notification special
file was a bitch to figure out but eventually I based it on the group's
existence in the global groups list.

fanotify_event - created inside the main fanotify loop which runs the
events list.  An event lasts until both the main loop ends AND the event
is no longer needed by all groups for which it was queued.  A refcnt on
the event is taken every time it is added to a group notification/access
list and is dropped when the group has removed the event from the list
and is finished with its contents.
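
The event lifetime rule amounts to ordinary reference counting, roughly as
below.  struct rc_event, event_create(), get_event() and the live_events
counter are hypothetical; only put_event is a name used in this mail.  The
real code needs an atomic count, which this single-threaded sketch skips.

```c
#include <stdlib.h>

/* Hypothetical refcounted event. */
struct rc_event {
    int refcnt;
    unsigned int mask;
};

int live_events;   /* instrumentation for this sketch only */

/* Create an event holding one reference for the main fanotify loop. */
struct rc_event *event_create(unsigned int mask)
{
    struct rc_event *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    e->refcnt = 1;
    e->mask = mask;
    live_events++;
    return e;
}

/* Taken each time the event is queued on a group list. */
void get_event(struct rc_event *e)
{
    e->refcnt++;
}

/* Dropped by the main loop and by each group once it is finished. */
void put_event(struct rc_event *e)
{
    if (--e->refcnt == 0) {
        live_events--;
        free(e);
    }
}
```

The event therefore outlives the main loop for as long as any group still
has it queued, and is freed by whoever drops the last reference.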

fanotify_event_holder - allocated when an event is added to a group
notification/access list.  destroyed when an event is removed from a
notification/access list.  There is the special case of the embedded
holder inside the fanotify_event.  The embedded holder is assumed to be
available for use if holder->event_list is empty.

fanotify_fastpath_entry - created when a process writes to the fastpath
special file and added to the inode list.  This entry is destroyed in 3
possible places.  If an inode has a modify event we flush them all.  If
an inode is evicted from core we flush them all.  If a group is
unregistered we flush them all for that group.


 fs/Kconfig                    |   39 --
 fs/Makefile                   |    5 
 fs/aio.c                      |    7 
 fs/compat.c                   |    5 
 fs/dnotify.c                  |  194 ----------
 fs/inode.c                    |    6 
 fs/inotify.c                  |  773 ------------------------------------------
 fs/inotify_user.c             |  768 -----------------------------------------
 fs/nfsd/vfs.c                 |    4 
 fs/notify/Kconfig             |   52 ++
 fs/notify/Makefile            |    6 
 fs/notify/access.c            |  160 ++++++++
 fs/notify/access_user.c       |  144 +++++++
 fs/notify/dnotify.c           |  194 ++++++++++
 fs/notify/fanotify.c          |  172 +++++++++
 fs/notify/fanotify.h          |  159 ++++++++
 fs/notify/fastpath.c          |  204 +++++++++++
 fs/notify/fastpath_user.c     |  159 ++++++++
 fs/notify/group.c             |  204 +++++++++++
 fs/notify/group_user.c        |  158 ++++++++
 fs/notify/info_user.c         |   85 ++++
 fs/notify/inotify.c           |  773 ++++++++++++++++++++++++++++++++++++++++++
 fs/notify/inotify_user.c      |  768 +++++++++++++++++++++++++++++++++++++++++
 fs/notify/notification.c      |  174 +++++++++
 fs/notify/notification_user.c |  306 ++++++++++++++++
 fs/open.c                     |    7 
 fs/read_write.c               |   14 
 include/linux/fanotify.h      |   76 ++++
 include/linux/fs.h            |    5 
 include/linux/fsnotify.h      |   31 +
 include/linux/sched.h         |    1 
 31 files changed, 3859 insertions(+), 1794 deletions(-)

(without the inotify/dnotify move it's more like 2000 insertions 50 deletions)

01-fsnotify-subdir: move inotify and dnotify into a subdir
02-fsnotify-files-not-inodes - pass files not inodes to fsnotify
03-fanotify - basic implementation of groups and notification
04-fanotify-group-info - export info about groups RO
05-fanotify-fastpaths - implementation of fastpaths
06-fanotify-group-priorities - add group priorities
07-fanotify-access-decisions - access file and permissions
08-fanotify-access-reset-timer - reset the timeout for a read if listener still working
09-fanotify-metadata-pid - send original process pid to listener
10-fanotify-metadata-tgid - send original process tgid to listener
11-fanotify-metadata-flags - send original process f_flags to listener





* Re: [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-09-26 21:07 [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers) Eric Paris
@ 2008-09-26 21:34 ` Alan Cox
  2008-09-26 21:48   ` [malware-list] " Greg KH
                     ` (2 more replies)
  2008-09-27  6:05 ` david
  1 sibling, 3 replies; 11+ messages in thread
From: Alan Cox @ 2008-09-26 21:34 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, arjan, bunk, tytso, tvrtko.ursulin,
	david, hch, andi, viro, peterz, Jonathan.Press, riel

> It all starts when 'something' registers a group.  Registering a group
> is as simple as 'echo "open_grp 50 0x10" > /security/fanotify/register'.

I thought the operation was usually called "mkdir" which also nicely
deals with races and exclusion.

> open_grp is just the name of the group, 50 is the priority (only
> interesting for blocking/access events, will describe later) and 0x10 is
> FAN_OPEN. If one wanted open and close you would use 0x1c = (FAN_OPEN |
> FAN_CLOSE).  Inside the kernel this creates the new directory called

How do you change group on the fly in this model ?

> The listener process will get a string that looks like "fd=10 cookie=0
> mask=10."  This is telling the listener process that a new fd has been
> created, #10.  The cookie (if this notification required an access
> decision) was 0 and the mask of the event was 0x10 (FAN_OPEN.)

Ok that is foul as an interface, utterly gross. I guess it would be
useful to also be able to not want fds

> event is added to the group->access_list AND to the group->event_list.
> The original process is then blocked for a (now fixed 5 second) timeout
> waiting for the event to get a non-zero event->response on the
> group->access_waitq.

That raises security and correctness questions with things like "make it
swap hard" attacks. Given that any timeout can be configured it's not a
big deal. Do need to handle process death or close of the notification
descriptors.

I think the mechanism is pretty sound. There are some "how do I" cases to
do with open and watching for events when I want to rescan something as
it has been dirty for a while. I'm not sure mmap dirty properly updates
the file mtime - that wants doing anyway for backups tho so is the real
fix.

The userspace API you propose should however be taken out and shot, then
buried with a stake through its heart, holy water in its mouth and its
head cut off, at midnight in a pentacle at a crossroads in the presence
of a priest.

The two discussions are fortunately orthogonal. Is there any reason you
can't use the socket based notification model - that gives you a much
more natural way to express the thing


		socket
		bind(AF_FAN, group=foo+flags etc, PF_FAN);

		fd = accept(old_fd, &addr[returned info])

		close(fd);

as well as fairly natural and importantly standards defined semantics for
poll including polling for new file handles, for reconfiguration of
stuff via get/setsockopt (which do pass stuff like object sizes unlike
ioctls) and for reading/writing data.

It's not quite the same as a normal socket given you accept and get a non
socket fd with the info you need in the return address area but it's much
closer than the rather mad file system proposal.

It would certainly be sane enough to, for example, start writing
scanners in stuff like python-twisted or ruby on rails (not that this is
necessarily a good thing!)

Alan



* Re: [malware-list] [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-09-26 21:34 ` Alan Cox
@ 2008-09-26 21:48   ` Greg KH
  2008-09-26 22:03   ` Eric Paris
  2008-10-02 19:24   ` Eric Paris
  2 siblings, 0 replies; 11+ messages in thread
From: Greg KH @ 2008-09-26 21:48 UTC (permalink / raw)
  To: Alan Cox
  Cc: Eric Paris, david, bunk, peterz, linux-kernel, malware-list, hch,
	andi, viro, arjan

On Fri, Sep 26, 2008 at 10:34:04PM +0100, Alan Cox wrote:
> > It all starts when 'something' registers a group.  Registering a group
> > is as simple as 'echo "open_grp 50 0x10" > /security/fanotify/register'.
> 
> I thought the operation was usually called "mkdir" which also nicely
> deals with races and exclusion.

So does configfs.  Eric, why not use that instead, it sounds like it
will work here nicely.

thanks,

greg k-h


* Re: [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-09-26 21:34 ` Alan Cox
  2008-09-26 21:48   ` [malware-list] " Greg KH
@ 2008-09-26 22:03   ` Eric Paris
  2008-10-02 19:24   ` Eric Paris
  2 siblings, 0 replies; 11+ messages in thread
From: Eric Paris @ 2008-09-26 22:03 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-kernel, malware-list, arjan, bunk, tytso, tvrtko.ursulin,
	david, hch, andi, viro, peterz, Jonathan.Press, riel

On Fri, 2008-09-26 at 22:34 +0100, Alan Cox wrote:
> > It all starts when 'something' registers a group.  Registering a group
> > is as simple as 'echo "open_grp 50 0x10" > /security/fanotify/register'.
> 
> I thought the operation was usually called "mkdir" which also nicely
> deals with races and exclusion.
> 
> > open_grp is just the name of the group, 50 is the priority (only
> > interesting for blocking/access events, will describe later) and 0x10 is
> > FAN_OPEN. If one wanted open and close you would use 0x1c = (FAN_OPEN |
> > FAN_CLOSE).  Inside the kernel this creates the new directory called
> 
> How do you change group on the fly in this model ?

you don't, you create a new one and unregister the old one if you want
something different.  There is no limit on the number of groups and
registered groups with nothing actively sitting there with the
notification file open have a very minimal performance hit.

> 
> > The listener process will get a string that looks like "fd=10 cookie=0
> > mask=10."  This is telling the listener process that a new fd has been
> > created, #10.  The cookie (if this notification required an access
> > decision) was 0 and the mask of the event was 0x10 (FAN_OPEN.)
> 
> Ok that is foul as an interface, utterly gross. I guess it would be
> useful to also be able to not want fds

I took great care in making sure the interface and the implementation
were cleanly separated.  Heck, they are even in different _user files.
I clearly remembered gregkh hating me passing binary blobs and you
suggested syscalls.  This interface was to be easily extended, quickly
prototyped, and eventually thrown away for something the list likes.
The main goal was to make sure all communication was unidirectional and
race free.  A very similar interface with syscalls could use

fanotify_control (need to think about it, register/unregister)
fd = fanotify_get_notify(%[buffer for string of metadata])
error = fanotify_send_mesg(access/fastpath, value, cookie, fd)

> > event is added to the group->access_list AND to the group->event_list.
> > The original process is then blocked for a (now fixed 5 second) timeout
> > waiting for the event to get a non-zero event->response on the
> > group->access_waitq.
> 
> That raises security and correctness questions with things like "make it
> swap hard" attacks. Given that any timeout can be configured it's not a
> big deal. Do need to handle process death or close of the notification
> descriptors.

You're suggesting a malicious program attached to a listener?  Yeah,
they can do horrible things to your machine.  My thoughts were these
files are root only and selinux can easily control who can read/write
from them....

> I think the mechanism is pretty sound. There are some "how do I" cases to
> do with open and watching for events when I want to rescan something as
> it has been dirty for a while. I'm not sure mmap dirty properly updates
> the file mtime - that wants doing anyway for backups tho so is the real
> fix.

not sure what you meant by part 1.  ACCESS events require an immediate
answer.  If you want to batch up some write events and scan it with
another process that's fine.  Pass your fd to that other process and
remember the pid of that other process.  Every time you get an event
from that other process just allow it.  That other process should not
have trouble adding the fastpath entry itself.

I thought we fixed mmap updates mtime a while back.  I'll test and make
sure.  That would throw a huge wrench in the works...

> The userspace API you propose should however be taken out and shot, then
> buried with a stake through its heart, holy water in its mouth and its
> head cut off, at midnight in a pentacle at a crossroads in the presence
> of a priest.

shooting for an lwn quote of the week?

> 
> The two discussions are fortunately orthogonal. Is there any reason you
> can't use the socket based notification model - that gives you a much
> more natural way to express the thing
> 
> 
> 		socket
> 		bind(AF_FAN, group=foo+flags etc, PF_FAN);
> 
> 		fd = accept(old_fd, &addr[returned info])
> 
> 		close(fd);
> 
> as well as fairly natural and importantly standards defined semantics for
> poll including polling for new file handles, for reconfiguration of
> stuff via get/setsockopt (which do pass stuff like object sizes unlike
> ioctls) and for reading/writing data.
> 
> It's not quite the same as a normal socket given you accept and get a non
> socket fd with the info you need in the return address area but it's much
> closer than the rather mad file system proposal.
> 
> It would certainly be sane enough to, for example, start writing
> scanners in stuff like python-twisted or ruby on rails (not that this is
> necessarily a good thing!)

The socket model you describe works very well and cleanly to replace the
'notification' part, but I can't think offhand how to send information
nearly as cleanly back.  I guess we replace writing to access and
fastpath with setsockopt?  Now how to make those easily extensible.....


As an aside I'm trying to get some quick and dirty perf numbers.  My
scsi driver isn't loading on my test machine with hand built kernel so I
might not have any numbers till Monday.

-Eric



* Re: [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-09-26 21:07 [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers) Eric Paris
  2008-09-26 21:34 ` Alan Cox
@ 2008-09-27  6:05 ` david
  2008-09-27 11:20   ` Alan Cox
  2008-09-27 14:04   ` Eric Paris
  1 sibling, 2 replies; 11+ messages in thread
From: david @ 2008-09-27  6:05 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, arjan, bunk, tytso, tvrtko.ursulin,
	alan, hch, andi, viro, peterz, Jonathan.Press, riel

On Fri, 26 Sep 2008, Eric Paris wrote:

> fanotify has 7 event types and only sends events for S_ISREG() files.
> The event types are OPEN, READ, WRITE, CLOSE_WRITE, CLOSE_NOWRITE,
> OPEN_ACCESS, and READ_ACCESS.  Events OPEN_ACCESS and READ_ACCESS
> require that the listener return some sort of allow/deny/more_time
> response as the original process blocks until it gets an event (or times
> out.)  listeners may register a group which will get notifications about
> any combination of these events.  Antivirus scanners will likely want
> OPEN_ACCESS and READ_ACCESS while file indexers would likely use the
> non-ACCESS form of these events.

sending a message out for every READ/WRITE seems like it will generate a 
LOT of messages, and very few will be ones that anyone cares about.

one of the nice things about the TALPA approach was that there was an 
ability to notify only on a change of state (i.e. when a file that had 
been scanned was changed)

this could do a similar thing, but I think it would be a much more 
expensive process to do it all in userspace.

David Lang



* Re: [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-09-27  6:05 ` david
@ 2008-09-27 11:20   ` Alan Cox
  2008-10-06 15:09     ` Pavel Machek
  2008-09-27 14:04   ` Eric Paris
  1 sibling, 1 reply; 11+ messages in thread
From: Alan Cox @ 2008-09-27 11:20 UTC (permalink / raw)
  To: david
  Cc: Eric Paris, linux-kernel, malware-list, arjan, bunk, tytso,
	tvrtko.ursulin, hch, andi, viro, peterz, Jonathan.Press, riel

> sending a message out for every READ/WRITE seems like it will generate a 
> LOT of messages, and very few will be ones that anyone cares about.

On read there isn't much point anyway, on write if you simply send one,
save an event counter number and don't send another until the last one is
cleared it all works well. When the last event is cleared if another
event has occurred then the event counter will have changed so you know
to send one immediately, if the app doesn't want to receive them for a
while it can just hang onto the event for a minute or two before clearing
it.
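
The counter scheme described above can be sketched as follows.  struct
write_notifier and the function names are hypothetical, and events_sent
exists only to make the behaviour visible in this model.

```c
#include <stdbool.h>

/* Hypothetical per-file state for coalescing write notifications. */
struct write_notifier {
    unsigned long counter;       /* bumped on every write */
    unsigned long sent_counter;  /* counter value when the last event went out */
    bool outstanding;            /* listener has not cleared the last event */
    unsigned long events_sent;   /* sketch-only statistic */
};

/* Emit one event and remember which counter value it covers. */
void send_event(struct write_notifier *n)
{
    n->sent_counter = n->counter;
    n->outstanding = true;
    n->events_sent++;
}

/* Called for every write: notify only if nothing is outstanding. */
void on_write(struct write_notifier *n)
{
    n->counter++;
    if (!n->outstanding)
        send_event(n);
}

/* Called when the listener clears the event it was holding. */
void on_clear(struct write_notifier *n)
{
    n->outstanding = false;
    if (n->counter != n->sent_counter)
        send_event(n);           /* more writes arrived in the meantime */
}
```

Three back-to-back writes produce one event; clearing it produces a second
event covering the other two writes, and clearing that one produces nothing.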



* Re: [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-09-27  6:05 ` david
  2008-09-27 11:20   ` Alan Cox
@ 2008-09-27 14:04   ` Eric Paris
       [not found]     ` <5CB739747AC639489F3E8210950C3E555C39EE@geousmail3.GEO.CORP.HCL.IN>
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Paris @ 2008-09-27 14:04 UTC (permalink / raw)
  To: david
  Cc: linux-kernel, malware-list, arjan, bunk, tytso, tvrtko.ursulin,
	alan, hch, andi, viro, peterz, Jonathan.Press, riel

On Fri, 2008-09-26 at 23:05 -0700, david@lang.hm wrote:
> On Fri, 26 Sep 2008, Eric Paris wrote:
> 
> > fanotify has 7 event types and only sends events for S_ISREG() files.
> > The event types are OPEN, READ, WRITE, CLOSE_WRITE, CLOSE_NOWRITE,
> > OPEN_ACCESS, and READ_ACCESS.  Events OPEN_ACCESS and READ_ACCESS
> > require that the listener return some sort of allow/deny/more_time
> > response as the original process blocks until it gets an event (or times
> > out.)  listeners may register a group which will get notifications about
> > any combination of these events.  Antivirus scanners will likely want
> > OPEN_ACCESS and READ_ACCESS while file indexers would likely use the
> > non-ACCESS form of these events.
> 
> sending a message out for every READ/WRITE seems like it will generate a 
> LOT of messages, and very few will be ones that anyone cares about.
> 
> one of the nice things about the TALPA approach was that there was an 
> ability to notify only on a change of state (i.e. when a file that had 
> been scanned was changed)
> 
> this could do a similar thing, but I think it would be a much more 
> expensive process to do it all in userspace.

See the fastpath patch and explanation.  Doesn't help for writes...



* Re: [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-09-26 21:34 ` Alan Cox
  2008-09-26 21:48   ` [malware-list] " Greg KH
  2008-09-26 22:03   ` Eric Paris
@ 2008-10-02 19:24   ` Eric Paris
  2008-10-02 20:48     ` Alan Cox
  2 siblings, 1 reply; 11+ messages in thread
From: Eric Paris @ 2008-10-02 19:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-kernel, malware-list, arjan, bunk, tytso, tvrtko.ursulin,
	david, hch, andi, viro, peterz, Jonathan.Press, riel, greg

On Fri, 2008-09-26 at 22:34 +0100, Alan Cox wrote:

> The two discussions are fortunately orthogonal. Is there any reason you
> can't use the socket based notification model - that gives you a much
> more natural way to express the thing
> 
> 
> 		socket
> 		bind(AF_FAN, group=foo+flags etc, PF_FAN);
> 
> 		fd = accept(old_fd, &addr[returned info])
> 
> 		close(fd);
> 
> as well as fairly natural and importantly standards defined semantics for
> poll including polling for a new file handles, for reconfiguration of
> stuff via get/setsockopt (which do pass stuff like object sizes unlike
> ioctls) and for reading/writing data.

An hour, a whiteboard, 3 other hackers and I think I have a handle on
something you might like a little more.

groups will be 'created' when you call bind().  the struct sockaddr will
include a priority and an event mask.  Group names will be eliminated
since priorities must be unique.  Two processes will be allowed to
bind() to the same priority if the mask is the same.  Those will be
considered to be in the same 'group.'

groups will be destroyed when ALL fds associated with that group are
closed.

calling accept() on the socket from bind() will return a new fd.  This fd
will be created in the kernel using dentry_open() (just like I do it
today) only I will then try to overload and bastardize the new file to
add additional support so that it will allow sendmsg() with flags =
MSG_OOB or setsockopt().  I'll also present an alternative below.

the struct sockaddr from the accept() will be filled with something
like

struct fan_sockaddr {
	int version;
	unsigned int mask;
	pid_t pid;
	pid_t tgid;
	int f_flags;
};

so this will be a binary interface for metadata.  Sending the metadata
about the open fd up the sockaddrs is very slick, but not easily
extended that I can see.  Guess we need to get the metadata right the
first time.

One way to do responses from the listeners (like access decisions and
fastpath entries) would be by sending a message back down the new fd
using sendmsg(MSG_OOB).  The PF_FAN 'stuff' should be able to get this
message and do its magic.  I don't have a format for this message
thought up.  Maybe __u32 len, __u32 version, do whatever.  Maybe people
would prefer I bastardize setsockopt() on this new fd sent to the
listener?  Alternative still to come...

Some operations a listener program might want to do may not be
associated with an event.  This might include flushing all fastpaths on
a definitions update or preemptively adding a fastpath entry for an fd.
I suggest calling connect() on the bound fd to connect to the kernel
PF_FAN system.  This new fd from connect can then take commands using
either sendmsg() or setsockopt() or really anything since it doesn't
have a 'real file' on the backend.

Is this reasonable?  I don't even know what technical hurdles I'm going
to hit trying to take a regular file that would pass S_ISREG() and add
sendmsg() or setsockopt() support to it.

So the alternative would be that I could make ALL listener->kernel
communication go over the fd that came from connect().  Setting a
fastpath would be something like

setsockopt(connect_fd, FAN_LEVEL, fastpath_val, fastpath_struct, len);

with fastpath_struct something like

struct fastpath {
	int fd;
	unsigned int mask;
};

Where fd was a currently open fd in the listener process that the
fastpath was intended to be applied to.  If I'm using a fd from connect
I can really use any method I want but setsockopt() seems nice since it
includes types and lens.  Is it bad to have a fd whose only
implemented function is setsockopt?
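
A rough sketch of what the listener side of that call could look like,
purely as an illustration.  No kernel defines a FAN socket level or a
fastpath option, so the numeric values of FAN_LEVEL and FAN_FASTPATH and
the helper name fan_set_fastpath() are assumptions; only the struct
layout and the setsockopt() argument order follow the text above.

```c
#include <sys/socket.h>

/* Hypothetical values: no kernel defines a FAN socket level or option. */
#define FAN_LEVEL    0x7f01
#define FAN_FASTPATH 1

/* Layout as proposed above. */
struct fastpath {
	int fd;			/* open fd in the listener the fastpath applies to */
	unsigned int mask;	/* event mask the fastpath applies to */
};

/*
 * Ask the (hypothetical) PF_FAN control socket to add a fastpath entry.
 * Returns 0 on success and -1 with errno set on failure, exactly as
 * setsockopt() itself does.
 */
int fan_set_fastpath(int connect_fd, int fd, unsigned int mask)
{
	struct fastpath fp = { .fd = fd, .mask = mask };

	return setsockopt(connect_fd, FAN_LEVEL, FAN_FASTPATH,
			  &fp, sizeof(fp));
}
```

On any existing kernel this call simply fails (for example with EBADF on
an invalid connect_fd), which is the expected behaviour for a sketch of
an interface that does not exist yet.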

-Eric



* Re: [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-10-02 19:24   ` Eric Paris
@ 2008-10-02 20:48     ` Alan Cox
  0 siblings, 0 replies; 11+ messages in thread
From: Alan Cox @ 2008-10-02 20:48 UTC (permalink / raw)
  To: Eric Paris
  Cc: linux-kernel, malware-list, arjan, bunk, tytso, tvrtko.ursulin,
	david, hch, andi, viro, peterz, Jonathan.Press, riel, greg

> struct fan_sockaddr {
> 	int version;
> 	unsigned int mask;
> 	pid_t pid;
> 	pid_t tgid;
> 	int f_flags;
> }
> 
> so this will be a binary interface for metadata.  Sending the metadata
> about the open fd up the sockaddrs is very slick, but not easily
> extended that I can see.  Guess we need to get the metadata right the
> first time.

Usual way socket stuff covers for that is to stick

	unsigned int __unused[8];

or similar on the end...
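
A small sketch of how that padding could be used, assuming readers
check the version and require the spare words to be zero.  The field is
named reserved here rather than __unused because some libcs define
__unused as a macro; that name, FAN_SOCKADDR_VERSION, and both helper
functions are hypothetical and not part of any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define FAN_SOCKADDR_VERSION 1	/* hypothetical version number */

struct fan_sockaddr {
	int version;
	unsigned int mask;
	pid_t pid;
	pid_t tgid;
	int f_flags;
	unsigned int reserved[8];	/* zero for now; room for new fields */
};

/* Fill a version-1 record; the spare words are explicitly zeroed. */
void fan_sockaddr_init(struct fan_sockaddr *sa, unsigned int mask,
		       pid_t pid, pid_t tgid, int f_flags)
{
	assert(sa != NULL);
	memset(sa, 0, sizeof(*sa));
	sa->version = FAN_SOCKADDR_VERSION;
	sa->mask = mask;
	sa->pid = pid;
	sa->tgid = tgid;
	sa->f_flags = f_flags;
}

/* Returns 1 if a version-1 reader can safely interpret the record. */
int fan_sockaddr_valid_v1(const struct fan_sockaddr *sa)
{
	size_t i;

	if (sa->version != FAN_SOCKADDR_VERSION)
		return 0;
	for (i = 0; i < sizeof(sa->reserved) / sizeof(sa->reserved[0]); i++)
		if (sa->reserved[i] != 0)
			return 0;
	return 1;
}
```

Requiring the spare words to be zero today is what lets a later version
give them meaning without changing the size of the structure.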

> PF_FAN system.  This new fd from connect can then take commands using
> either sendmsg() or setsockopt() or really anything since it doesn't
> have a 'real file' on the backend.

That is the normal socket approach. Eg in traditional BSD interfaces for
IP routing you created an AF_INET socket and frobbed with it.

> So the alternative would be that I could make ALL listener->kernel
> communication go over the fd that came from connect().  Setting a

Probably a lot saner

> includes types and lens.  Is it bad to have a fd whose only
> implemented function is setsockopt?

Not really. Lots of socket types have operations that are essentially

	fd = socket(...)
	ioctl(fd, ....);
	close(fd);

or similar. Traditionally ioctl is used for system changing stuff but
that is just tradition.



* Re: [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
  2008-09-27 11:20   ` Alan Cox
@ 2008-10-06 15:09     ` Pavel Machek
  0 siblings, 0 replies; 11+ messages in thread
From: Pavel Machek @ 2008-10-06 15:09 UTC (permalink / raw)
  To: Alan Cox
  Cc: david, Eric Paris, linux-kernel, malware-list, arjan, bunk, tytso,
	tvrtko.ursulin, hch, andi, viro, peterz, Jonathan.Press, riel

Hi!

> > sending a message out for every READ/WRITE seems like it will generate a 
> > LOT of messages, and very few will be ones that anyone cares about.
> 
> On read there isn't much point anyway, on write if you simply send one,
> save an event counter number and don't send another until the last one is
> cleared it all works well. When the last event is cleared if another
> event has occurred then the event counter will have changed so you know
> to send one immediately, if the app doesn't want to receive them for a
> while it can just hang onto the event for a minute or two before clearing
> it.

Actually both read and write seem useless, as both can be bypassed by
mmap...?

								Pavel 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* RE: [malware-list] [RFC] 0/11 fanotify: fscking all notifiction and file access system (intended for antivirus scanning and file indexers)
       [not found]     ` <5CB739747AC639489F3E8210950C3E555C39EE@geousmail3.GEO.CORP.HCL.IN>
@ 2008-10-07 17:36       ` Eric Paris
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Paris @ 2008-10-07 17:36 UTC (permalink / raw)
  To: Michael Morley, HCL America; +Cc: malware-list, linux-kernel

On Tue, 2008-10-07 at 10:32 -0700, Michael Morley, HCL America wrote:
> > 
> > On Fri, 2008-09-26 at 23:05 -0700, david@lang.hm wrote:
> > > On Fri, 26 Sep 2008, Eric Paris wrote:
> > >
> > > > fanotify has 7 event types and only sends events for S_ISREG() files.
> > > > The event types are OPEN, READ, WRITE, CLOSE_WRITE, CLOSE_NOWRITE,
> > > > OPEN_ACCESS, and READ_ACCESS.  Events OPEN_ACCESS and READ_ACCESS
> > > > require that the listener return some sort of allow/deny/more_time
> > > > response as the original process blocks until it gets an event (or times
> > > > out.)  listeners may register a group which will get notifications about
> > > > any combination of these events.  Antivirus scanners will likely want
> > > > OPEN_ACCESS and READ_ACCESS while file indexers would likely use the
> > > > non-ACCESS form of these events.
> > >
> > > sending a message out for every READ/WRITE seems like it will generate a
> > > LOT of messages, and very few will be ones that anyone cares about.
> > >
> > > one of the nice things about the TALPA approach was that there was an
> > > ability to notify only on a change of state (i.e. when a file that had
> > > been scanned was changed)
> > >
> > > this could do a similar thing, but I think it would be a much more
> > > expensive process to do it all in userspace.
> > 
> > See the fastpath patch and explanation.  Doesn't help for writes...
> > 
> 
> Eric, have you considered the scenario where the listening process
> appears to have stopped responding to access events? Under your design,
> the original process would be released after 5 seconds. Too many of
> these timeouts could wreak havoc on the OS. There should be some logic
> in fanotify to remove the fanotify_group after a certain number of
> timeouts which may or may not have to be sequential.

anyone have thoughts on the topic?  Maybe I'll revisit it after I get a
new user interface.  25 missed permission events and I can just evict a
group altogether.  Should the counter be cleared if a listener makes a
decision?
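
A minimal sketch of the eviction counter, written as a userspace model
so that both answers to the question above can be compared.  The limit
of 25 is the figure floated here; the struct, the function names, and
the reset_on_decision switch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAN_MAX_MISSED 25	/* figure floated in this thread */

/* Hypothetical per-group bookkeeping; not taken from the patches. */
struct fan_group_health {
	unsigned int missed;	/* permission events that timed out */
	bool evicted;		/* group has been removed */
};

/*
 * Record a permission event that timed out.  Returns true once the
 * group has reached the limit and should be evicted.
 */
bool fan_group_timeout(struct fan_group_health *h)
{
	assert(h != NULL);
	if (!h->evicted && ++h->missed >= FAN_MAX_MISSED)
		h->evicted = true;
	return h->evicted;
}

/*
 * Record that the listener made a decision in time.  With
 * reset_on_decision set, only consecutive timeouts count toward
 * eviction; without it, timeouts accumulate for the life of the group.
 */
void fan_group_decision(struct fan_group_health *h, bool reset_on_decision)
{
	assert(h != NULL);
	if (reset_on_decision && !h->evicted)
		h->missed = 0;
}
```

Resetting on a decision tolerates a listener that is slow but alive;
never resetting evicts such a listener eventually even if it answers
most events.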

> Less importantly, it would be nice if the listening process could set
> the timeout value when it registers with fanotify (with some limits of
> course).

noted.



