* fanotify - overall design before I start sending patches
@ 2009-07-24 20:13 Eric Paris
2009-07-24 20:48 ` david-gFPdbfVZQbY
` (8 more replies)
0 siblings, 9 replies; 63+ messages in thread
From: Eric Paris @ 2009-07-24 20:13 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
malware-list-h+Im9A44IAFcMpApZELgcQ
Cc: david-gFPdbfVZQbY, Valdis.Kletnieks-PjAqaU27lzQ,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, jengelh-nopoi9nDyk+ELgA04lAiVw,
hch-wEGCiKHe2LqWVfeAwA7xHQ, pavel-AlSwsSmVLrQ,
alexl-H+wXaHxf7aLQT0dZR+AlfA, jcm-H+wXaHxf7aLQT0dZR+AlfA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
I plan to start sending patches for fanotify in the next week or two.
I'd like to see more comments on the design, interface, and capabilities
in case there is a recognized need for major reworks or if I'm not
meeting some users' needs (other than those noted at the end).
git://git.infradead.org/users/eparis/notify.git fanotify-experimental
should have working code to test what I'm talking about.
What is fanotify?
It is a new notification system that has a limited set of events (open,
close, read, write) in which each notification not only comes with metadata
that describes what happened, it also comes with an open file descriptor
to the object in question. fanotify will also allow the listener to
make access decisions on open and read events. This allows the
implementation of hierarchical storage management systems, on-access
file scanning, or integrity checking.
fanotify comes in two flavors: 'directed' and 'global.' 'Directed' is
like inotify or dnotify in that you register specific inodes of interest
and only get events pertaining to those inodes. Global means you are
registering interest for event types system wide. With global mode the
listener program can later exclude objects from future events.
fanotify kernel/userspace interaction is over a new socket protocol. A
listener opens a new socket in the new PF_FANOTIFY family. The socket
is then bound to an address using the following struct:
struct fanotify_addr {
        sa_family_t family;
        __u32 priority;
        __u32 group_num;
        __u32 mask;
        __u32 f_flags;
        __u32 unused[16];
} __attribute__((packed));
The priority field indicates the order in which fanotify listeners will get
events. Since 2 fanotify listeners would 'hear' each other's events on
the new fds they create, fanotify listeners will not hear events generated
by other fanotify listeners with a lower priority number.
The group_num is at the moment not used, but the plan was to allow 2
processes to bind to the same fanotify group and share the load of
processing events.
The f_flags field holds the flags which the fanotify listener wishes to use
when opening its notification fds. On-access scanners would want to use
O_RDONLY, whereas HSM systems would need to use O_WRONLY.
The mask indicates the events this group is interested in. It is
the set of events of interest if FAN_GLOBAL_LISTENER is set at bind
time. If FAN_GLOBAL_LISTENER is not set, this field is meaningless as
the registration of events on individual inodes will dictate the
reception of events.
* FAN_ACCESS: every file access.
* FAN_MODIFY: file modifications.
* FAN_CLOSE: files are closed.
* FAN_OPEN: open() calls.
* FAN_ACCESS_PERM: like FAN_ACCESS, except that the process trying to
access the file is put on hold while the fanotify client decides whether
to allow the operation.
* FAN_OPEN_PERM: like FAN_OPEN, but with the permission check.
* FAN_EVENT_ON_CHILD: receive notification of events on inodes inside
this subdirectory. (this is not a full recursive notification of all
descendants, only direct children)
* FAN_GLOBAL_LISTENER: notify for events on all files in the system.
* FAN_SURVIVE_MODIFY: special flag indicating that ignores should survive
inode modification. Discussed below.
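As a rough illustration of the bind step described above, the following sketch fills in an address for a global listener. It is an assumption-laden example, not code from the fanotify-experimental tree: the proposal gives only names, so the family number and every FAN_* value here are invented and carry a SKETCH_ prefix, and make_global_addr() is a hypothetical helper.

```c
#include <assert.h>
#include <fcntl.h>        /* O_RDONLY, used by callers */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* sa_family_t */

/* Hypothetical values: the proposal gives names only, not numbers. */
#define SKETCH_PF_FANOTIFY          40
#define SKETCH_FAN_OPEN             0x00000001u
#define SKETCH_FAN_CLOSE            0x00000002u
#define SKETCH_FAN_OPEN_PERM        0x00000010u
#define SKETCH_FAN_GLOBAL_LISTENER  0x10000000u

/* Layout copied from the proposal, with __u32 spelled as uint32_t. */
struct fanotify_addr {
        sa_family_t family;
        uint32_t priority;
        uint32_t group_num;
        uint32_t mask;
        uint32_t f_flags;
        uint32_t unused[16];
} __attribute__((packed));

/* Build the bind address for a global listener (hypothetical helper). */
struct fanotify_addr make_global_addr(uint32_t priority, uint32_t events,
                                      uint32_t f_flags)
{
        struct fanotify_addr addr;

        /* Callers pass event bits only; the global flag is added here. */
        assert((events & SKETCH_FAN_GLOBAL_LISTENER) == 0);

        memset(&addr, 0, sizeof(addr));   /* also zeroes group_num and unused[] */
        addr.family = SKETCH_PF_FANOTIFY;
        addr.priority = priority;
        addr.mask = events | SKETCH_FAN_GLOBAL_LISTENER;
        addr.f_flags = f_flags;
        return addr;
}
```

An on-access scanner might call make_global_addr(10, SKETCH_FAN_OPEN | SKETCH_FAN_OPEN_PERM, O_RDONLY) and hand the result to bind(2) on its fanotify socket. A directed listener would leave FAN_GLOBAL_LISTENER clear, in which case the mask field is ignored.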
After the socket is bound, events are obtained using the read() syscall
(recv* probably also works; I haven't tested it). This will result in the
buffer being filled with one or more events like this:
struct fanotify_event_metadata {
        __u32 event_len;
        __s32 fd;
        __u32 mask;
        __u32 f_flags;
        __s32 pid;
        __s32 tgid;
        __u64 cookie;
} __attribute__((packed));
fd specifies the new file descriptor that was created in the context of
the listener (readlink of /proc/self/fd will give you A pathname).
mask indicates the event type (bitwise OR of the event types listed
above). f_flags here is the f_flags the ORIGINAL process has the file
open with. pid and tgid are from the original process. cookie is used
when the listener needs to allow, deny, or delay the operation.
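Since one read() may return several events back to back, a listener has to step through the buffer using event_len. The sketch below shows one way to do that. It assumes event_len is the total size of one record including the header, which the proposal implies but does not state, and next_event() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout copied from the proposal, with kernel types spelled via stdint. */
struct fanotify_event_metadata {
        uint32_t event_len;
        int32_t  fd;
        uint32_t mask;
        uint32_t f_flags;
        int32_t  pid;
        int32_t  tgid;
        uint64_t cookie;
} __attribute__((packed));

/*
 * Copy the event at *off out of buf and advance *off past it.
 * Returns 1 if an event was copied, 0 at the end of the buffer,
 * and -1 if the remaining bytes do not form a complete event.
 */
int next_event(const void *buf, size_t len, size_t *off,
               struct fanotify_event_metadata *ev)
{
        const unsigned char *p = buf;
        size_t left;

        assert(*off <= len);
        left = len - *off;
        if (left == 0)
                return 0;
        if (left < sizeof(*ev))
                return -1;

        /* memcpy avoids unaligned loads from the middle of the buffer. */
        memcpy(ev, p + *off, sizeof(*ev));

        /* Assumption: event_len covers the whole record, header included. */
        if (ev->event_len < sizeof(*ev) || ev->event_len > left)
                return -1;

        *off += ev->event_len;
        return 1;
}
```

A listener would call next_event() in a loop after each read(), act on ev.mask, and close ev.fd when done with it, since the descriptor lives in the listener's own fd table.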
If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received, the listener
must send a response before the 5 second timeout. If no response is
sent before the 5 second timeout, the original operation is allowed. If
this happens too many times (10 in a row), the fanotify group is evicted
from the kernel and will not get any new events. Sending a response is
done using the setsockopt() call with the socket option set to
FANOTIFY_ACCESS_RESPONSE. The buffer should contain a structure like:
struct fanotify_so_access {
        __u64 cookie;
        __u32 response;
} __attribute__((packed));
Where cookie is the cookie from the notification and response is one of:
FAN_ALLOW: allow the original operation
FAN_DENY: deny the original operation
FAN_RESET_TIMEOUT: reset the timeout.
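The timeout and eviction rules above can be modelled in a few lines. This is a userspace model for reasoning about the behaviour, not kernel code. It assumes that any final answer resets the "in a row" count and that an evicted group is simply no longer consulted; the proposal does not spell out either point. The response values and settle_perm_event() are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: the proposal names the responses but not the numbers. */
#define SKETCH_FAN_ALLOW          1u
#define SKETCH_FAN_DENY           2u
#define SKETCH_FAN_RESET_TIMEOUT  3u
#define SKETCH_MAX_MISSED         10   /* "10 in a row" from the proposal */

/* Layout copied from the proposal. */
struct fanotify_so_access {
        uint64_t cookie;
        uint32_t response;
} __attribute__((packed));

struct group_state {
        int  missed_in_a_row;
        bool evicted;
};

/*
 * Settle one permission event. answered == false models the 5 second
 * timeout expiring with no final answer. Returns true if the original
 * operation is allowed to proceed.
 */
bool settle_perm_event(struct group_state *g, bool answered, uint32_t response)
{
        if (g->evicted)
                return true;            /* assumption: evicted groups are skipped */

        if (!answered) {
                if (++g->missed_in_a_row >= SKETCH_MAX_MISSED)
                        g->evicted = true;
                return true;            /* timeout: the operation is allowed */
        }

        /* FAN_RESET_TIMEOUT only buys time; it is not a final answer. */
        assert(response == SKETCH_FAN_ALLOW || response == SKETCH_FAN_DENY);

        g->missed_in_a_row = 0;         /* assumption: an answer resets the streak */
        return response == SKETCH_FAN_ALLOW;
}
```

In this model a listener that answers even one event in ten is never evicted, which is one reason the limits might need to be tunable.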
The last main interface is the 'marking' of inodes. The purpose of
inode marks differs between 'directed' and 'global' listeners. Directed
fanotify listeners need to mark inodes of interest. They do that using
setsockopt() as well, with option FANOTIFY_SET_MARK and the buffer
containing a structure like:
struct fanotify_so_inode_mark {
        __s32 fd;
        __u32 mask;
        __u32 ignored_mask;
} __attribute__((packed));
Where fd is backed by the inode in question, mask is the set of events of
interest (only used in directed mode), and ignored_mask is the mask of
events which should be ignored.
The ignored_mask is cleared every time an inode receives a modification
event unless FAN_SURVIVE_MODIFY is also set. The ignored_mask is
mainly used for 2 purposes. Global listeners may just have no interest
in lots of events, so they should spam inodes with an ignored mask. The
ignored mask is also used to 'cache' access decisions. If the listener
sets FAN_ACCESS_PERM in the ignored mask, all access operations will be
permitted without the call out to userspace. If the inode is modified,
the ignored_mask will be cleared and userspace will again have to
approve the access. If userspace REALLY doesn't care, ever, they can use
the special FAN_SURVIVE_MODIFY flag inside the ignored_mask.
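The caching behaviour of the ignored mask is easier to see as a small state machine. The sketch below models a single inode mark seen by a global listener. The bit values are invented, mark_wants_event() is a hypothetical name, and the ordering (decide whether to deliver first, then clear the ignored mask on modification) is an assumption the proposal does not settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_FAN_ACCESS          0x01u
#define SKETCH_FAN_MODIFY          0x02u
#define SKETCH_FAN_ACCESS_PERM     0x10u
#define SKETCH_FAN_SURVIVE_MODIFY  0x80u

/* Per-inode state as set through FANOTIFY_SET_MARK. */
struct inode_mark {
        uint32_t mask;
        uint32_t ignored_mask;
};

/*
 * Decide whether a global listener with group_mask receives `event`
 * on the inode carrying mark `m`, then apply the rule that a
 * modification clears the ignored mask unless FAN_SURVIVE_MODIFY is set.
 */
bool mark_wants_event(struct inode_mark *m, uint32_t group_mask, uint32_t event)
{
        bool deliver = (group_mask & event) != 0 &&
                       (m->ignored_mask & event) == 0;

        if ((event & SKETCH_FAN_MODIFY) &&
            !(m->ignored_mask & SKETCH_FAN_SURVIVE_MODIFY))
                m->ignored_mask = 0;    /* cached decisions are now stale */

        return deliver;
}
```

With FAN_ACCESS_PERM in the ignored mask, reads proceed without a round trip to userspace until the first write, after which the listener is asked again. Adding FAN_SURVIVE_MODIFY keeps the ignore in place across writes.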
The only other current interface is the ability to ignore events by
superblock magic number. This makes it easy to ignore all events
in /proc, which can be difficult to accomplish by firing FANOTIFY_SET_MARK
with ignored_masks over and over as processes are created and destroyed.
***********
Future direction:
There are 2 things I'm interested in adding.
- Rename events.
The updatedb/mlocate people are interested in fanotify as a means to
not thrash the harddrive every night. They could instead update the db
in real time as files are moved.
- subtree notification.
Currently, to watch only /home and all of its descendants one must
either register a directed watch on every directory or use a global
listener. The global listener with ignored_mask is not as bad as it
sounds in my testing, but decent subtree registration and notification
would be a big win in a lot of people's minds.
***********
Please, complaints? shortcomings? design flaws? issues? failures? How
can it be tweaked to suit your needs?
-Eric
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
@ 2009-07-24 20:48 ` david-gFPdbfVZQbY
[not found] ` <alpine.DEB.1.10.0907241340580.28013-Z4YwzcCRHZnr5h6Zg1Auow@public.gmane.org>
2009-07-24 21:00 ` Andreas Dilger
` (7 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: david-gFPdbfVZQbY @ 2009-07-24 20:48 UTC (permalink / raw)
To: Eric Paris
Cc: a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw, hch-wEGCiKHe2LqWVfeAwA7xHQ,
pavel-AlSwsSmVLrQ, alexl-H+wXaHxf7aLQT0dZR+AlfA,
jcm-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
getting an open fd to the file is good for things like content scanning,
but for other things like a HSM re-populating the file, you would need to
pass the path used to open the file at open time. is this in the metadata
you are passing?
this currently does not give you a good way to listen for info on a
specific directory tree, you imply at the end that you could go global,
but without a path (which you will not have in some cases) it's impossible
to decide if you care about this file or not.
to avoid race conditions, you may want some way that a listener on a
directory can flag that it wants to also be a listener for all new
directories created under the one it is listening on.
with the PERM checks, can a checker respond within the 5 second window
with 'I need more time'? or does it need to complete all its work in <5
seconds?
David Lang
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
2009-07-24 20:48 ` david-gFPdbfVZQbY
@ 2009-07-24 21:00 ` Andreas Dilger
2009-07-24 21:21 ` Eric Paris
2009-07-24 22:48 ` Jamie Lokier
` (6 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: Andreas Dilger @ 2009-07-24 21:00 UTC (permalink / raw)
To: Eric Paris
Cc: david-gFPdbfVZQbY, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw, hch-wEGCiKHe2LqWVfeAwA7xHQ,
pavel-AlSwsSmVLrQ, alexl-H+wXaHxf7aLQT0dZR+AlfA,
jcm-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
On Jul 24, 2009 16:13 -0400, Eric Paris wrote:
> fanotify kernel/userspace interaction is over a new socket protocol. A
> listener opens a new socket in the new PF_FANOTIFY family. The socket
> is then bound to an address. Using the following struct:
Would it make sense to use existing netlink?
> struct fanotify_addr {
> sa_family_t family;
> __u32 priority;
> __u32 group_num;
> __u32 mask;
> __u32 f_flags;
> __u32 unused[16];
> } __attribute__((packed));
>
> The mask is the indication of the events this group is interested in.
> The set of events of interest if FAN_GLOBAL_LISTENER is set at bind
> time. If FAN_GLOBAL_LISTENER is not set, this field is meaningless as
> the registration of events on individual inodes will dictate the
> reception of events.
>
> * FAN_ACCESS: every file access.
> * FAN_MODIFY: file modifications.
> * FAN_CLOSE: files are closed.
> * FAN_OPEN: open() calls.
> * FAN_ACCESS_PERM: like FAN_ACCESS, except that the process trying to
> access the file is put on hold while the fanotify client decides whether
> to allow the operation.
> * FAN_OPEN_PERM: like FAN_OPEN, but with the permission check.
> * FAN_EVENT_ON_CHILD: receive notification of events on inodes inside
> this subdirectory. (this is not a full recursive notification of all
> descendants, only direct children)
> * FAN_GLOBAL_LISTENER: notify for events on all files in the system.
> * FAN_SURVIVE_MODIFY: special flag that ignores should survive inode
> modification. Discussed below.
It seems like a 32-bit mask might not be enough; it wouldn't be hard
at this stage to add a 64-bit mask. Lustre has a similar mechanism
(changelog) that allows tracking all different kinds of filesystem
events (create/unlink/symlink/link/rename/mkdir/setxattr/etc), instead
of just open/close, and is also used by HSM, enhanced rsync, etc.
> struct fanotify_event_metadata {
> __u32 event_len;
> __s32 fd;
> __u32 mask;
> __u32 f_flags;
> __s32 pid;
> __s32 tgid;
> __u64 cookie;
> } __attribute__((packed));
Getting the attributes that have changed into this message is also
useful, as it avoids a continual stream of "stat" calls on the inodes.
The other thing that is important for HSM is that this log is atomic
and persistent, otherwise there may be files that are missed if the
node crashes. This involves creating atomic update records as part
of the filesystem operation, and then userspace consumes them and
tells the kernel that it is finished with records up to X. Otherwise
you risk inconsistencies between rsync/HSM/updatedb for files that
are updated just before a crash.
> If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> must send a response before the 5 second timeout. If no response is
> sent before the 5 second timeout the original operation is allowed. If
> this happens too many times (10 in a row) the fanotify group is evicted
> from the kernel and will not get any new events.
This should be a tunable. If the intent is to monitor PERM checks,
users could DoS the machine to delay the userspace programs past the
timeout and so access files they shouldn't be able to.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: fanotify - overall design before I start sending patches
[not found] ` <alpine.DEB.1.10.0907241340580.28013-Z4YwzcCRHZnr5h6Zg1Auow@public.gmane.org>
@ 2009-07-24 21:01 ` Eric Paris
2009-07-24 21:44 ` Jamie Lokier
0 siblings, 1 reply; 63+ messages in thread
From: Eric Paris @ 2009-07-24 21:01 UTC (permalink / raw)
To: david-gFPdbfVZQbY
Cc: a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw, hch-wEGCiKHe2LqWVfeAwA7xHQ,
alexl-H+wXaHxf7aLQT0dZR+AlfA, jcm-H+wXaHxf7aLQT0dZR+AlfA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
On Fri, 2009-07-24 at 13:48 -0700, david-gFPdbfVZQbY@public.gmane.org wrote:
> getting an open fd to the file is good for things like content scanning,
> but for other things like a HSM re-populating the file, you would need to
> pass the path used to open the file at open time. is this in the metadata
> you are passing?
No, I will NOT EVER pass a pathname. Period. End of story. I stated
that if userspace wants to deal with pathnames (and they understand the
system setup well enough to know if pathnames even make sense to them)
they can use readlink(2) on /proc/self/fd.
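For readers unfamiliar with the /proc approach, a minimal sketch of the lookup follows. It uses only standard Linux interfaces; fd_to_path() is a hypothetical helper name and the error handling is deliberately thin.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve an open descriptor to a name via /proc/self/fd/<fd> (Linux only).
 * Returns the length written to `out`, or -1 on failure.
 */
ssize_t fd_to_path(int fd, char *out, size_t outsz)
{
        char link[64];
        ssize_t n;

        assert(out != NULL && outsz > 1);

        snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
        n = readlink(link, out, outsz - 1);
        if (n < 0)
                return -1;

        out[n] = '\0';          /* readlink() does not NUL-terminate */
        return n;
}
```

The result is one name for the object, not necessarily the name the original process used: hard links, bind mounts, and chroots can all make it differ, and an unlinked file shows up with a " (deleted)" suffix.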
> to avoid race conditions, you may want some way that a listener on a
> directory can flag that it wants to also be a listener for all new
> directories created under the one it is listening on.
Interesting way to get the subtree checking people want: you do the
registration yourself the first time on the entire hierarchy, and new
directories will be automagically added. I could probably do that, I'll
have to look.
> with the PERM checks, can a checker respond within the 5 second window
> with 'I need more time'? or does it need to complete all it's work in <5
> seconds?
FAN_RESET_TIMEOUT
* Re: fanotify - overall design before I start sending patches
2009-07-24 21:00 ` Andreas Dilger
@ 2009-07-24 21:21 ` Eric Paris
2009-07-24 22:42 ` Andreas Dilger
0 siblings, 1 reply; 63+ messages in thread
From: Eric Paris @ 2009-07-24 21:21 UTC (permalink / raw)
To: Andreas Dilger
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley,
pavel
On Fri, 2009-07-24 at 15:00 -0600, Andreas Dilger wrote:
> On Jul 24, 2009 16:13 -0400, Eric Paris wrote:
> > fanotify kernel/userspace interaction is over a new socket protocol. A
> > listener opens a new socket in the new PF_FANOTIFY family. The socket
> > is then bound to an address. Using the following struct:
>
> Would it make sense to use existing netlink?
I looked at netlink, but because fd creation has to be done in the
listener context I couldn't figure out how to make it suitable.
> > struct fanotify_addr {
> > sa_family_t family;
> > __u32 priority;
> > __u32 group_num;
> > __u32 mask;
> > __u32 f_flags;
> > __u32 unused[16];
> > } __attribute__((packed));
> >
> > The mask is the indication of the events this group is interested in.
> > The set of events of interest if FAN_GLOBAL_LISTENER is set at bind
> > time. If FAN_GLOBAL_LISTENER is not set, this field is meaningless as
> > the registration of events on individual inodes will dictate the
> > reception of events.
> >
> > * FAN_ACCESS: every file access.
> > * FAN_MODIFY: file modifications.
> > * FAN_CLOSE: files are closed.
> > * FAN_OPEN: open() calls.
> > * FAN_ACCESS_PERM: like FAN_ACCESS, except that the process trying to
> > access the file is put on hold while the fanotify client decides whether
> > to allow the operation.
> > * FAN_OPEN_PERM: like FAN_OPEN, but with the permission check.
> > * FAN_EVENT_ON_CHILD: receive notification of events on inodes inside
> > this subdirectory. (this is not a full recursive notification of all
> > descendants, only direct children)
> > * FAN_GLOBAL_LISTENER: notify for events on all files in the system.
> > * FAN_SURVIVE_MODIFY: special flag that ignores should survive inode
> > modification. Discussed below.
>
> It seems like a 32-bit mask might not be enough, it wouldn't be hard
> at this stage to add a 64-bit mask. Lustre has a similar mechanism
> (changelog) that allows tracking all different kinds of filesystem
> events (create/unlink/symlink/link/rename/mkdir/setxattr/etc), instead
> of just open/close, also use by HSM, enhanced rsync, etc.
I had a 64 bit mask, but Al Viro asked me to go back to a 32 bit mask
because of i386 register pressure. The bitmask operations are on VERY
hot paths inside the kernel.
> > struct fanotify_event_metadata {
> > __u32 event_len;
> > __s32 fd;
> > __u32 mask;
> > __u32 f_flags;
> > __s32 pid;
> > __s32 tgid;
> > __u64 cookie;
> > } __attribute__((packed));
>
> Getting the attributes that have changed into this message is also
> useful, as it avoids a continual stream of "stat" calls on the inodes.
Hmmm, I'll take a look. Do you have a good example of what you would
want to see? I don't think we know in the notification hooks what
actually is being changed :(
> The other thing that is important for HSM is that this log is atomic
> and persistent, otherwise there may be files that are missed if the
> node crashes. This involves creating atomic update records as part
> of the filesystem operation, and then userspace consumes them and
> tells the kernel that it is finished with records up to X. Otherwise
> you risk inconsistencies between rsync/HSM/updatedb for files that
> are updated just before a crash.
Uhhh, persistent across a crash? Nope, don't have that. Notification
is all in memory. Can't I just put the onus on userspace to recheck
things maybe? Sounds like a user for i_version....
> > If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> > must send a response before the 5 second timeout. If no response is
> > sent before the 5 second timeout the original operation is allowed. If
> > this happens too many times (10 in a row) the fanotify group is evicted
> > from the kernel and will not get any new events.
>
> This should be a tunable, since if the intent is to monitor PERM checks
> it would be possible for users to DOS the machine and delay the userspace
> programs and access files they shouldn't be able to.
At the moment I cheat and say root only to bind. I do plan to open it
up to non-root users after it's in and working, but I'm seriously
considering leaving _PERM events as root only. It's hard to map the
original to listener security implications. So making sure the listener
is always root is easy :)
Userspace would never be able to access a file it shouldn't be allowed
to (the new fd is created in the context of the listener and EPERM is
possible.)
* Re: fanotify - overall design before I start sending patches
2009-07-24 21:01 ` Eric Paris
@ 2009-07-24 21:44 ` Jamie Lokier
2009-07-27 17:52 ` Evgeniy Polyakov
0 siblings, 1 reply; 63+ messages in thread
From: Jamie Lokier @ 2009-07-24 21:44 UTC (permalink / raw)
To: Eric Paris
Cc: david, linux-kernel, linux-fsdevel, malware-list,
Valdis.Kletnieks, greg, jcm, douglas.leeder, tytso, arjan,
jengelh, aviro, mrkafk, alexl, jack, tvrtko.ursulin, a.p.zijlstra,
hch, alan, mmorley
Eric Paris wrote:
> On Fri, 2009-07-24 at 13:48 -0700, david@lang.hm wrote:
> > getting an open fd to the file is good for things like content scanning,
> > but for other things like a HSM re-populating the file, you would need to
> > pass the path used to open the file at open time. is this in the metadata
> > you are passing?
>
> No, I will NOT EVER pass a pathname. Period. End of story. I stated
> the if userspace wants to deal with pathnames (and they understand the
> system setup well enough to know if pathnames even make sense to them)
> they can use readlink(2) on /proc/self/fd
That makes sense.
In most cases where events trigger userspace cache or index updates,
userspace already has enough information to calculate the path (and
any derived data) from the inode number (in the case of non-hard-link
files) or from the inode number of the parent directory and the name
(not full path).
So it wouldn't even need to call readlink(2), provided those bits of
information are passed in the event.
That is one thing which inotify _nearly_ gets right. Nearly, because
it doesn't pass the inode number when you're watching a directory, and
watching every inode is too expensive.
> > to avoid race conditions, you may want some way that a listener on a
> > directory can flag that it wants to also be a listener for all new
> > directories created under the one it is listening on.
>
> Interesting way to get the subtree checking people want, you do the
> registration yourself the first time on the entire hierarchy and new
> directories will be automagically added. I could probably do that, I'll
> have to look.
Yes, automagically adding directories is essential, otherwise they can
be added, and someone can populate them with files that have some
effect before userspace gets a chance to scan them.
The other part of useful subtree notification is getting notifications
for a subtree without having to initially scan the whole hierarchy,
which can take a long time as well as a huge amount of unnecessary
seeking and I/O.
The third part, which by the way is really recommended for security
applications, is persistence across umount/reboot/mount. That can be
done either by assuming there are no filesystem changes when userspace
isn't watching it, or by the simple expedient of letting userspace add
an xattr to things it has indexed, with a specially recognised name
that is automatically removed whenever the file/directory is changed.
-- Jamie
* Re: fanotify - overall design before I start sending patches
2009-07-24 21:21 ` Eric Paris
@ 2009-07-24 22:42 ` Andreas Dilger
2009-07-24 23:01 ` Jamie Lokier
0 siblings, 1 reply; 63+ messages in thread
From: Andreas Dilger @ 2009-07-24 22:42 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley,
pavel
On Jul 24, 2009 17:21 -0400, Eric Paris wrote:
> On Fri, 2009-07-24 at 15:00 -0600, Andreas Dilger wrote:
> > On Jul 24, 2009 16:13 -0400, Eric Paris wrote:
> > It seems like a 32-bit mask might not be enough, it wouldn't be hard
> > at this stage to add a 64-bit mask. Lustre has a similar mechanism
> > (changelog) that allows tracking all different kinds of filesystem
> > events (create/unlink/symlink/link/rename/mkdir/setxattr/etc), instead
> > of just open/close, also used by HSM, enhanced rsync, etc.
>
> I had a 64 bit mask, but Al Viro asked me to go back to a 32 bit mask
> because of i386 register pressure. The bitmask operations are on VERY
> hot paths inside the kernel.
How about adding a spare "__u32 mask_hi" for future use, so that it can
be changed directly into a __u64 on LE machines? That preserves the
extensibility for the future, without hitting performance on 32-bit
machines before it is needed.
> > > struct fanotify_event_metadata {
> > > __u32 event_len;
> > > __s32 fd;
> > > __u32 mask;
> > > __u32 f_flags;
> > > __s32 pid;
> > > __s32 tgid;
> > > __u64 cookie;
> > > } __attribute__((packed));
> >
> > Getting the attributes that have changed into this message is also
> > useful, as it avoids a continual stream of "stat" calls on the inodes.
>
> Hmmm, I'll take a look. Do you have a good example of what you would
> want to see? I don't think we know in the notification hooks what
> actually is being changed :(
Well, I'm thinking there will be a lot of events that some applications
will not care about (e.g. PERM checks where the user is only changing
the file mode, vs. PERM checks where the owner of the file is changing).
Even if the old attributes are not available, having a mask of which
fields in the inode changed, and struct stat64 would be very useful.
> > The other thing that is important for HSM is that this log is atomic
> > and persistent, otherwise there may be files that are missed if the
> > node crashes. This involves creating atomic update records as part
> > of the filesystem operation, and then userspace consumes them and
> > tells the kernel that it is finished with records up to X. Otherwise
> > you risk inconsistencies between rsync/HSM/updatedb for files that
> > are updated just before a crash.
>
> Uhhh, persistent across a crash? Nope, don't have that. Notification
> is all in memory. Can't I just put the onus on userspace to recheck
> things maybe? Sounds like a user for i_version....
Well, if new files are created then userspace won't have any idea which
inodes need to be checked, and it will also need to keep a persistent
database of all file i_version values. If you are trying to hook a
backup tool onto such an interface and files created persistently on
disk before a crash are not handled, then they may never be backed up.
Tools like inotify are fine for desktop window refresh and similar uses,
but for applications which require robust handling they also need to
work over a crash.
The other issue is that you might get quite a large queue of operations
in memory, and if this can't be saved to the filesystem then it might
result in OOMing itself.
> > > If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> > > must send a response before the 5 second timeout. If no response is
> > > sent before the 5 second timeout the original operation is allowed. If
> > > this happens too many times (10 in a row) the fanotify group is evicted
> > > from the kernel and will not get any new events.
> >
> > This should be a tunable, since if the intent is to monitor PERM checks
> > it would be possible for users to DOS the machine and delay the userspace
> > programs and access files they shouldn't be able to.
>
> At the moment I cheat and say root only to bind. I do plan to open it
> up to non-root users after it's in and working, but I'm seriously
> considering leaving _PERM events as root only. It's hard to map the
> original to listener security implications. So making sure the listener
> is always root is easy :)
My comment has nothing to do with non-root access. It has to do with
how long the userspace watcher has to handle an event. If a regular user
is running a 50-thread iozone with a 1M file directory you can imagine it
will create a lot of events to watch, along with a lot of seeking to slow
down the processing of events. If the user then does "open(secretfile)"
(where your _PERM check is doing something useful) it is possible
that the userspace listener will time out and miss some events.
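To make the concern concrete, here is a rough userspace model of the timeout policy as quoted above (default to allow on a missed deadline, evict after 10 misses in a row). It is only a sketch: the struct and function names are made up for illustration and are not taken from the fanotify patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MISSED_IN_A_ROW 10	/* "10 in a row" from the proposal */

/* Hypothetical per-group bookkeeping, not the kernel's real structure. */
struct perm_group {
	unsigned int missed_in_a_row;
	bool evicted;
};

/*
 * Record the outcome of one _PERM event.  Returns true if the original
 * operation goes ahead: either the listener allowed it, or the listener
 * missed the deadline and the default (allow) applies.
 */
static bool perm_event_done(struct perm_group *g, bool answered, bool allow)
{
	assert(g != NULL);

	if (answered) {
		g->missed_in_a_row = 0;	/* any answer breaks the streak */
		return allow;
	}
	if (++g->missed_in_a_row >= MAX_MISSED_IN_A_ROW)
		g->evicted = true;
	return true;
}
```

In this model a listener that is merely slow never denies anything: every missed deadline turns into an allow, which is exactly the window described above.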
> Userspace would never be able to access a file it shouldn't be allowed
> to (the new fd is created in the context of the listener and EPERM is
> possible.)
Ah, so the _PERM check is only intended to grant extra access, instead
of restricting it? That should be made clear in the documentation that
doing the opposite is an easily-bypassed security vulnerability.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
2009-07-24 20:48 ` david-gFPdbfVZQbY
2009-07-24 21:00 ` Andreas Dilger
@ 2009-07-24 22:48 ` Jamie Lokier
[not found] ` <20090724224813.GK27755-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2009-07-25 14:22 ` Niraj kumar
` (5 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: Jamie Lokier @ 2009-07-24 22:48 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley,
pavel
Eric Paris wrote:
> It is a new notification system that has a limited set of events (open,
> close, read, write) in which notification not only comes with metadata
> the describes what happened it also comes with an open file descriptor
> to the object in question. fanotify will also allow the listener to
> make access decisions on open and read events. This allows the
> implementation of hierarchical storage management systems or an access
> file scanning or integrity checking.
My first thought was to wonder, why not make it the same set of events
that inotify and dnotify provide? That is: open, close, read, write,
create, delete, rename, attribute change? In other words, I don't see
a good reason for it to be a subset of events.
Apart from aesthetics (which is my first thought), creating, renaming
and deleting files and symlinks also has security implications on
typical Linux systems. Since fanotify is motivated by security
applications among other things, surely those types of events are of
interest too?
For example, just as you have the power to block a file open request
from some application, you may also need the power to block a
symlink(2) request.
> fanotify comes in two flavors 'directed' and 'global.' 'Directed' is
> like inotify or dnotify in that you register specific inodes of interest
> and only get events pertaining to those inodes. Global means you are
> registering interest for event types system wide. With global mode the
> listener program can later exclude objects from future events.
On a large multi-user system with, say, 10k users in /home and 100
logged in at any time, if you want to monitor the files in
/var/lib/ftp/some.ftp.site/, neither 'directed' nor 'global' is going
to be efficient.
Similarly, if you have 'enhanced rsync' as someone else has mentioned
(good example), it will want to monitor /home/me/kernels/2.6 only,
without slowing down the system when any of the other 4 million files
in /home/me are accessed.
I appreciate fanotify does not try to be perfect for every
application. But if we can make it handle a few more things in a more
scalable way without much code, and a clean interface too, that can
only be good.
> fanotify kernel/userspace interaction is over a new socket protocol. A
> listener opens a new socket in the new PF_FANOTIFY family. The socket
> is then bound to an address. Using the following struct:
>
> struct fanotify_addr {
> sa_family_t family;
> __u32 priority;
> __u32 group_num;
> __u32 mask;
> __u32 f_flags;
> __u32 unused[16];
> } __attribute__((packed));
>
> The priority field indicates in which order fanotify listeners will get
> events. Since 2 fanotify listeners would 'hear' each others events on
> the new fd they create fanotify listeners will not hear events generated
> by other fanotify listeners with a lower priority number.
I'm not sure if I understand the priority mechanism. If it means that
events are only delivered to the highest priority listener, that makes
the fanotify subsystem virtually useless for things like 'enhanced
rsync' which someone else has mentioned. Those programs need to know
they will receive all events, not miss some events when another
program is running.
But maybe I misunderstood the priority mechanism?
> The group_num is at the moment not used, but the plan was to allow 2
> processes to bind to the same fanotify group and share the load of
> processing events.
That's an interesting idea. I like it.
Couldn't both processes simply read from the same socket, so you
wouldn't need group_num? I think that would be cleaner and simpler.
For example, look at how Apache waits for incoming connections:
multiple processes call accept() on the same socket, and exactly one
process is woken with each new connection. This is quite efficient.
You could do the same: have each process read from the same socket,
blocking until there is an event, and only send the event to one of
the waiting processes.
It is important that the kernel code to handle reads dequeues events
in each process efficiently, without the "thundering herd" problem
(look it up, Apache used to have it with accept()).
> The f_flags is the flags which the fanotify listener wishes to use when
> opening their notification fds. On access scanners would want to use
> O_RDONLY, whereas HSM systems would need to use O_WRONLY.
Interesting. An option for file change trackers who don't care about
the open file descriptor would be good too. Perhaps they are just
logging.
> The mask is the indication of the events this group is interested in.
> The set of events of interest if FAN_GLOBAL_LISTENER is set at bind
> time. If FAN_GLOBAL_LISTENER is not set, this field is meaningless as
> the registration of events on individual inodes will dictate the
> reception of events.
>
> * FAN_ACCESS: every file access.
> * FAN_MODIFY: file modifications.
> * FAN_CLOSE: files are closed.
> * FAN_OPEN: open() calls.
> * FAN_ACCESS_PERM: like FAN_ACCESS, except that the process trying to
> access the file is put on hold while the fanotify client decides whether
> to allow the operation.
> * FAN_OPEN_PERM: like FAN_OPEN, but with the permission check.
> * FAN_EVENT_ON_CHILD: receive notification of events on inodes inside
> this subdirectory. (this is not a full recursive notification of all
> descendants, only direct children)
> * FAN_GLOBAL_LISTENER: notify for events on all files in the system.
> * FAN_SURVIVE_MODIFY: special flag that ignores should survive inode
> modification. Discussed below.
>
> After the socket is bound events are attained using the read() syscall
> (recv* probably also works haven't tested). This will result in the
> buffer being filled with one or more events like this:
>
> struct fanotify_event_metadata {
> __u32 event_len;
> __s32 fd;
> __u32 mask;
> __u32 f_flags;
> __s32 pid;
> __s32 tgid;
> __u64 cookie;
> } __attribute__((packed));
>
> fd specifies the new file descriptor that was created in the context of
> the listener. (readlink of /proc/self/fd will give you A pathname)
> mask indicates the events type (bitwise OR of the event types listed
> above). f_flags here is the f_flags the ORIGINAL process has the file
> open with. pid and tgid are from the original process. cookie is used
> when the listener needs to allow, deny, or delay the operation.
So far it looks quite similar to inotify, with some differences.
Some things taken away:
- Very similar events, but missing a few like renames (which you
are thinking of adding).
- No file name for things that happen in a subdirectory.
Application expected to call readlink("/proc/self/fd") if it
cares about the file name. But that won't work for every kind of
event!
Some things (useful I agree) added:
- Returns an open file descriptor to the affected file.
- Returns some other attributes, like accessing pid/tgid (uid though?).
- Can block the process trying to access the file.
API-wise, is there a particular reason for using a new socket
interface, rather than extending the inotify interface with a few more
flags and a different event structure?
By the way, you may not know the history of inotify originally. It
used a device, /dev/inotify, when it was a third-party patch. To get
into the mainline kernel, it was requested that it be changed to use
system calls. The same happened to epoll. So you may have better
luck with a system call interface than using a socket. That shouldn't
affect discussions of any other technical aspect, though.
> If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> must send a response before the 5 second timeout. If no response is
> sent before the 5 second timeout the original operation is allowed. If
> this happens too many times (10 in a row) the fanotify group is evicted
> from the kernel and will not get any new events. Sending a response is
> done using the setsockopt() call with the socket options set to
> FANOTIFY_ACCESS_RESPONSE. The buffer should contain a structure like:
>
> struct fanotify_so_access {
> __u64 cookie;
> __u32 response;
> } __attribute__((packed));
>
> Where cookie is the cookie from the notification and response is one of:
What happens when a process sends a cookie that it did not receive,
but another process received it?
> FAN_ALLOW: allow the original operation
> FAN_DENY: deny the original operation
> FAN_RESET_TIMEOUT: reset the timeout.
>
> The last main interface is the 'marking' of inodes. The purpose of
> inode marks differ between 'directed' and 'global' listeners. Directed
> fanotify listeners need to mark inodes of interest. They do that also
> using setsockopt() of type FANOTIFY_SET_MARK with the buffer containing
> a structure like:
>
> struct fanotify_so_inode_mark {
> __s32 fd;
> __u32 mask;
> __u32 ignored_mask;
> } __attribute__((packed));
>
> Where fd is backed by the inode in question. Mask is the events of
> interest (only used in directed mode) and ignored_mask is the mask of
> events which should be ignored.
It's hard to see how this differs much from inotify_add_watch, except
- is this mark global to all processes, or local to the process
setting the mark?
> The ignored_mask is cleared every time an inode receives a modification
> events unless FAN_SURVIVE_MODIFY is also set. The ignored_mask is
> mainly used for 2 purposes. Global listeners may just have no interest
> in lots of events, so they should spam inodes with an ignored mask. The
> ignored mask is also used to 'cache' access decisions. If the listener
> sets FAN_ACCESS_PERM in the ignored mask all access operations will be
> permitted without the call out to userspace. If the inode is modified
> the ignored_mask will be cleared and userspace will again have to
> approve the access. If userspace REALLY doesn't care ever they can use
> the special FAN_SURVIVE_MODIFY flag inside the ignored_mask.
I do like the idea of caching access decisions. Are these flags
global to the whole system, or local to the listening process setting
the flags (or to the specific listener's socket)?
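For reference, the ignored_mask behaviour quoted above can be modelled in a few lines. This is a sketch under assumptions: the bit values are placeholders (the proposal does not list them), and "mask" stands for whatever set of events the listener asked for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define FAN_MODIFY		0x02u
#define FAN_ACCESS_PERM		0x10u
#define FAN_SURVIVE_MODIFY	0x40u

struct inode_mark {
	uint32_t mask;
	uint32_t ignored_mask;
};

/* Does this event still need to go out to the listener for this mark? */
static bool mark_wants_event(const struct inode_mark *m, uint32_t event)
{
	assert(m != NULL);
	return (event & m->mask & ~m->ignored_mask) != 0;
}

/* A modification drops cached decisions unless the mark asked to keep them. */
static void mark_inode_modified(struct inode_mark *m)
{
	assert(m != NULL);
	if (!(m->ignored_mask & FAN_SURVIVE_MODIFY))
		m->ignored_mask = 0;
}
```

Setting FAN_ACCESS_PERM in ignored_mask acts as the cached "allow"; the first modification clears it again and the listener is asked once more.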
> The only other current interface is the ability to ignore events by
> superblock magic number. This makes it easy to ignore all events
> in /proc which can be difficult to accomplish firing FANOTIFY_SET_MARK
> with ignored_masks over and over as processes are created and destroyed.
>
> ***********
>
> Future direction:
Here's one more thing which may be needed to make hard guarantees for
security applications:
- Mount events, which it would be natural for fanotify to block
temporarily while it assesses the impact and/or synchronises its
map of the mounts against the change. Mounts do change the set
of visible files, after all.
> There are 2 things I'm interested in adding.
> - Rename events.
> The updatedb/mlocate people are interested in fanotify as a means to
> not thrash the harddrive every night. They could instead update the db
> in real time as files are moved.
Great!
I'm interested in the same thing on narrower (but still large)
subdirectories, for things like enhanced rsync, make, git, indexing,
and complex caching of compiled things. You get the idea: it has a
lot of uses.
> - subtree notification.
> Currently to only watch /home and all of its descendants one must
> either register a directed watch on every directory or use a global
> listener. The global listener with ignored_mask is not as bad as it
> sounds in my testing, but decent subtree registration and notification
> would be a big win in a lot of people's mind.
I believe we've talked about one suggestion for how to do this, on
lwn.net. I'll repeat it here.
Efficient recursive notifications method:
- You register for event on a directory with a RECURSIVE flag "give
me events for this directory and all paths below it".
- That listener gets events for any access of the appropriate type
whose path is via that directory, *using the specific run-time
path used for the access*.
- That _doesn't_ mean hard-link files need to know all their parent
directories, which would be silly and impossible. The event path
is just the one used at run-time for access, by the application
attempting to open/write/whatever.
- If a listener needs to track all accesses to a particular
hard-linked file, it's the responsibility of the listener to
ensure it listens to enough directories to cover every path to
that file - or listen to the file directly. It knows from
i_nlink and the mount map when it has enough directories.
- Notifying just the access path may seem counterintuitive, but in
fact it's what inotify and dnotify do already, and it does
actually work. Often a listener is maintaining a cache or index
of some kind, in which case it will already have sufficient
knowledge about where the hard-linked files are (or know that it
needs an initial indexing), and whether it has covered enough
parent directories to see all accesses to them.
- In practice it means each access traverses the path, following
parent directories until reaching a mount point, broadcasting
events on each one where there's a recursive listener. That's
not as inefficient as it looks, because paths don't usually have
a large number of components.
- I'm not sure exactly how fast/slow it is, though, and it may need a
few thoughtfully cached flags in each dentry to elide traversals.
I won't discuss the details here, for fear of complicating the
discussion too much. They might well mesh with the 'access
decision cache' flags you mentioned.
- It is necessary that link(2) create an attribute-change event
(for i_nlink!) on the source path of the link. dnotify/inotify
don't do that now (unless they changed recently), but they should
to make this work.
Please shoot down the idea. I think it is good enough
for reliable subtree notifications, but I'd love to be proven wrong.
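The traversal in the list above can be sketched with a toy structure. Everything here is hypothetical (a real implementation would walk dentries and vfsmounts, and might cache flags to skip the walk); it only shows the "follow parents up to the mount point along the access path" step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry; field names are made up for this sketch. */
struct toy_dentry {
	struct toy_dentry *parent;	/* NULL at the mount point */
	bool recursive_listener;	/* someone registered RECURSIVE here */
};

/*
 * Walk from the accessed object up the path that was actually used,
 * stopping at the mount point, and count the directories whose
 * recursive listeners would be told about the access.
 */
static unsigned int notify_up_path(const struct toy_dentry *d)
{
	unsigned int notified = 0;

	assert(d != NULL);
	for (d = d->parent; d != NULL; d = d->parent)
		if (d->recursive_listener)
			notified++;
	return notified;
}
```

The cost is one step per path component, which is why the depth of typical paths matters more than the number of files in the tree.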
-- Jamie
* Re: fanotify - overall design before I start sending patches
2009-07-24 22:42 ` Andreas Dilger
@ 2009-07-24 23:01 ` Jamie Lokier
0 siblings, 0 replies; 63+ messages in thread
From: Jamie Lokier @ 2009-07-24 23:01 UTC (permalink / raw)
To: Andreas Dilger
Cc: Eric Paris, linux-kernel, linux-fsdevel, malware-list,
Valdis.Kletnieks, greg, jcm, douglas.leeder, tytso, arjan, david,
jengelh, aviro, mrkafk, alexl, jack, tvrtko.ursulin, a.p.zijlstra,
hch, alan, mmorley, pavel
Andreas Dilger wrote:
> On Jul 24, 2009 17:21 -0400, Eric Paris wrote:
> > On Fri, 2009-07-24 at 15:00 -0600, Andreas Dilger wrote:
> > > On Jul 24, 2009 16:13 -0400, Eric Paris wrote:
> > > It seems like a 32-bit mask might not be enough, it wouldn't be hard
> > > at this stage to add a 64-bit mask. Lustre has a similar mechanism
> > > (changelog) that allows tracking all different kinds of filesystem
> > > events (create/unlink/symlink/link/rename/mkdir/setxattr/etc), instead
> > > of just open/close, also used by HSM, enhanced rsync, etc.
> >
> > I had a 64 bit mask, but Al Viro asked me to go back to a 32 bit mask
> > because of i386 register pressure. The bitmask operations are on VERY
> > hot paths inside the kernel.
>
> How about adding a spare "__u32 mask_hi" for future use, so that it can
> be changed directly into a __u64 on LE machines? That preserves the
> extensibility for the future, without hitting performance on 32-bit
> machines before it is needed.
If so, remember to put "__u32 mask_hi" *before* "__u32 mask" on BE
32-bit machines. Then it can be changed to __u64 on those too, if
needed.
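A small sketch of that layout, assuming a compiler that predefines __BYTE_ORDER__ (GCC and Clang do). The struct name is made up; the point is only that the two 32-bit halves, ordered by endianness, occupy the same bytes a 64-bit mask would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: two halves ordered so they overlay a 64-bit mask. */
struct mask_split {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	uint32_t mask_hi;	/* high word comes first on BE */
	uint32_t mask;
#else
	uint32_t mask;		/* low word comes first on LE */
	uint32_t mask_hi;
#endif
};

/* Reinterpret the pair as the 64-bit value it would later become. */
static uint64_t mask_as_u64(const struct mask_split *m)
{
	uint64_t v;

	assert(sizeof(*m) == sizeof(v));
	memcpy(&v, m, sizeof(v));
	return v;
}
```

With this ordering, existing binaries that only fill in the 32-bit mask (and leave mask_hi zero) would keep working if the field were widened later.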
> Well, if new files are created then userspace won't have any idea which
> inodes need to be checked, and it will also need to keep a persistent
> database of all file i_version values. If you are trying to hook a
> backup tool onto such an interface and files created persistently on
> disk before a crash are not handled, then they may never be backed up.
>
> Tools like inotify are fine for desktop window refresh and similar uses,
> but for applications which require robust handling they also need to
> work over a crash.
I see two ways to handle that:
- Simply assert that the monitoring program is running whenever
there are any changes to a particular filesystem, or the program
is told that it must reindex as a matter of policy.
For example you might run the program before mounting and after
unmounting, so you know there are no changes at other times.
That's not hard security, but then neither is i_version or any
other check, as root (which is responsible for the mount/umount
sequence after all) can also bypass filesystems.
- Have a well-known extended attribute (xattr) or set of them which
are _always deleted_ whenever files are modified. For example
"system.indexing.*". An application called Foo Monitor would
create "system.indexing.foo" xattrs on files prior to indexing
each one.
Each time a MODIFY event occurs on a file, whether it's being
watched or not, the kernel would remove all attributes whose
names match "system.indexing.*".
That includes recursively doing the same to parent directories
via the path used for that access, all the way to the filesystem
root (even if the root isn't visible due to mounting). (See my
other recent mail for why a single path works for hard-linked files).
Tools can look at those xattrs to determine if their indexing
information is up to date - persistently across crashes, unmounts
and reboots.
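As an illustration of the second option, here is a userspace model of "drop every system.indexing.* attribute on MODIFY". It works on a plain array of names rather than real xattrs, and the function names are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INDEXING_PREFIX "system.indexing."

/* True if name falls in the well-known indexing namespace. */
static bool is_indexing_xattr(const char *name)
{
	assert(name != NULL);
	return strncmp(name, INDEXING_PREFIX, sizeof(INDEXING_PREFIX) - 1) == 0;
}

/*
 * Model of "remove on MODIFY": compact names[] in place, dropping every
 * indexing attribute, and return how many names are left.
 */
static size_t drop_indexing_xattrs(const char **names, size_t n)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (!is_indexing_xattr(names[i]))
			names[kept++] = names[i];
	return kept;
}
```

A tool such as the "Foo Monitor" above would then treat a missing "system.indexing.foo" as "this file or directory needs to be looked at again".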
> The other issue is that you might get quite a large queue of operations
> in memory, and if this can't be saved to the filesystem then it might
> result in OOMing itself.
:-) What does inotify do in this scenario?
-- Jamie
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090724224813.GK27755-yetKDKU6eevNLxjTenLetw@public.gmane.org>
@ 2009-07-24 23:25 ` Eric Paris
2009-07-24 23:46 ` Jamie Lokier
2009-07-24 23:49 ` Eric Paris
2009-07-27 16:54 ` Jan Kara
2 siblings, 1 reply; 63+ messages in thread
From: Eric Paris @ 2009-07-24 23:25 UTC (permalink / raw)
To: Jamie Lokier
Cc: david-gFPdbfVZQbY, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw, hch-wEGCiKHe2LqWVfeAwA7xHQ,
pavel-AlSwsSmVLrQ, alexl-H+wXaHxf7aLQT0dZR+AlfA,
jcm-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
On Fri, 2009-07-24 at 23:48 +0100, Jamie Lokier wrote:
> Eric Paris wrote:
> > It is a new notification system that has a limited set of events (open,
> > close, read, write) in which notification not only comes with metadata
> > the describes what happened it also comes with an open file descriptor
> > to the object in question. fanotify will also allow the listener to
> > make access decisions on open and read events. This allows the
> > implementation of hierarchical storage management systems or an access
> > file scanning or integrity checking.
>
> My first thought was to wonder, why not make it the same set of events
> that inotify and dnotify provide? That is: open, close, read, write,
> create, delete, rename, attribute change? In other words, I don't see
> a good reason for it to be a subset of events.
The two real reasons?
1) These were the only 4 my original use case cared about.
2) These are the only 4 where the notification hook has enough
information to open a fd in the context of the listener.
In the kernel most notification is done with either an inode or a dentry
as that is enough for inotify, dnotify, audit_watch and audit_tree.
Opening a file descriptor, and thus fanotify, requires a dentry and a
vfsmnt, which is much harder to come by in the kernel.
Maybe as future work I'll try to convince Al to allow me to have that
information in more places, but for today, those 4 are the only ones I
can probably slip past him...
* Re: fanotify - overall design before I start sending patches
2009-07-24 23:25 ` Eric Paris
@ 2009-07-24 23:46 ` Jamie Lokier
0 siblings, 0 replies; 63+ messages in thread
From: Jamie Lokier @ 2009-07-24 23:46 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley,
pavel
Eric Paris wrote:
> On Fri, 2009-07-24 at 23:48 +0100, Jamie Lokier wrote:
> > Eric Paris wrote:
> > > It is a new notification system that has a limited set of events (open,
> > > close, read, write) in which notification not only comes with metadata
> > > the describes what happened it also comes with an open file descriptor
> > > to the object in question. fanotify will also allow the listener to
> > > make access decisions on open and read events. This allows the
> > > implementation of hierarchical storage management systems or an access
> > > file scanning or integrity checking.
> >
> > My first thought was to wonder, why not make it the same set of events
> > that inotify and dnotify provide? That is: open, close, read, write,
> > create, delete, rename, attribute change? In other words, I don't see
> > a good reason for it to be a subset of events.
>
> The two real reasons?
>
> 1) These were the only 4 my original use case cared about.
> 2) These are the only 4 where the notification hook has enough
> information to open a fd in the context of the listener.
>
> In the kernel most notification is done with either an inode or a dentry
> as that is enough for inotify, dnotify, audit_watch and audit_tree.
> Opening a file descriptor, and thus fanotify, requires a dentry and a
> vfsmnt, which is much harder to come by in the kernel.
>
> Maybe as future work I'll try to convince Al to allow me to have that
> information in more places, but for today, those 4 are the only ones I
> can probably slip past him...
For the other events, maybe there is no need for a file descriptor
anyway.
-- Jamie
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090724224813.GK27755-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2009-07-24 23:25 ` Eric Paris
@ 2009-07-24 23:49 ` Eric Paris
2009-07-25 0:29 ` Jamie Lokier
2009-07-27 16:54 ` Jan Kara
2 siblings, 1 reply; 63+ messages in thread
From: Eric Paris @ 2009-07-24 23:49 UTC (permalink / raw)
To: Jamie Lokier
Cc: david-gFPdbfVZQbY, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw, hch-wEGCiKHe2LqWVfeAwA7xHQ,
pavel-AlSwsSmVLrQ, alexl-H+wXaHxf7aLQT0dZR+AlfA,
jcm-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
On Fri, 2009-07-24 at 23:48 +0100, Jamie Lokier wrote:
> Eric Paris wrote:
> > fanotify kernel/userspace interaction is over a new socket protocol. A
> > listener opens a new socket in the new PF_FANOTIFY family. The socket
> > is then bound to an address. Using the following struct:
> >
> > struct fanotify_addr {
> > sa_family_t family;
> > __u32 priority;
> > __u32 group_num;
> > __u32 mask;
> > __u32 f_flags;
> > __u32 unused[16];
> > } __attribute__((packed));
> >
> > The priority field indicates in which order fanotify listeners will get
> > events. Since 2 fanotify listeners would 'hear' each others events on
> > the new fd they create fanotify listeners will not hear events generated
> > by other fanotify listeners with a lower priority number.
>
> I'm not sure if I understand the priority mechanism. If it means that
> events are only delivered to the highest priority listener, that makes
> the fanotify subsystem virtually useless for things like 'enhanced
> rsync' which someone else has mentioned. Those programs need to know
> they will receive all events, not miss some events when another
> program is running.
>
> But maybe I misunderstood the priority mechanism?
The priority mechanism ONLY excludes events generated by processes which
have an open fanotify listener. (someone is going to barf when they see
that patch)
The problem was basically:
Process A opens [file]
Listener 1 gets the event about A and opens [file]
Listener 2 gets the event about A and opens [file]
Listener 1 gets the event about 2 and opens [file]
Listener 2 gets the event about 1 and opens [file]
Listener 1 gets the event about 2 and opens [file]
..... shit, see how this keeps going forever?
Now that I type it out there might be some horribleness left since my
solution was to only send events caused by 3 to listeners with priority
< 3. So given 3 listeners and the above situation we get
Process A opens [file]
Listener 1 gets the event about A and opens [file]
Listener 2 gets the event about A and opens [file]
Listener 3 gets the event about A and opens [file]
Listener 1 gets the event about 2 and opens [file]
Listener 1 gets the event about 3 and opens [file]
Listener 2 gets the event about 3 and opens [file]
Listener 1 gets the event about 2 and opens [file]
done.
But maybe I should just do the 'if you have fanotify open, you don't
create other fanotify events'... so everyone gets what they expect...
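The "only send events caused by N to listeners with priority < N" rule can be checked with a tiny model. This is a sketch, not the patch: listeners are numbered 1..nr_listeners, every listener that receives an event is assumed to open the file itself, and NOT_A_LISTENER is a made-up marker for an ordinary process such as A.

```c
#include <assert.h>

#define NOT_A_LISTENER 0	/* ordinary process, e.g. "Process A" */

/*
 * Count every event delivery triggered by one open() from a source with
 * priority src.  Each delivery makes the receiving listener open the
 * file too, which is a new event with that listener as the source.
 */
static unsigned int deliveries(unsigned int src, unsigned int nr_listeners)
{
	unsigned int q, total = 0;

	for (q = 1; q <= nr_listeners; q++) {
		/* Events caused by a listener only reach lower priorities. */
		if (src != NOT_A_LISTENER && q >= src)
			continue;
		total += 1 + deliveries(q, nr_listeners);
	}
	return total;
}
```

For three listeners, deliveries(NOT_A_LISTENER, 3) comes to 7, which matches the seven "Listener N gets the event" lines in the walk-through above. The recursion always terminates because the source priority strictly decreases at each step.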
>
> > The f_flags is the flags which the fanotify listener wishes to use when
> > opening their notification fds. On access scanners would want to use
> > O_RDONLY, whereas HSM systems would need to use O_WRONLY.
>
> Interesting. An option for file change trackers who don't care about
> the open file descriptor would be good too. Perhaps they are just
> logging.
No open fd would be pretty worthless, you'd know 'some file opened' but
you wouldn't know what file :) The open fd is the whole point of
fanotify.
> > fd specifies the new file descriptor that was created in the context of
> > the listener. (readlink of /proc/self/fd will give you A pathname)
> > mask indicates the events type (bitwise OR of the event types listed
> > above). f_flags here is the f_flags the ORIGINAL process has the file
> > open with. pid and tgid are from the original process. cookie is used
> > when the listener needs to allow, deny, or delay the operation.
>
> So far it looks quite similar to inotify, with some differences.
> Some things taken away:
>
> - Very similar events, but missing a few like renames (which you
> are thinking of adding).
> - No file name for things that happen in a subdirectory.
Actually I should be more clear about that. If you call
setsockopt(FANOTIFY_ADD_MARK) where
struct fanotify_so_inode_mark {
	__s32 fd;		/* an open fd for "/tmp/" */
	__u32 mask;		/* FAN_OPEN | FAN_EVENT_ON_CHILD */
	__u32 ignored_mask;	/* 0 */
};
and someone opens /tmp/file1 you are going to get an open fd
for /tmp/file1 NOT for /tmp. This is different than inotify.
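As a rough userspace sketch of filling in that mark: the struct layout is the one from this thread, but the numeric values of FAN_OPEN and FAN_EVENT_ON_CHILD below are placeholders chosen for illustration, and the helper name build_child_open_mark() is hypothetical. The setsockopt() call itself is left out because the socket level and option numbers are not given here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Layout as described in the thread; __s32/__u32 spelled with stdint types. */
struct fanotify_so_inode_mark {
	int32_t  fd;
	uint32_t mask;
	uint32_t ignored_mask;
} __attribute__((packed));

/* Placeholder values for illustration only; take the real ones from the headers. */
#define FAN_OPEN		0x00000020u
#define FAN_EVENT_ON_CHILD	0x08000000u

/*
 * Build a mark asking for open events on the children of `dir`.
 * Returns 0 on success, -1 if the directory cannot be opened.
 * The caller owns mark->fd afterwards.
 */
static int build_child_open_mark(const char *dir,
				 struct fanotify_so_inode_mark *mark)
{
	int fd = open(dir, O_RDONLY);

	if (fd < 0)
		return -1;
	mark->fd = fd;
	mark->mask = FAN_OPEN | FAN_EVENT_ON_CHILD;
	mark->ignored_mask = 0;
	return 0;
}
```

A listener would call build_child_open_mark("/tmp", &mark) and then hand the struct to setsockopt() on its fanotify socket.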
> Application expected to call readlink("/proc/self/fd") if it
> cares about the file name. But that won't work for every kind of
> event!
It does, since I only give you events where it works :)
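For reference, turning an event fd into a pathname needs nothing fanotify-specific. A minimal sketch, assuming /proc is mounted (fd_to_path() is just an illustrative helper name):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve an open fd to a pathname via /proc/self/fd/<fd>.
 * Returns the length of the path, or -1 on error. The result is
 * silently truncated if buf is too small.
 */
static ssize_t fd_to_path(int fd, char *buf, size_t len)
{
	char link[64];
	ssize_t n;

	if (len == 0)
		return -1;
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, buf, len - 1);	/* readlink() does not NUL-terminate */
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

Note that this gives A pathname, as the original description says: it is one name for the object at the time of the call, and for an unlinked file the kernel appends " (deleted)".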
> Some things (useful I agree) added:
>
> - Returns an open file descriptor to the affected file.
> - Returns some other attributes, like accessing pid/tgid (uid though?).
> - Can block the process trying to access the file.
>
> API-wise, is there a particular reason for using a new socket
> interface, rather than extending the inotify interface with a few more
> flags and a different event structure?
Since the inotify calls are syscalls, they pretty much suck to change
(setsockopt is SOOOOO much more versionable)
> So you may have better
> luck with a system call interface than using a socket.
I'm going on the suggestion of Alan Cox, but honestly the interface is
clearly segregated, so it can be changed if there is a better idea....
> > struct fanotify_so_access {
> > __u64 cookie;
> > __u32 response;
> > } __attribute__((packed));
> >
> > Where cookie is the cookie from the notification and response is one of:
>
> What happens when a process sends a cookie that it did not receive,
> but another process received it?
Cookies are specific to your fanotify socket. If you respond with an
invalid cookie the setsockopt() call returns -EINVAL.
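A toy userspace model of that per-socket cookie check, for illustration only. The struct layout is the one quoted above; the FAN_* values, struct toy_socket and toy_respond() are made up here, and keeping the cookie pending after FAN_RESET_TIMEOUT is an assumption rather than something stated in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct fanotify_so_access {
	uint64_t cookie;
	uint32_t response;
} __attribute__((packed));

/* Placeholder values for illustration only. */
enum { FAN_ALLOW = 1, FAN_DENY = 2, FAN_RESET_TIMEOUT = 3 };

#define TOY_MAX_PENDING 16

/* Each fanotify socket tracks only the cookies it handed out itself. */
struct toy_socket {
	uint64_t pending[TOY_MAX_PENDING];
	int npending;
};

/* Returns 0 if the cookie belongs to this socket, -EINVAL otherwise. */
static int toy_respond(struct toy_socket *s, const struct fanotify_so_access *r)
{
	assert(s->npending >= 0 && s->npending <= TOY_MAX_PENDING);
	for (int i = 0; i < s->npending; i++) {
		if (s->pending[i] != r->cookie)
			continue;
		/* Assumption: a final answer retires the cookie, a timeout reset does not. */
		if (r->response != FAN_RESET_TIMEOUT)
			s->pending[i] = s->pending[--s->npending];
		return 0;
	}
	return -EINVAL;
}
```

In this model a cookie that some other listener received is simply unknown to your socket, so answering it fails the same way as answering a cookie that never existed.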
>
> > FAN_ALLOW: allow the original operation
> > FAN_DENY: deny the original operation
> > FAN_RESET_TIMEOUT: reset the timeout.
> >
> > The last main interface is the 'marking' of inodes. The purpose of
> > inode marks differ between 'directed' and 'global' listeners. Directed
> > fanotify listeners need to mark inodes of interest. They do that also
> > using setsockopt() of type FANOTIFY_SET_MARK with the buffer containing
> > a structure like:
> >
> > struct fanotify_so_inode_mark {
> > __s32 fd;
> > __u32 mask;
> > __u32 ignored_mask;
> > } __attribute__((packed));
> >
> > Where fd is backed by the inode in question. Mask is the events of
> > interest (only used in directed mode) and ignored_mask is the mask of
> > events which should be ignored.
>
> It's hard to see how this differs much from inotify_add_watch, except
> - is this mark global to all processes, or local to the process
> setting the mark?
There are a LOT of similarities. bind() is a lot like inotify_init().
Adding a mark is a lot like inotify_add_watch().....
As in inotify, setting a mark only applies to the socket it was
associated with....
> > The ignored_mask is cleared every time an inode receives a modification
> > events unless FAN_SURVIVE_MODIFY is also set. The ignored_mask is
> > mainly used for 2 purposes. Global listeners may just have no interest
> > in lots of events, so they should spam inodes with an ignored mask. The
> > ignored mask is also used to 'cache' access decisions. If the listener
> > sets FAN_ACCESS_PERM in the ignored mask all access operations will be
> > permitted without the call out to userspace. If the inode is modified
> > the ignored_mask will be cleared and userspace will again have to
> > approve the access. If userspace REALLY doesn't care ever they can use
> > the special FAN_SURVIVE_MODIFY flag inside the ignored_mask.
>
> I do like the idea of caching access decisions. Are these flags
> global to the whole system, or local to the listening process setting
> the flags (or to the specific listener's socket)?
They are local to the listener's socket.
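The ignored_mask semantics quoted above can be modelled in a few lines of C. This is only a sketch of the described behaviour, not the kernel implementation; the bit values and the toy_* names are placeholders, and there would be one such mark per inode per socket.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define FAN_ACCESS_PERM		0x00000001u
#define FAN_SURVIVE_MODIFY	0x80000000u

/* One listener socket's mark on one inode. */
struct toy_mark {
	uint32_t ignored_mask;
};

/* Nonzero if the event must be sent to userspace for this socket. */
static int toy_needs_callout(const struct toy_mark *m, uint32_t event)
{
	return (m->ignored_mask & event) == 0;
}

/* Called when the inode is modified: the cached decision is dropped... */
static void toy_inode_modified(struct toy_mark *m)
{
	/* ...unless the listener asked for it to survive modification. */
	if (!(m->ignored_mask & FAN_SURVIVE_MODIFY))
		m->ignored_mask = 0;
}
```

So a scanner that sets FAN_ACCESS_PERM in the ignored mask after approving a file gets no further callouts for that file until it changes, and another listener's socket is unaffected because it has its own mark.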
> > The only other current interface is the ability to ignore events by
> > superblock magic number. This makes it easy to ignore all events
> > in /proc which can be difficult to accomplish firing FANOTIFY_SET_MARK
> > with ignored_masks over and over as processes are created and destroyed.
> >
> > ***********
> >
> > Future direction:
>
> Here's one more thing which may be needed to make hard guarantees for
> security applications:
>
> - Mount events, which it would be natural for fanotify to block
> temporarily while it assesses the impact and/or synchronises its
> map of the mounts against the change. Mounts do change the set
> of visible files, after all.
>
> > There are 2 things I'm interested in adding.
> > - Rename events.
> > The updatedb/mlocate people are interested in fanotify as a means to
> > not thrash the harddrive every night. They could instead update the db
> > in real time as files are moved.
>
> Great!
>
> I'm interested in the same thing on narrower (but still large)
> subdirectories, for things like enhanced rsync, make, git, indexing,
> and complex caching of compiled things. You get the idea: it has a
> lot of uses.
>
> > - subtree notification.
> > Currently to only watch /home and all of its descendants one must
> > either register a directed watch on every directory or use a global
> > listener. The global listener with ignored_mask is not as bad as it
> > sounds in my testing, but decent subtree registration and notification
> > would be a big win in a lot of people's mind.
>
> I believe we've talked about one suggestion for how to do this, on
> lwn.net. I'll repeat it here.
>
> Efficient recursive notifications method:
>
> - You register for event on a directory with a RECURSIVE flag "give
> me events for this directory and all paths below it".
>
> - That listener gets events for any access of the appropriate type
> whose path is via that directory, *using the specific run-time
> path used for the access*.
>
> - That _doesn't_ mean hard-link files need to know all their parent
> directories, which would be silly and impossible. The event path
> is just the one used at run-time for access, by the application
> attempting to open/write/whatever.
>
> - If a listener needs to track all accesses to a particular
> hard-linked file, it's the responsibility of the listener to
> ensure it listens to enough directories to cover every path to
> that file - or listen to the file directly. It knows from
> i_nlink and the mount map when it has enough directories.
>
> - Notifying just the access path may seem counterintuitive, but in
> fact it's what inotify and dnotify do already, and it does
> actually work. Often a listener is maintaining a cache or index
> of some kind, in which case it will already have sufficient
> knowledge about where the hard-linked files are (or know that it
> needs an initial indexing), and whether it has covered enough
> parent directories to see all accesses to them.
>
> - In practice it means each access traverses the path, following
> parent directories until reaching a mount point, broadcasting
> events on each one where there's a recursive listener. That's
> not as inefficient as it looks, because paths don't usually have
> a large number of components.
>
> - I'm not sure exactly how fast/slow it is, though, and it may need a
> few thoughtfully cached flags in each dentry to elide traversals.
> I won't discuss the details here, for fear of complicating the
> discussion too much. They might well mesh with the 'access
> decision cache' flags you mentioned.
>
> - It is necessary that link(2) create an attribute-change event
> (for i_nlink!) on the source path of the link. dnotify/inotify
> don't do that now (unless they changed recently), but they should
> to make this work.
>
> Please shoot down the idea. I think it is good enough
> for reliable subtree notifications, but I'd love to be proven wrong.
>
> -- Jamie
* Re: fanotify - overall design before I start sending patches
2009-07-24 23:49 ` Eric Paris
@ 2009-07-25 0:29 ` Jamie Lokier
2009-07-27 18:33 ` Andreas Dilger
2009-07-29 20:07 ` Eric Paris
0 siblings, 2 replies; 63+ messages in thread
From: Jamie Lokier @ 2009-07-25 0:29 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley,
pavel
Eric Paris wrote:
> But maybe I should just do the 'if you have fanotify open, you don't
> create other fanotify events'... so everyone gets what they expect...
O_NONOTIFY. Similar security concerns, more control.
The security concern is clear: If you allow a process with fanotify
open to not create events, then any (root) process can open a fanotify
socket to hide its behaviour.
> > - No file name for things that happen in a subdirectory.
>
> Actually I should be more clear about that. If you call
> setsockopt(FANOTIFY_ADD_MARK) where
>
> struct fanotify_so_inode_mark {
> __s32 fd; = "/tmp/"
> __u32 mask; = (FAN_OPEN | FAN_EVENT_ON_CHILD);
> __u32 ignored_mask; = 0
> };
>
> and someone opens /tmp/file1 you are going to get an open fd
> for /tmp/file1 NOT for /tmp. This is different than inotify.
If you inotify for IN_OPEN on /tmp, it would return an event when you
open /tmp/file1, with the name "file1" and the directory /tmp. It's
not so different. The main difference is inotify returns a name and no
inode number (so it's racy, sigh), whereas fanotify returns an open file.
Do I see right that you need to open the directory before you can set
the mark on it?
The main reason behind inotify's design wasn't the API (although it is
better than dnotify); it was to avoid having to open thousands of
directories, and to allow a filesystem to be unmounted while it is
being watched.
Does a fanotify mark stop a filesystem from being unmounted? If not,
if the filesystem is unmounted and remounted, is the mark lost?
-- Jamie
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
` (2 preceding siblings ...)
2009-07-24 22:48 ` Jamie Lokier
@ 2009-07-25 14:22 ` Niraj kumar
2009-07-29 20:08 ` Eric Paris
2009-07-28 11:48 ` Jon Masters
` (4 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: Niraj kumar @ 2009-07-25 14:22 UTC (permalink / raw)
To: Eric Paris; +Cc: linux-kernel, linux-fsdevel
>
> - subtree notification.
> Currently to only watch /home and all of its descendants one must
> either register a directed watch on every directory or use a global
> listener. The global listener with ignored_mask is not as bad as it
> sounds in my testing, but decent subtree registration and notification
> would be a big win in a lot of people's mind.
Unless it's already covered in some way, I would also be interested
in notification based on a "process subtree". What this means is that
for whatever notification a particular process requests, it's only
interested in events generated by itself and its children.
This is useful for auditing file system related access.
-Niraj
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090724224813.GK27755-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2009-07-24 23:25 ` Eric Paris
2009-07-24 23:49 ` Eric Paris
@ 2009-07-27 16:54 ` Jan Kara
2 siblings, 0 replies; 63+ messages in thread
From: Jan Kara @ 2009-07-27 16:54 UTC (permalink / raw)
To: Jamie Lokier
Cc: jack-AlSwsSmVLrQ, jengelh-nopoi9nDyk+ELgA04lAiVw,
pavel-AlSwsSmVLrQ, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, hch-wEGCiKHe2LqWVfeAwA7xHQ,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, jcm-H+wXaHxf7aLQT0dZR+AlfA,
alexl-H+wXaHxf7aLQT0dZR+AlfA, arjan-wEGCiKHe2LqWVfeAwA7xHQ,
david-gFPdbfVZQbY, Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
aviro-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
[-- Attachment #1: Type: text/plain, Size: 4129 bytes --]
On Fri 24-07-09 23:48:14, Jamie Lokier wrote:
> > - subtree notification.
> > Currently to only watch /home and all of its descendants one must
> > either register a directed watch on every directory or use a global
> > listener. The global listener with ignored_mask is not as bad as it
> > sounds in my testing, but decent subtree registration and notification
> > would be a big win in a lot of people's mind.
>
> I believe we've talked about one suggestion for how to do this, on
> lwn.net. I'll repeat it here.
>
> Efficient recursive notifications method:
>
> - You register for event on a directory with a RECURSIVE flag "give
> me events for this directory and all paths below it".
>
> - That listener gets events for any access of the appropriate type
> whose path is via that directory, *using the specific run-time
> path used for the access*.
>
> - That _doesn't_ mean hard-link files need to know all their parent
> directories, which would be silly and impossible. The event path
> is just the one used at run-time for access, by the application
> attempting to open/write/whatever.
>
> - If a listener needs to track all accesses to a particular
> hard-linked file, it's the responsibility of the listener to
> ensure it listens to enough directories to cover every path to
> that file - or listen to the file directly. It knows from
> i_nlink and the mount map when it has enough directories.
>
> - Notifying just the access path may seem counterintuitive, but in
> fact it's what inotify and dnotify do already, and it does
> actually work. Often a listener is maintaining a cache or index
> of some kind, in which case it will already have sufficient
> knowledge about where the hard-linked files are (or know that it
> needs an initial indexing), and whether it has covered enough
> parent directories to see all accesses to them.
>
> - In practice it means each access traverses the path, following
> parent directories until reaching a mount point, broadcasting
> events on each one where there's a recursive listener. That's
> not as inefficient as it looks, because paths don't usually have
> a large number of components.
>
> - I'm not sure exactly how fast/slow it is, though, and it may need a
> few thoughtfully cached flags in each dentry to elide traversals.
> I won't discuss the details here, for fear of complicating the
> discussion too much. They might well mesh with the 'access
> decision cache' flags you mentioned.
>
> - It is necessary that link(2) create an attribute-change event
> (for i_nlink!) on the source path of the link. dnotify/inotify
> don't do that now (unless they changed recently), but they should
> to make this work.
About two years ago, I had a similar idea for lightweight, persistent
recursive modification tracking. I even have a proof-of-concept patch against
2.6.23 (attached to give an idea) which works nicely. I aimed at things like
efficient backup or desktop indexing, which are interested in processing
lots of changes in a batch once in a longer period of time... Actually I
believe this kind of use is quite different from the kind of use fanotify
aims at, and maybe different approaches even make sense here... My approach
is only able to give the information "something in the subtree has changed"
via an inode flag in the directory inode, and the application has to track
down what exactly it was (by recursively looking at the flags of the
subdirectories and stat()ing regular files). The benefit is that it's rather
scalable, I believe.
Generally the trouble with this approach is that one has to handle
hardlinks, bind mounts and filesystems which don't support persistent
storage of your attributes. It's all doable but tricky, and I'm still
trying to get all the details right in a shared library wrapping up the
kernel feature (well, one of the problems also is I get to this for only a
few days a year :().
Honza
--
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
SUSE Labs, CR
[-- Attachment #2: ext3-2.6.23-1-i_flags_atomicity.diff --]
[-- Type: text/x-patch, Size: 9501 bytes --]
Implement atomic updates of EXT3_I(inode)->i_flags. So far the i_flags access
was guarded mostly by i_mutex but this is quite heavy-weight. We now use
inode->i_lock to protect i_flags reading and updates in ext3. This patch
introduces a bogus warning that jflag and oldflags may be uninitialized -
does anyone know how to cleanly get rid of it?
Signed-off-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23/fs/ext3/dir.c linux-2.6.23-1-i_flags_atomicity/fs/ext3/dir.c
--- linux-2.6.23/fs/ext3/dir.c 2007-10-11 12:01:23.000000000 +0200
+++ linux-2.6.23-1-i_flags_atomicity/fs/ext3/dir.c 2007-11-05 14:04:56.000000000 +0100
@@ -108,10 +108,10 @@ static int ext3_readdir(struct file * fi
sb = inode->i_sb;
#ifdef CONFIG_EXT3_INDEX
- if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
- EXT3_FEATURE_COMPAT_DIR_INDEX) &&
- ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
- ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
+ if (is_dx(inode) ||
+ (EXT3_HAS_COMPAT_FEATURE(inode->i_sb, \
+ EXT3_FEATURE_COMPAT_DIR_INDEX) &&
+ (inode->i_size >> sb->s_blocksize_bits) == 1)) {
err = ext3_dx_readdir(filp, dirent, filldir);
if (err != ERR_BAD_DX_DIR) {
ret = err;
@@ -121,7 +121,9 @@ static int ext3_readdir(struct file * fi
* We don't set the inode dirty flag since it's not
* critical that it get flushed back to the disk.
*/
+ spin_lock(&inode->i_lock);
EXT3_I(filp->f_path.dentry->d_inode)->i_flags &= ~EXT3_INDEX_FL;
+ spin_unlock(&inode->i_lock);
}
#endif
stored = 0;
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23/fs/ext3/ialloc.c linux-2.6.23-1-i_flags_atomicity/fs/ext3/ialloc.c
--- linux-2.6.23/fs/ext3/ialloc.c 2006-11-29 22:57:37.000000000 +0100
+++ linux-2.6.23-1-i_flags_atomicity/fs/ext3/ialloc.c 2007-11-05 14:14:50.000000000 +0100
@@ -278,7 +278,7 @@ static int find_group_orlov(struct super
ndirs = percpu_counter_read_positive(&sbi->s_dirs_counter);
if ((parent == sb->s_root->d_inode) ||
- (EXT3_I(parent)->i_flags & EXT3_TOPDIR_FL)) {
+ ext3_test_inode_flags(parent, EXT3_TOPDIR_FL)) {
int best_ndir = inodes_per_group;
int best_group = -1;
@@ -566,7 +566,11 @@ got:
ei->i_dir_start_lookup = 0;
ei->i_disksize = 0;
+ /* Guard reading of directory's i_flags, created inode is safe as
+ * noone has a reference to it yet */
+ spin_lock(&dir->i_lock);
ei->i_flags = EXT3_I(dir)->i_flags & ~EXT3_INDEX_FL;
+ spin_unlock(&dir->i_lock);
if (S_ISLNK(mode))
ei->i_flags &= ~(EXT3_IMMUTABLE_FL|EXT3_APPEND_FL);
/* dirsync only applies to directories */
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23/fs/ext3/inode.c linux-2.6.23-1-i_flags_atomicity/fs/ext3/inode.c
--- linux-2.6.23/fs/ext3/inode.c 2007-10-11 12:01:23.000000000 +0200
+++ linux-2.6.23-1-i_flags_atomicity/fs/ext3/inode.c 2007-11-05 14:24:39.000000000 +0100
@@ -2557,8 +2557,10 @@ int ext3_get_inode_loc(struct inode *ino
void ext3_set_inode_flags(struct inode *inode)
{
- unsigned int flags = EXT3_I(inode)->i_flags;
+ unsigned int flags;
+ spin_lock(&inode->i_lock);
+ flags = EXT3_I(inode)->i_flags;
inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
if (flags & EXT3_SYNC_FL)
inode->i_flags |= S_SYNC;
@@ -2570,13 +2572,16 @@ void ext3_set_inode_flags(struct inode *
inode->i_flags |= S_NOATIME;
if (flags & EXT3_DIRSYNC_FL)
inode->i_flags |= S_DIRSYNC;
+ spin_unlock(&inode->i_lock);
}
/* Propagate flags from i_flags to EXT3_I(inode)->i_flags */
void ext3_get_inode_flags(struct ext3_inode_info *ei)
{
- unsigned int flags = ei->vfs_inode.i_flags;
+ unsigned int flags;
+ spin_lock(&ei->vfs_inode.i_lock);
+ flags = ei->vfs_inode.i_flags;
ei->i_flags &= ~(EXT3_SYNC_FL|EXT3_APPEND_FL|
EXT3_IMMUTABLE_FL|EXT3_NOATIME_FL|EXT3_DIRSYNC_FL);
if (flags & S_SYNC)
@@ -2589,6 +2594,7 @@ void ext3_get_inode_flags(struct ext3_in
ei->i_flags |= EXT3_NOATIME_FL;
if (flags & S_DIRSYNC)
ei->i_flags |= EXT3_DIRSYNC_FL;
+ spin_unlock(&ei->vfs_inode.i_lock);
}
void ext3_read_inode(struct inode * inode)
@@ -2781,7 +2787,9 @@ static int ext3_do_update_inode(handle_t
raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
+ spin_lock(&inode->i_lock);
raw_inode->i_flags = cpu_to_le32(ei->i_flags);
+ spin_unlock(&inode->i_lock);
#ifdef EXT3_FRAGMENTS
raw_inode->i_faddr = cpu_to_le32(ei->i_faddr);
raw_inode->i_frag = ei->i_frag_no;
@@ -3209,10 +3217,12 @@ int ext3_change_inode_journal_flag(struc
* the inode's in-core data-journaling state flag now.
*/
+ spin_lock(&inode->i_lock);
if (val)
EXT3_I(inode)->i_flags |= EXT3_JOURNAL_DATA_FL;
else
EXT3_I(inode)->i_flags &= ~EXT3_JOURNAL_DATA_FL;
+ spin_unlock(&inode->i_lock);
ext3_set_aops(inode);
journal_unlock_updates(journal);
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23/fs/ext3/ioctl.c linux-2.6.23-1-i_flags_atomicity/fs/ext3/ioctl.c
--- linux-2.6.23/fs/ext3/ioctl.c 2007-10-11 12:01:23.000000000 +0200
+++ linux-2.6.23-1-i_flags_atomicity/fs/ext3/ioctl.c 2007-11-05 14:32:12.000000000 +0100
@@ -29,7 +29,9 @@ int ext3_ioctl (struct inode * inode, st
switch (cmd) {
case EXT3_IOC_GETFLAGS:
ext3_get_inode_flags(ei);
+ spin_lock(&inode->i_lock);
flags = ei->i_flags & EXT3_FL_USER_VISIBLE;
+ spin_unlock(&inode->i_lock);
return put_user(flags, (int __user *) arg);
case EXT3_IOC_SETFLAGS: {
handle_t *handle = NULL;
@@ -51,10 +53,19 @@ int ext3_ioctl (struct inode * inode, st
flags &= ~EXT3_DIRSYNC_FL;
mutex_lock(&inode->i_mutex);
- oldflags = ei->i_flags;
+ handle = ext3_journal_start(inode, 1);
+ if (IS_ERR(handle)) {
+ mutex_unlock(&inode->i_mutex);
+ return PTR_ERR(handle);
+ }
+ if (IS_SYNC(inode))
+ handle->h_sync = 1;
+ err = ext3_reserve_inode_write(handle, inode, &iloc);
+ if (err)
+ goto flags_err;
- /* The JOURNAL_DATA flag is modifiable only by root */
- jflag = flags & EXT3_JOURNAL_DATA_FL;
+ spin_lock(&inode->i_lock);
+ oldflags = ei->i_flags;
/*
* The IMMUTABLE and APPEND_ONLY flags can only be changed by
@@ -64,8 +75,9 @@ int ext3_ioctl (struct inode * inode, st
*/
if ((flags ^ oldflags) & (EXT3_APPEND_FL | EXT3_IMMUTABLE_FL)) {
if (!capable(CAP_LINUX_IMMUTABLE)) {
- mutex_unlock(&inode->i_mutex);
- return -EPERM;
+ spin_unlock(&inode->i_lock);
+ err = -EPERM;
+ goto flags_err;
}
}
@@ -73,28 +85,19 @@ int ext3_ioctl (struct inode * inode, st
* The JOURNAL_DATA flag can only be changed by
* the relevant capability.
*/
+ jflag = flags & EXT3_JOURNAL_DATA_FL;
if ((jflag ^ oldflags) & (EXT3_JOURNAL_DATA_FL)) {
if (!capable(CAP_SYS_RESOURCE)) {
- mutex_unlock(&inode->i_mutex);
- return -EPERM;
+ spin_unlock(&inode->i_lock);
+ err = -EPERM;
+ goto flags_err;
}
}
-
- handle = ext3_journal_start(inode, 1);
- if (IS_ERR(handle)) {
- mutex_unlock(&inode->i_mutex);
- return PTR_ERR(handle);
- }
- if (IS_SYNC(inode))
- handle->h_sync = 1;
- err = ext3_reserve_inode_write(handle, inode, &iloc);
- if (err)
- goto flags_err;
-
flags = flags & EXT3_FL_USER_MODIFIABLE;
flags |= oldflags & ~EXT3_FL_USER_MODIFIABLE;
ei->i_flags = flags;
+ spin_unlock(&inode->i_lock);
ext3_set_inode_flags(inode);
inode->i_ctime = CURRENT_TIME_SEC;
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23/include/linux/ext3_fs.h linux-2.6.23-1-i_flags_atomicity/include/linux/ext3_fs.h
--- linux-2.6.23/include/linux/ext3_fs.h 2007-07-16 17:47:28.000000000 +0200
+++ linux-2.6.23-1-i_flags_atomicity/include/linux/ext3_fs.h 2007-11-05 14:31:44.000000000 +0100
@@ -514,6 +514,17 @@ static inline int ext3_valid_inum(struct
(ino >= EXT3_FIRST_INO(sb) &&
ino <= le32_to_cpu(EXT3_SB(sb)->s_es->s_inodes_count));
}
+
+static inline unsigned int ext3_test_inode_flags(struct inode *inode, u32 flags)
+{
+ unsigned int ret;
+
+ spin_lock(&inode->i_lock);
+ ret = EXT3_I(inode)->i_flags & flags;
+ spin_unlock(&inode->i_lock);
+ return ret;
+}
+
#else
/* Assume that user mode programs are passing in an ext3fs superblock, not
* a kernel struct super_block. This will allow us to call the feature-test
@@ -666,9 +677,18 @@ struct ext3_dir_entry_2 {
*/
#ifdef CONFIG_EXT3_INDEX
- #define is_dx(dir) (EXT3_HAS_COMPAT_FEATURE(dir->i_sb, \
- EXT3_FEATURE_COMPAT_DIR_INDEX) && \
- (EXT3_I(dir)->i_flags & EXT3_INDEX_FL))
+static inline int is_dx(struct inode *dir)
+{
+ int ret = 0;
+
+ if (EXT3_HAS_COMPAT_FEATURE(dir->i_sb, \
+ EXT3_FEATURE_COMPAT_DIR_INDEX)) {
+ spin_lock(&dir->i_lock);
+ ret = EXT3_I(dir)->i_flags & EXT3_INDEX_FL;
+ spin_unlock(&dir->i_lock);
+ }
+ return ret;
+}
#define EXT3_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= EXT3_LINK_MAX)
#define EXT3_DIR_LINK_EMPTY(dir) ((dir)->i_nlink == 2 || (dir)->i_nlink == 1)
#else
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23/include/linux/ext3_fs_i.h linux-2.6.23-1-i_flags_atomicity/include/linux/ext3_fs_i.h
--- linux-2.6.23/include/linux/ext3_fs_i.h 2007-07-16 17:47:28.000000000 +0200
+++ linux-2.6.23-1-i_flags_atomicity/include/linux/ext3_fs_i.h 2007-11-05 14:26:43.000000000 +0100
@@ -69,7 +69,7 @@ struct ext3_block_alloc_info {
*/
struct ext3_inode_info {
__le32 i_data[15]; /* unconverted */
- __u32 i_flags;
+ __u32 i_flags; /* Guarded by inode->i_lock */
#ifdef EXT3_FRAGMENTS
__u32 i_faddr;
__u8 i_frag_no;
[-- Attachment #3: ext3-2.6.23-2-make_frags_unused.diff --]
[-- Type: text/x-patch, Size: 5689 bytes --]
Mark the space reserved for fragments as unused, since fragments were never
implemented. Also remove the related initializations.
Signed-off-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-1-i_flags_atomicity/fs/ext3/ialloc.c linux-2.6.23-2-make_flags_unused/fs/ext3/ialloc.c
--- linux-2.6.23-1-i_flags_atomicity/fs/ext3/ialloc.c 2007-11-05 14:14:50.000000000 +0100
+++ linux-2.6.23-2-make_flags_unused/fs/ext3/ialloc.c 2007-11-05 14:37:33.000000000 +0100
@@ -576,11 +576,6 @@ got:
/* dirsync only applies to directories */
if (!S_ISDIR(mode))
ei->i_flags &= ~EXT3_DIRSYNC_FL;
-#ifdef EXT3_FRAGMENTS
- ei->i_faddr = 0;
- ei->i_frag_no = 0;
- ei->i_frag_size = 0;
-#endif
ei->i_file_acl = 0;
ei->i_dir_acl = 0;
ei->i_dtime = 0;
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-1-i_flags_atomicity/fs/ext3/inode.c linux-2.6.23-2-make_flags_unused/fs/ext3/inode.c
--- linux-2.6.23-1-i_flags_atomicity/fs/ext3/inode.c 2007-11-05 14:24:39.000000000 +0100
+++ linux-2.6.23-2-make_flags_unused/fs/ext3/inode.c 2007-11-05 14:38:05.000000000 +0100
@@ -2651,11 +2651,6 @@ void ext3_read_inode(struct inode * inod
}
inode->i_blocks = le32_to_cpu(raw_inode->i_blocks);
ei->i_flags = le32_to_cpu(raw_inode->i_flags);
-#ifdef EXT3_FRAGMENTS
- ei->i_faddr = le32_to_cpu(raw_inode->i_faddr);
- ei->i_frag_no = raw_inode->i_frag;
- ei->i_frag_size = raw_inode->i_fsize;
-#endif
ei->i_file_acl = le32_to_cpu(raw_inode->i_file_acl);
if (!S_ISREG(inode->i_mode)) {
ei->i_dir_acl = le32_to_cpu(raw_inode->i_dir_acl);
@@ -2790,11 +2785,6 @@ static int ext3_do_update_inode(handle_t
spin_lock(&inode->i_lock);
raw_inode->i_flags = cpu_to_le32(ei->i_flags);
spin_unlock(&inode->i_lock);
-#ifdef EXT3_FRAGMENTS
- raw_inode->i_faddr = cpu_to_le32(ei->i_faddr);
- raw_inode->i_frag = ei->i_frag_no;
- raw_inode->i_fsize = ei->i_frag_size;
-#endif
raw_inode->i_file_acl = cpu_to_le32(ei->i_file_acl);
if (!S_ISREG(inode->i_mode)) {
raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl);
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-1-i_flags_atomicity/fs/ext3/super.c linux-2.6.23-2-make_flags_unused/fs/ext3/super.c
--- linux-2.6.23-1-i_flags_atomicity/fs/ext3/super.c 2007-11-05 15:04:19.000000000 +0100
+++ linux-2.6.23-2-make_flags_unused/fs/ext3/super.c 2007-11-05 15:01:37.000000000 +0100
@@ -1584,17 +1584,7 @@ static int ext3_fill_super (struct super
goto failed_mount;
}
}
- sbi->s_frag_size = EXT3_MIN_FRAG_SIZE <<
- le32_to_cpu(es->s_log_frag_size);
- if (blocksize != sbi->s_frag_size) {
- printk(KERN_ERR
- "EXT3-fs: fragsize %lu != blocksize %u (unsupported)\n",
- sbi->s_frag_size, blocksize);
- goto failed_mount;
- }
- sbi->s_frags_per_block = 1;
sbi->s_blocks_per_group = le32_to_cpu(es->s_blocks_per_group);
- sbi->s_frags_per_group = le32_to_cpu(es->s_frags_per_group);
sbi->s_inodes_per_group = le32_to_cpu(es->s_inodes_per_group);
if (EXT3_INODE_SIZE(sb) == 0)
goto cantfind_ext3;
@@ -1618,12 +1608,6 @@ static int ext3_fill_super (struct super
sbi->s_blocks_per_group);
goto failed_mount;
}
- if (sbi->s_frags_per_group > blocksize * 8) {
- printk (KERN_ERR
- "EXT3-fs: #fragments per group too big: %lu\n",
- sbi->s_frags_per_group);
- goto failed_mount;
- }
if (sbi->s_inodes_per_group > blocksize * 8) {
printk (KERN_ERR
"EXT3-fs: #inodes per group too big: %lu\n",
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-1-i_flags_atomicity/include/linux/ext3_fs.h linux-2.6.23-2-make_flags_unused/include/linux/ext3_fs.h
--- linux-2.6.23-1-i_flags_atomicity/include/linux/ext3_fs.h 2007-11-05 14:31:44.000000000 +0100
+++ linux-2.6.23-2-make_flags_unused/include/linux/ext3_fs.h 2007-11-05 14:37:33.000000000 +0100
@@ -291,27 +291,24 @@ struct ext3_inode {
__le32 i_generation; /* File version (for NFS) */
__le32 i_file_acl; /* File ACL */
__le32 i_dir_acl; /* Directory ACL */
- __le32 i_faddr; /* Fragment address */
+ __le32 i_obsolete_faddr; /* Unused */
union {
struct {
- __u8 l_i_frag; /* Fragment number */
- __u8 l_i_fsize; /* Fragment size */
+ __u16 l_i_obsolete_frag; /* Unused */
__u16 i_pad1;
__le16 l_i_uid_high; /* these 2 fields */
__le16 l_i_gid_high; /* were reserved2[0] */
__u32 l_i_reserved2;
} linux2;
struct {
- __u8 h_i_frag; /* Fragment number */
- __u8 h_i_fsize; /* Fragment size */
+ __u16 h_i_obsolete_frag; /* Unused */
__u16 h_i_mode_high;
__u16 h_i_uid_high;
__u16 h_i_gid_high;
__u32 h_i_author;
} hurd2;
struct {
- __u8 m_i_frag; /* Fragment number */
- __u8 m_i_fsize; /* Fragment size */
+ __u16 m_i_obsolete_frag; /* Unused */
__u16 m_pad1;
__u32 m_i_reserved2[2];
} masix2;
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-1-i_flags_atomicity/include/linux/ext3_fs_sb.h linux-2.6.23-2-make_flags_unused/include/linux/ext3_fs_sb.h
--- linux-2.6.23-1-i_flags_atomicity/include/linux/ext3_fs_sb.h 2007-10-11 12:01:28.000000000 +0200
+++ linux-2.6.23-2-make_flags_unused/include/linux/ext3_fs_sb.h 2007-11-05 14:50:55.000000000 +0100
@@ -28,10 +28,7 @@
* third extended-fs super-block data in memory
*/
struct ext3_sb_info {
- unsigned long s_frag_size; /* Size of a fragment in bytes */
- unsigned long s_frags_per_block;/* Number of fragments per block */
unsigned long s_inodes_per_block;/* Number of inodes per block */
- unsigned long s_frags_per_group;/* Number of fragments in a group */
unsigned long s_blocks_per_group;/* Number of blocks in a group */
unsigned long s_inodes_per_group;/* Number of inodes in a group */
unsigned long s_itb_per_group; /* Number of inode table blocks per group */
[-- Attachment #4: ext3-2.6.23-3-recursive_mtime.diff --]
[-- Type: text/x-patch, Size: 15024 bytes --]
Implement recursive mtime (rtime) feature for ext3. The feature works as
follows: In each directory we keep a flag EXT3_RTIME_FL (modifiable by a user)
indicating whether rtime should be updated. When a directory or a file in it is
modified and the flag is set, the directory's rtime is updated, the flag is
cleared, and we move to the parent. If the flag is set there, we clear it,
update rtime and continue upwards up to the root of the filesystem. When a
regular file or symlink is modified, we pick an arbitrary one of its parents
(actually the one that happens to be at the head of the i_dentry list) and
start the rtime update algorithm there.
As the flag is always cleared after updating rtime and we don't climb up the
tree if the flag is cleared, we have constant amortized complexity of rtime
updates. That's for theoretical time consumption ;) Practically, there's no
measurable performance impact for a test case like: touch every file in a
kernel tree where every directory has RTIME flag set.
The intended use case is that an application which wants to watch for any
modification in a subtree scans the subtree and sets flags for all inodes there.
Next time, it just needs to recurse into directories having rtime newer than the
start of the previous scan. There it can handle modifications and set the flag
again. It is up to the application to watch out for hardlinked files. It can e.g.
build a list of them and check their mtime separately (when a hardlink to a file
is created, its inode is modified and rtimes are properly updated, and thus any
application has an effective way of finding new hardlinked files).
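The update rule described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct dir_node`, `update_rtimes()` and `needs_rescan()` are hypothetical names invented for the illustration and are not part of the patch; the flag and timestamp fields merely play the roles of EXT3_RTIME_FL and i_rtime.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical in-memory stand-in for a directory inode (not an ext3 type). */
struct dir_node {
    struct dir_node *parent;  /* NULL for the root of the filesystem */
    int rtime_flag;           /* plays the role of EXT3_RTIME_FL */
    time_t rtime;             /* plays the role of i_rtime */
};

/*
 * Climb from dir towards the root. While the flag is set: stamp rtime,
 * clear the flag, move to the parent. Stop at the first directory whose
 * flag is already clear. Returns the number of directories updated.
 */
int update_rtimes(struct dir_node *dir, time_t now)
{
    int updated = 0;

    while (dir != NULL && dir->rtime_flag) {
        dir->rtime = now;
        dir->rtime_flag = 0;
        updated++;
        dir = dir->parent;
    }
    return updated;
}

/*
 * Scanner side: a directory needs to be descended into again only if its
 * rtime is newer than the start of the previous scan.
 */
int needs_rescan(const struct dir_node *dir, time_t last_scan_start)
{
    return dir->rtime > last_scan_start;
}
```

Because the flag is cleared on the way up, a second modification below the same directory stops at the first step until the scanner sets the flag again, which is where the constant amortized cost comes from.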
Signed-off-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/ialloc.c linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/ialloc.c
--- linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/ialloc.c 2007-11-05 16:58:10.000000000 +0100
+++ linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/ialloc.c 2007-11-05 16:58:53.000000000 +0100
@@ -569,7 +569,7 @@ got:
/* Guard reading of directory's i_flags, created inode is safe as
* noone has a reference to it yet */
spin_lock(&dir->i_lock);
- ei->i_flags = EXT3_I(dir)->i_flags & ~EXT3_INDEX_FL;
+ ei->i_flags = EXT3_I(dir)->i_flags & ~(EXT3_INDEX_FL | EXT3_RTIME_FL);
spin_unlock(&dir->i_lock);
if (S_ISLNK(mode))
ei->i_flags &= ~(EXT3_IMMUTABLE_FL|EXT3_APPEND_FL);
@@ -579,6 +579,7 @@ got:
ei->i_file_acl = 0;
ei->i_dir_acl = 0;
ei->i_dtime = 0;
+ ei->i_rtime = inode->i_mtime.tv_sec;
ei->i_block_alloc_info = NULL;
ei->i_block_group = group;
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/inode.c linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/inode.c
--- linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/inode.c 2007-11-05 16:58:10.000000000 +0100
+++ linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/inode.c 2007-11-06 16:13:50.000000000 +0100
@@ -1232,6 +1232,8 @@ static int ext3_ordered_commit_write(str
ret2 = ext3_journal_stop(handle);
if (!ret)
ret = ret2;
+ if (!ret)
+ ext3_update_rtimes(inode);
return ret;
}
@@ -1255,6 +1257,8 @@ static int ext3_writeback_commit_write(s
ret2 = ext3_journal_stop(handle);
if (!ret)
ret = ret2;
+ if (!ret)
+ ext3_update_rtimes(inode);
return ret;
}
@@ -1288,6 +1292,8 @@ static int ext3_journalled_commit_write(
ret2 = ext3_journal_stop(handle);
if (!ret)
ret = ret2;
+ if (!ret)
+ ext3_update_rtimes(inode);
return ret;
}
@@ -2386,6 +2392,10 @@ out_stop:
ext3_orphan_del(handle, inode);
ext3_journal_stop(handle);
+ /* We update time only for linked inodes. Unlinked ones already
+ * notified parent during unlink... */
+ if (inode->i_nlink)
+ ext3_update_rtimes(inode);
}
static ext3_fsblk_t ext3_get_inode_block(struct super_block *sb,
@@ -2628,6 +2638,8 @@ void ext3_read_inode(struct inode * inod
inode->i_ctime.tv_sec = (signed)le32_to_cpu(raw_inode->i_ctime);
inode->i_mtime.tv_sec = (signed)le32_to_cpu(raw_inode->i_mtime);
inode->i_atime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_mtime.tv_nsec = 0;
+ if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb, EXT3_FEATURE_COMPAT_RTIME))
+ ei->i_rtime = le32_to_cpu(raw_inode->i_rtime);
ei->i_state = 0;
ei->i_dir_start_lookup = 0;
@@ -2780,6 +2792,8 @@ static int ext3_do_update_inode(handle_t
raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec);
raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
+ if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb, EXT3_FEATURE_COMPAT_RTIME))
+ raw_inode->i_rtime = cpu_to_le32(ei->i_rtime);
raw_inode->i_blocks = cpu_to_le32(inode->i_blocks);
raw_inode->i_dtime = cpu_to_le32(ei->i_dtime);
spin_lock(&inode->i_lock);
@@ -2978,6 +2992,8 @@ int ext3_setattr(struct dentry *dentry,
if (!rc && (ia_valid & ATTR_MODE))
rc = ext3_acl_chmod(inode);
+ if (!rc)
+ ext3_update_rtimes(inode);
err_out:
ext3_std_error(inode->i_sb, error);
@@ -3129,6 +3145,7 @@ void ext3_dirty_inode(struct inode *inod
handle_t *current_handle = ext3_journal_current_handle();
handle_t *handle;
+ /* Reserve 2 blocks for inode and superblock */
handle = ext3_journal_start(inode, 2);
if (IS_ERR(handle))
goto out;
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/ioctl.c linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/ioctl.c
--- linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/ioctl.c 2007-11-05 15:42:03.000000000 +0100
+++ linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/ioctl.c 2007-11-05 16:58:53.000000000 +0100
@@ -23,10 +23,20 @@ int ext3_ioctl (struct inode * inode, st
struct ext3_inode_info *ei = EXT3_I(inode);
unsigned int flags;
unsigned short rsv_window_size;
+ unsigned int rtime;
ext3_debug ("cmd = %u, arg = %lu\n", cmd, arg);
switch (cmd) {
+ case EXT3_IOC_GETRTIME:
+ if (!test_opt(inode->i_sb, RTIME))
+ return -ENOTSUPP;
+ if (!S_ISDIR(inode->i_mode))
+ return -ENOTDIR;
+ spin_lock(&inode->i_lock);
+ rtime = ei->i_rtime;
+ spin_unlock(&inode->i_lock);
+ return put_user(rtime, (unsigned int __user *) arg);
case EXT3_IOC_GETFLAGS:
ext3_get_inode_flags(ei);
spin_lock(&inode->i_lock);
@@ -49,8 +59,10 @@ int ext3_ioctl (struct inode * inode, st
if (get_user(flags, (int __user *) arg))
return -EFAULT;
- if (!S_ISDIR(inode->i_mode))
+ if (!S_ISDIR(inode->i_mode)) {
flags &= ~EXT3_DIRSYNC_FL;
+ flags &= ~EXT3_RTIME_FL;
+ }
mutex_lock(&inode->i_mutex);
handle = ext3_journal_start(inode, 1);
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/namei.c linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/namei.c
--- linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/namei.c 2007-10-09 22:31:38.000000000 +0200
+++ linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/namei.c 2007-11-05 16:58:53.000000000 +0100
@@ -65,6 +65,59 @@ static struct buffer_head *ext3_append(h
return bh;
}
+/* We don't want to get new handle for every inode updated. Thus we batch
+ * updates of this many inodes into one transaction */
+#define RTIME_UPDATES_PER_TRANS 16
+
+/* Walk up the directory tree and modify rtimes.
+ * We journal i_rtime updates into a separate transaction - we don't guarantee
+ * consistency between other inode times and rtime. Only consistency between
+ * i_flags and i_rtime. */
+int __ext3_update_rtimes(struct inode *inode)
+{
+ struct dentry *dentry = list_entry(inode->i_dentry.next, struct dentry,
+ d_alias);
+ handle_t *handle;
+ int updates = 0;
+ int err = 0;
+
+ /* We should not have any transaction started - noone knows how many
+ * inode updates will be needed */
+ WARN_ON(ext3_journal_current_handle() != NULL);
+ if (!S_ISDIR(inode->i_mode)) {
+ dentry = dentry->d_parent;
+ inode = dentry->d_inode;
+ }
+ while (ext3_test_inode_flags(inode, EXT3_RTIME_FL)) {
+ if (!updates) {
+ /* For inode updates + superblock */
+ handle = ext3_journal_start(inode, RTIME_UPDATES_PER_TRANS + 1);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+ updates = RTIME_UPDATES_PER_TRANS;
+ }
+
+ spin_lock(&inode->i_lock);
+ EXT3_I(inode)->i_rtime = get_seconds();
+ EXT3_I(inode)->i_flags &= ~EXT3_RTIME_FL;
+ spin_unlock(&inode->i_lock);
+ ext3_mark_inode_dirty(handle, inode);
+ if (!--updates) {
+ err = ext3_journal_stop(handle);
+ if (err)
+ return err;
+ }
+
+ if (dentry == inode->i_sb->s_root)
+ break;
+ dentry = dentry->d_parent;
+ inode = dentry->d_inode;
+ }
+ if (updates)
+ err = ext3_journal_stop(handle);
+ return err;
+}
+
#ifndef assert
#define assert(test) J_ASSERT(test)
#endif
@@ -1738,6 +1791,8 @@ retry:
ext3_journal_stop(handle);
if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
goto retry;
+ if (!err)
+ ext3_update_rtimes(dir);
return err;
}
@@ -1773,6 +1828,8 @@ retry:
ext3_journal_stop(handle);
if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
goto retry;
+ if (!err)
+ ext3_update_rtimes(dir);
return err;
}
@@ -1847,6 +1904,8 @@ out_stop:
ext3_journal_stop(handle);
if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
goto retry;
+ if (!err)
+ ext3_update_rtimes(dir);
return err;
}
@@ -2123,6 +2182,8 @@ static int ext3_rmdir (struct inode * di
end_rmdir:
ext3_journal_stop(handle);
+ if (!retval)
+ ext3_update_rtimes(dir);
brelse (bh);
return retval;
}
@@ -2177,6 +2238,8 @@ static int ext3_unlink(struct inode * di
end_unlink:
ext3_journal_stop(handle);
+ if (!retval)
+ ext3_update_rtimes(dir);
brelse (bh);
return retval;
}
@@ -2234,6 +2297,8 @@ out_stop:
ext3_journal_stop(handle);
if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
goto retry;
+ if (!err)
+ ext3_update_rtimes(dir);
return err;
}
@@ -2270,6 +2335,10 @@ retry:
ext3_journal_stop(handle);
if (err == -ENOSPC && ext3_should_retry_alloc(dir->i_sb, &retries))
goto retry;
+ if (!err) {
+ ext3_update_rtimes(dir);
+ ext3_update_rtimes(inode);
+ }
return err;
}
@@ -2429,6 +2498,10 @@ end_rename:
brelse (old_bh);
brelse (new_bh);
ext3_journal_stop(handle);
+ if (!retval) {
+ ext3_update_rtimes(old_dir);
+ ext3_update_rtimes(new_dir);
+ }
return retval;
}
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/super.c linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/super.c
--- linux-2.6.23-2-ext3_make_frags_unused/fs/ext3/super.c 2007-11-05 16:58:10.000000000 +0100
+++ linux-2.6.23-3-ext3_recursive_mtime/fs/ext3/super.c 2007-11-05 16:58:53.000000000 +0100
@@ -684,7 +684,7 @@ enum {
Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
- Opt_grpquota
+ Opt_grpquota, Opt_rtime
};
static match_table_t tokens = {
@@ -734,6 +734,7 @@ static match_table_t tokens = {
{Opt_quota, "quota"},
{Opt_usrquota, "usrquota"},
{Opt_barrier, "barrier=%u"},
+ {Opt_rtime, "rtime"},
{Opt_err, NULL},
{Opt_resize, "resize"},
};
@@ -1066,6 +1067,14 @@ clear_qf_name:
case Opt_bh:
clear_opt(sbi->s_mount_opt, NOBH);
break;
+ case Opt_rtime:
+ if (!EXT3_HAS_COMPAT_FEATURE(sb, EXT3_FEATURE_COMPAT_RTIME)) {
+ printk("EXT3-fs: rtime option available only "
+ "if superblock has RTIME feature.\n");
+ return 0;
+ }
+ set_opt(sbi->s_mount_opt, RTIME);
+ break;
default:
printk (KERN_ERR
"EXT3-fs: Unrecognized mount option \"%s\" "
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-2-ext3_make_frags_unused/include/linux/ext3_fs.h linux-2.6.23-3-ext3_recursive_mtime/include/linux/ext3_fs.h
--- linux-2.6.23-2-ext3_make_frags_unused/include/linux/ext3_fs.h 2007-11-05 16:58:10.000000000 +0100
+++ linux-2.6.23-3-ext3_recursive_mtime/include/linux/ext3_fs.h 2007-11-06 16:34:43.000000000 +0100
@@ -177,10 +177,11 @@ struct ext3_group_desc
#define EXT3_NOTAIL_FL 0x00008000 /* file tail should not be merged */
#define EXT3_DIRSYNC_FL 0x00010000 /* dirsync behaviour (directories only) */
#define EXT3_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
+#define EXT3_RTIME_FL 0x00100000 /* Update recursive mtime (directories only) */
#define EXT3_RESERVED_FL 0x80000000 /* reserved for ext3 lib */
-#define EXT3_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */
-#define EXT3_FL_USER_MODIFIABLE 0x000380FF /* User modifiable flags */
+#define EXT3_FL_USER_VISIBLE 0x0013DFFF /* User visible flags */
+#define EXT3_FL_USER_MODIFIABLE 0x001380FF /* User modifiable flags */
/*
* Inode dynamic state flags
@@ -229,6 +230,7 @@ struct ext3_new_group_data {
#endif
#define EXT3_IOC_GETRSVSZ _IOR('f', 5, long)
#define EXT3_IOC_SETRSVSZ _IOW('f', 6, long)
+#define EXT3_IOC_GETRTIME _IOR('f', 9, unsigned int)
/*
* ioctl commands in 32 bit emulation
@@ -291,7 +293,7 @@ struct ext3_inode {
__le32 i_generation; /* File version (for NFS) */
__le32 i_file_acl; /* File ACL */
__le32 i_dir_acl; /* Directory ACL */
- __le32 i_obsolete_faddr; /* Unused */
+ __le32 i_rtime; /* Recursive Modification Time - directories only */
union {
struct {
__u16 l_i_obsolete_frag; /* Unused */
@@ -381,6 +383,7 @@ struct ext3_inode {
#define EXT3_MOUNT_QUOTA 0x80000 /* Some quota option set */
#define EXT3_MOUNT_USRQUOTA 0x100000 /* "old" user quota */
#define EXT3_MOUNT_GRPQUOTA 0x200000 /* "old" group quota */
+#define EXT3_MOUNT_RTIME 0x400000 /* Update rtime */
/* Compatibility, for having both ext2_fs.h and ext3_fs.h included at once */
#ifndef _LINUX_EXT2_FS_H
@@ -580,6 +583,7 @@ static inline unsigned int ext3_test_ino
#define EXT3_FEATURE_COMPAT_EXT_ATTR 0x0008
#define EXT3_FEATURE_COMPAT_RESIZE_INODE 0x0010
#define EXT3_FEATURE_COMPAT_DIR_INDEX 0x0020
+#define EXT3_FEATURE_COMPAT_RTIME 0x0080
#define EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER 0x0001
#define EXT3_FEATURE_RO_COMPAT_LARGE_FILE 0x0002
@@ -854,6 +858,13 @@ extern int ext3_orphan_add(handle_t *, s
extern int ext3_orphan_del(handle_t *, struct inode *);
extern int ext3_htree_fill_tree(struct file *dir_file, __u32 start_hash,
__u32 start_minor_hash, __u32 *next_hash);
+extern int __ext3_update_rtimes(struct inode *inode);
+static inline int ext3_update_rtimes(struct inode *inode)
+{
+ if (test_opt(inode->i_sb, RTIME))
+ return __ext3_update_rtimes(inode);
+ return 0;
+}
/* resize.c */
extern int ext3_group_add(struct super_block *sb,
diff -rupX /home/jack/.kerndiffexclude linux-2.6.23-2-ext3_make_frags_unused/include/linux/ext3_fs_i.h linux-2.6.23-3-ext3_recursive_mtime/include/linux/ext3_fs_i.h
--- linux-2.6.23-2-ext3_make_frags_unused/include/linux/ext3_fs_i.h 2007-11-05 15:42:03.000000000 +0100
+++ linux-2.6.23-3-ext3_recursive_mtime/include/linux/ext3_fs_i.h 2007-11-05 16:58:53.000000000 +0100
@@ -78,6 +78,7 @@ struct ext3_inode_info {
ext3_fsblk_t i_file_acl;
__u32 i_dir_acl;
__u32 i_dtime;
+ __u32 i_rtime;
/*
* i_block_group is the number of the block group which contains
* Re: fanotify - overall design before I start sending patches
2009-07-24 21:44 ` Jamie Lokier
@ 2009-07-27 17:52 ` Evgeniy Polyakov
2009-07-29 20:11 ` Eric Paris
0 siblings, 1 reply; 63+ messages in thread
From: Evgeniy Polyakov @ 2009-07-27 17:52 UTC (permalink / raw)
To: Jamie Lokier
Cc: Eric Paris, david, linux-kernel, linux-fsdevel, malware-list,
Valdis.Kletnieks, greg, jcm, douglas.leeder, tytso, arjan,
jengelh, aviro, mrkafk, alexl, jack, tvrtko.ursulin, a.p.zijlstra,
hch, alan, mmorley
Hi.
On Fri, Jul 24, 2009 at 10:44:01PM +0100, Jamie Lokier (jamie@shareable.org) wrote:
> > No, I will NOT EVER pass a pathname. Period. End of story. I stated
> > that if userspace wants to deal with pathnames (and they understand the
> > system setup well enough to know if pathnames even make sense to them)
> > they can use readlink(2) on /proc/self/fd
>
> That makes sense.
>
> In most cases where events trigger userspace cache or index updates,
> userspace already has enough information to calculate the path (and
> any derived data) from the inode number (in the case of non-hard-link
> files) or from the inode number of the parent directory and the name
> (not full path).
Except that rlimits may forbid opening a new file descriptor, while the queue
length is still enough to hold another event with the full or partial path
name.
I will read the initial mail next, but if it is not described there, how is the
rlimit problem handled?
--
Evgeniy Polyakov
* Re: fanotify - overall design before I start sending patches
2009-07-25 0:29 ` Jamie Lokier
@ 2009-07-27 18:33 ` Andreas Dilger
2009-07-27 19:23 ` Jamie Lokier
[not found] ` <20090727183354.GM4231-RIaA196FMs1uuQVovAj/GogTZbYi8/ss@public.gmane.org>
2009-07-29 20:07 ` Eric Paris
1 sibling, 2 replies; 63+ messages in thread
From: Andreas Dilger @ 2009-07-27 18:33 UTC (permalink / raw)
To: Jamie Lokier
Cc: Eric Paris, linux-kernel, linux-fsdevel, malware-list,
Valdis.Kletnieks, greg, jcm, douglas.leeder, tytso, arjan, david,
jengelh, aviro, mrkafk, alexl, jack, tvrtko.ursulin, a.p.zijlstra,
hch, alan, mmorley, pavel
On Jul 25, 2009 01:29 +0100, Jamie Lokier wrote:
> Eric Paris wrote:
> > But maybe I should just do the 'if you have fanotify open, you don't
> > create other fanotify events'... so everyone gets what they expect...
>
> O_NONOTIFY. Similar security concerns, more control.
>
> The security concern is clear: If you allow a process with fanotify
> open to not create events, then any (root) process can open a fanotify
> socket to hide its behaviour.
I think the "fanotify doesn't generate more fanotify events" makes the
most sense. Given that the open will be done in the kernel specifically
due to fanotify, this doesn't actually allow the listener to open files
without detection (unlike the "O_NONOTIFY" flag would). The fanotify
"opens" would only be in response to other processes opening the file.
It might also make sense to verify that the process doing the open has
at least permission to open the file in question (i.e. root) so that
some unauthorized process cannot just get file handles to arbitrary files.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: fanotify - overall design before I start sending patches
2009-07-27 18:33 ` Andreas Dilger
@ 2009-07-27 19:23 ` Jamie Lokier
2009-07-28 17:59 ` Andreas Dilger
[not found] ` <20090727192342.GA27895-yetKDKU6eevNLxjTenLetw@public.gmane.org>
[not found] ` <20090727183354.GM4231-RIaA196FMs1uuQVovAj/GogTZbYi8/ss@public.gmane.org>
1 sibling, 2 replies; 63+ messages in thread
From: Jamie Lokier @ 2009-07-27 19:23 UTC (permalink / raw)
To: Andreas Dilger
Cc: Eric Paris, linux-kernel, linux-fsdevel, malware-list,
Valdis.Kletnieks, greg, jcm, douglas.leeder, tytso, arjan, david,
jengelh, aviro, mrkafk, alexl, jack, tvrtko.ursulin, a.p.zijlstra,
hch, alan, mmorley, pavel
Andreas Dilger wrote:
> On Jul 25, 2009 01:29 +0100, Jamie Lokier wrote:
> > Eric Paris wrote:
> > > But maybe I should just do the 'if you have fanotify open, you don't
> > > create other fanotify events'... so everyone gets what they expect...
> >
> > O_NONOTIFY. Similar security concerns, more control.
> >
> > The security concern is clear: If you allow a process with fanotify
> > open to not create events, then any (root) process can open a fanotify
> > > socket to hide its behaviour.
>
> I think the "fanotify doesn't generate more fanotify events" makes the
> most sense. Given that the open will be done in the kernel specifically
> due to fanotify, this doesn't actually allow the listener to open files
> without detection (unlike the "O_NONOTIFY" flag would). The fanotify
> "opens" would only be in response to other processes opening the file.
Nice idea (if I understand it) - the file descriptors opened by
fanotify wouldn't cause fanotify events? Effectively having an
in-kernel O_NONOTIFY flag which can't be set from userspace?
'if you have fanotify open, you don't create other fanotify events' is
too severe - it means you can circumvent fanotify entirely for
everything your process does by just opening a fanotify socket and not
using it, which is clearly worse than having an O_NONOTIFY flag.
All ways to avoid creating fanotify events introduce security and
cache integrity problems though.
Why should processes which simply _watch_ the filesystem, without
modifying any files, ever fail to receive information about file
changes? Why should they be unable to claim they provide integrity
guarantees because of the loophole?
So before we implement that loophole: let's ask ourselves if we really
need it at all.
What's wrong with fanotify-using applications generating events when
they modify files themselves?
An example was given where app A gets an event and modifies the file,
then app B gets an event and modifies the file, and app A... cycling.
But if you have two "virus checker" style applications fighting over
modifying the same file without locking, you have much bigger problems
already. There's nothing to gain by fixing the fanotify cycle.
Maybe there's no need to suppress events after all.
Programs monitoring for writes to maintain caches or integrity checks
will be much happier if they know they're getting all file modifications.
-- Jamie
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
` (3 preceding siblings ...)
2009-07-25 14:22 ` Niraj kumar
@ 2009-07-28 11:48 ` Jon Masters
[not found] ` <1248781708.14145.21.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-08-03 16:23 ` Christoph Hellwig
` (3 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: Jon Masters @ 2009-07-28 11:48 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley,
pavel
On Fri, 2009-07-24 at 16:13 -0400, Eric Paris wrote:
> I plan to start sending patches for fanotify in the next week or two.
Generally, I appreciate your effort (as I'm sure does everyone else).
I agree with Jamie that it's good to consider extending inotify and also
that the special socket idea probably won't work for mainline. Also:
1). Ability to watch only certain mount-points, not just directories. Or
directories and block on mount operations as Jamie suggested. Or both :)
2). Add event on mmap perhaps. Future theoretical cloud cuckoo land
ideas include forcing all mmap operations to be read-only and then
having the page fault handler fire an event for every write so that the
anti-malware thing can monitor every single touched page...joke.
3). Sounds a lot like netlink could be close enough. Kay and others have
been playing with in-kernel multiplexing and re-broadcasting of netlink
events, and I'm pretty sure most of the rest is doable.
I'm looking forward to updatedb using this. Let's try playing up the use
cases outside malware for this stuff. I think the average person is
going to get more excited to see "Beagle done right" or "something like
Microsoft indexer service"[0] than 1970s updatedb. It's certainly a nice
and compelling reason to get this into mainline IMO.
Jon.
[0] Except nowhere near as crap as their version. Seriously, the last
time I used a Windows system and looked at it, the indexer was consuming
more CPU than Beagle ever did. And I liked the Beagle concept.
* Re: fanotify - overall design before I start sending patches
2009-07-27 19:23 ` Jamie Lokier
@ 2009-07-28 17:59 ` Andreas Dilger
[not found] ` <20090727192342.GA27895-yetKDKU6eevNLxjTenLetw@public.gmane.org>
1 sibling, 0 replies; 63+ messages in thread
From: Andreas Dilger @ 2009-07-28 17:59 UTC (permalink / raw)
To: Jamie Lokier
Cc: Eric Paris, linux-kernel, linux-fsdevel, malware-list,
Valdis.Kletnieks, greg, jcm, douglas.leeder, tytso, arjan, david,
jengelh, aviro, mrkafk, alexl, jack, tvrtko.ursulin, a.p.zijlstra,
hch, alan, mmorley, pavel
On Jul 27, 2009 20:23 +0100, Jamie Lokier wrote:
> Andreas Dilger wrote:
> > I think the "fanotify doesn't generate more fanotify events" makes the
> > most sense. Given that the open will be done in the kernel specifically
> > due to fanotify, this doesn't actually allow the listener to open files
> > without detection (unlike the "O_NONOTIFY" flag would). The fanotify
> > "opens" would only be in response to other processes opening the file.
>
> Nice idea (if I understand it) - the file descriptors opened by
> fanotify wouldn't cause fanotify events? Effectively having an
> in-kernel O_NONOTIFY flag which can't be set from userspace?
Right.
> 'if you have fanotify open, you don't create other fanotify events' is
> too severe - it means you can circumvent fanotify entirely for
> everything your process does by just opening a fanotify socket and not
> using it, which is clearly worse than having an O_NONOTIFY flag.
That isn't what I meant... It was only "operations on fanotify file
descriptors don't produce further fanotify events", otherwise madness
ensues.
> What's wrong with fanotify-using applications generating events when
> they modify files themselves?
>
> An example was given where app A gets an event and modifies the file,
> then app B gets an event and modifies the file, and app A... cycling.
>
> But if you have two "virus checker" style applications fighting over
> modifying the same file without locking, you have much bigger problems
> already. There's nothing to gain by fixing the fanotify cycle.
I don't think they are fighting over modifying the file, they are
generating an endless stream of events for each other to process.
I don't think locking will help.
It's the same as e.g. having verbose kernel debug messages related
to filesystem/block IO going to /var/log/messages on the filesystem
you are monitoring. After the first (external) IO, it is logged
to disk, which causes an IO to the filesystem, which is logged to
disk, which causes an IO...
There has to be some way to NOT generate any events on the files
that you are monitoring.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
* Re: fanotify - overall design before I start sending patches
2009-07-25 0:29 ` Jamie Lokier
2009-07-27 18:33 ` Andreas Dilger
@ 2009-07-29 20:07 ` Eric Paris
1 sibling, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-07-29 20:07 UTC (permalink / raw)
To: Jamie Lokier
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley,
pavel
On Sat, 2009-07-25 at 01:29 +0100, Jamie Lokier wrote:
> Eric Paris wrote:
> > But maybe I should just do the 'if you have fanotify open, you don't
> > create other fanotify events'... so everyone gets what they expect...
>
> O_NONOTIFY. Similar security concerns, more control.
>
> The security concern is clear: If you allow a process with fanotify
> open to not create events, then any (root) process can open a fanotify
> socket to hide its behaviour.
Let me first say I'm not sure how many useful 'security' goals can be
met using fanotify, mainly because I haven't focused on any. You can do
data integrity checking before access, but I'm saying up front that malicious
programs can almost certainly sneak information past these checks.
The 'problem' is mostly with open and open_perm, but there
are problems with read_perm as well. With just 2 groups listening to
'open' for the same file we get an infinite event loop: each listener
sees the open from the other listener, since an fd on the object in
question is opened for every listener. If both groups are listening to
open_perm you obviously have a deadlock.
The same applies if 2 fanotify groups need read_perm as they both may
need to block waiting for the decision of the other.
I have an easy solution to the 'open' problems: just don't fire the open
and open_perm events when it is fanotify doing the open. I guess I
could use file->private_data when the fanotify listener does I/O on a
fanotify-opened file to see if it was a fanotify-opened fd and ignore
only those events. I'll try this afternoon.
So what I'm now planning is to suppress, for other fanotify listeners,
only those events which are generated on fanotify-opened fds.
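The effect of that suppression can be shown with a toy model. This is purely an illustration under assumptions: `count_open_events()` is a hypothetical function, it models nothing of the real kernel code, and it assumes that a listener never sees the opens done on behalf of its own group.

```c
#include <stdbool.h>

/*
 * Toy model: one unrelated process opens a file that nlisteners fanotify
 * groups are watching for 'open'. Delivering an event opens one fd per
 * listener; unless suppressed, each of those opens is itself an 'open'
 * event for the other listeners. Returns the number of events processed,
 * bounded by cap so the unsuppressed case terminates.
 */
int count_open_events(int nlisteners, bool suppress_fanotify_opens, int cap)
{
    int queued = 1;  /* the original open by the unrelated process */
    int events = 0;

    while (queued > 0 && events < cap) {
        queued--;
        events++;
        /* Opens done for the listeners feed back only if not suppressed. */
        if (!suppress_fanotify_opens && nlisteners > 1)
            queued += nlisteners;
    }
    return events;
}
```

With two listeners and no suppression the count runs into the cap, which is the infinite loop described above; with suppression only the original open is reported.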
> Do I see right that you need to open the directory before you can set
> the mark on it?
Correct. But unlike dnotify, you don't have to keep it open.
> The main reason behind inotify's design wasn't the API (although it is
> better than dnotify); it was to avoid having to open thousands of
> directories, and to allow a filesystem to be unmounted while it is
> being watched.
I could make fanotify_so_mark pathname-based rather than fd-based. But fd-based
works very nicely in global mode, as the fd you get in the event
metadata can be used in setting marks.
You need to open the directory so you have an fd, so you can set the
mark. You can then close the directory. If the directory is deleted,
your mark is gone (forever.)
> Does a fanotify mark stop a filesystem from being unmounted? If not,
> if the filesystem is unmounted and remounted, is the mark lost?
No, it does not prevent unmounting. Yes, if it is unmounted and
remounted the marks are lost forever.
-Eric
* Re: fanotify - overall design before I start sending patches
2009-07-25 14:22 ` Niraj kumar
@ 2009-07-29 20:08 ` Eric Paris
0 siblings, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-07-29 20:08 UTC (permalink / raw)
To: Niraj kumar; +Cc: linux-kernel, linux-fsdevel
On Sat, 2009-07-25 at 19:52 +0530, Niraj kumar wrote:
> >
> > - subtree notification.
> > Currently to only watch /home and all of its descendants one must
> > either register a directed watch on every directory or use a global
> > listener. The global listener with ignored_mask is not as bad as it
> > sounds in my testing, but decent subtree registration and notification
> > would be a big win in a lot of people's mind.
>
> Unless it's already covered in some way, I would also be interested
> in notification based on "process subtree". What this means is that
> for whatever notification a particular process requests, it's only
> interested in events generated by itself and its children.
> This is useful in doing auditing for file system related access.
Has not been considered. You'd have to really flesh out the use case
before I could.....
-Eric
* Re: fanotify - overall design before I start sending patches
2009-07-27 17:52 ` Evgeniy Polyakov
@ 2009-07-29 20:11 ` Eric Paris
0 siblings, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-07-29 20:11 UTC (permalink / raw)
To: Evgeniy Polyakov
Cc: Jamie Lokier, david, linux-kernel, linux-fsdevel, malware-list,
Valdis.Kletnieks, greg, jcm, douglas.leeder, tytso, arjan,
jengelh, aviro, mrkafk, alexl, jack, tvrtko.ursulin, a.p.zijlstra,
hch, alan, mmorley
On Mon, 2009-07-27 at 21:52 +0400, Evgeniy Polyakov wrote:
> Hi.
>
> On Fri, Jul 24, 2009 at 10:44:01PM +0100, Jamie Lokier (jamie@shareable.org) wrote:
> > > No, I will NOT EVER pass a pathname. Period. End of story. I stated
> > > that if userspace wants to deal with pathnames (and they understand the
> > > system setup well enough to know if pathnames even make sense to them)
> > > they can use readlink(2) on /proc/self/fd
> >
> > That makes sense.
> >
> > In most cases where events trigger userspace cache or index updates,
> > userspace already has enough information to calculate the path (and
> > any derived data) from the inode number (in the case of non-hard-link
> > files) or from the inode number of the parent directory and the name
> > (not full path).
>
> Except that rlimits may forbid to open new file descriptor while queue
> length is enough to put another event with the full or partial path
> name.
>
> I will read initial mail next, but if it is not described there, how
> rlimit problem is handled?
At the moment, if you run out of rlimit fds you start getting (useless)
events with the fd equal to some errno (I don't remember offhand which
errno hitting the rlimit gives).
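A listener would therefore have to check the fd before using it. The sketch below assumes the failure is reported as a negated errno in the fd field (as with the -EPERM case mentioned elsewhere in this thread); `struct sim_event_metadata` and `event_fd_error()` are hypothetical names, not the real event layout. The errno for hitting RLIMIT_NOFILE on open(2) is normally EMFILE, but that this is what fanotify reports here is an assumption, not something confirmed in this message.

```c
#include <errno.h>

/* Hypothetical event layout: only the fd field is taken from the discussion. */
struct sim_event_metadata {
    int fd;  /* >= 0: usable descriptor; < 0: assumed to be a negated errno */
};

/*
 * Returns 0 if the event carries a usable fd, otherwise the positive
 * errno explaining why no fd could be provided (e.g. EMFILE, EPERM).
 */
int event_fd_error(const struct sim_event_metadata *ev)
{
    return ev->fd >= 0 ? 0 : -ev->fd;
}
```

An event for which this returns nonzero still says that something happened, but the listener has no handle to inspect the object with.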
-Eric
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090727183354.GM4231-RIaA196FMs1uuQVovAj/GogTZbYi8/ss@public.gmane.org>
@ 2009-07-29 20:12 ` Eric Paris
0 siblings, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-07-29 20:12 UTC (permalink / raw)
To: Andreas Dilger
Cc: jack-AlSwsSmVLrQ, jengelh-nopoi9nDyk+ELgA04lAiVw,
pavel-AlSwsSmVLrQ, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, hch-wEGCiKHe2LqWVfeAwA7xHQ,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, jcm-H+wXaHxf7aLQT0dZR+AlfA,
Jamie Lokier, alexl-H+wXaHxf7aLQT0dZR+AlfA,
arjan-wEGCiKHe2LqWVfeAwA7xHQ, david-gFPdbfVZQbY,
Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
aviro-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
On Mon, 2009-07-27 at 12:33 -0600, Andreas Dilger wrote:
> On Jul 25, 2009 01:29 +0100, Jamie Lokier wrote:
> It might also make sense to verify that the process doing the open has
> at least permission to open the file in question (i.e. root) so that
> some unauthorized process cannot just get file handles to arbitrary files.
All current permission checks between the listener process and the object
are performed. It's quite possible to get fanotify events where the
fd = -EPERM.
-Eric
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090727192342.GA27895-yetKDKU6eevNLxjTenLetw@public.gmane.org>
@ 2009-07-29 20:14 ` Eric Paris
0 siblings, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-07-29 20:14 UTC (permalink / raw)
To: Jamie Lokier
Cc: Andreas Dilger, jack-AlSwsSmVLrQ, jengelh-nopoi9nDyk+ELgA04lAiVw,
pavel-AlSwsSmVLrQ, alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, hch-wEGCiKHe2LqWVfeAwA7xHQ,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw, jcm-H+wXaHxf7aLQT0dZR+AlfA,
alexl-H+wXaHxf7aLQT0dZR+AlfA, arjan-wEGCiKHe2LqWVfeAwA7xHQ,
david-gFPdbfVZQbY, Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
aviro-H+wXaHxf7aLQT0dZR+AlfA, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA
On Mon, 2009-07-27 at 20:23 +0100, Jamie Lokier wrote:
> Andreas Dilger wrote:
> > On Jul 25, 2009 01:29 +0100, Jamie Lokier wrote:
> > > Eric Paris wrote:
> What's wrong with fanotify-using applications generating events when
> they modify files themselves?
>
> An example was given where app A gets an event and modifies the file,
> then app B gets an event and modifies the file, and app A... cycling.
No, the example was the 'open' which the kernel does on behalf of the
listener. I'm thinking now I should only exclude
OPEN
OPEN_PERM
ACCESS_PERM
as those are the only 3 event types I can see deadlock/recursion
problems with.
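
[Editor's sketch] The exclusion proposed above amounts to masking three
event types out of anything the listener itself triggers. The bit values
and the EX_ names below are hypothetical (the thread does not give the
real constants); only the set of excluded event types comes from the
message.

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define EX_FAN_ACCESS       0x01u
#define EX_FAN_OPEN         0x02u
#define EX_FAN_CLOSE        0x04u
#define EX_FAN_OPEN_PERM    0x08u
#define EX_FAN_ACCESS_PERM  0x10u

/* The three event types named in the message as recursion/deadlock risks. */
#define EX_SELF_EXCLUDED \
    (EX_FAN_OPEN | EX_FAN_OPEN_PERM | EX_FAN_ACCESS_PERM)

/*
 * Return the event mask that should actually be queued.
 * from_listener is non-zero when the operation was performed by (or on
 * behalf of) the listener itself.
 */
static uint32_t ex_filter_mask(uint32_t mask, int from_listener)
{
    return from_listener ? (mask & ~EX_SELF_EXCLUDED) : mask;
}
```

With this filter the listener still sees its own close and plain access
events, which is the difference from excluding every event it generates.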
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
[not found] ` <1248781708.14145.21.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2009-07-29 20:20 ` Eric Paris
0 siblings, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-07-29 20:20 UTC (permalink / raw)
To: Jon Masters
Cc: david-gFPdbfVZQbY, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
Valdis.Kletnieks-PjAqaU27lzQ, malware-list-h+Im9A44IAFcMpApZELgcQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw, hch-wEGCiKHe2LqWVfeAwA7xHQ,
pavel-AlSwsSmVLrQ, alexl-H+wXaHxf7aLQT0dZR+AlfA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
On Tue, 2009-07-28 at 07:48 -0400, Jon Masters wrote:
> On Fri, 2009-07-24 at 16:13 -0400, Eric Paris wrote:
>
> > I plan to start sending patches for fanotify in the next week or two.
>
> Generally, I appreciate your effort (as I'm sure does everyone else).
>
> I agree with Jamie that it's good to consider extending inotify and also
> that the special socket idea probably won't work for mainline. Also:
The special socket idea was Alan Cox's idea and I haven't heard a usable
alternative.
> 1). Ability to watch only certain mount-points, not just directories. Or
> directories and block on mount operations as Jamie suggested. Or both :)
Show me the user and I'll consider it.
> 2). Add event on mmap perhaps. Future theoretical cloud cuckoo land
> ideas include forcing all mmap operations to be read-only and then
> having the page fault handler fire an event for every write so that the
> anti-malware thing can monitor every single touched page...joke.
>
> 3). Sounds a lot like netlink could be close enough. Kay and others have
> been playing with in-kernel multiplexing and re-broadcasting of netlink
> events, and I'm pretty sure most of the rest is doable.
Jeez, about the 50th time someone has said netlink. I need to do the fd
open in the context of the receiving process. How do I do that with
netlink? It cannot be done at the netlink msg send side (which is the
context of the original process accessing the file)
> I'm looking forward to updatedb using this.
Well, that's still future work, as all updatedb cares about is
rename events, and the kernel does have enough information to send
fanotify events during rename.
> Let's try up-playing the use
> cases outside malware for this stuff.
I'm not playing any spin or bullshit. I've got HSM users who want it.
Readahead profiling wants it. I'm told that wine can properly do
windows style notification with it instead of some hack they have now.
I've also got a need to get people who want to run integrity
checkers/virus scanners to stop binary patching/hacking my kernels. I'd
say I've found plenty of use cases today even if you don't find them as
sexy as beagle :)
-Eric
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
` (4 preceding siblings ...)
2009-07-28 11:48 ` Jon Masters
@ 2009-08-03 16:23 ` Christoph Hellwig
[not found] ` <20090803162303.GA31058-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2009-08-04 16:09 ` Tvrtko Ursulin
` (2 subsequent siblings)
8 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2009-08-03 16:23 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley,
pavel
Before we get started: You promised to get rid of the dnotify and
inotify fields in the inode when moving over those two to fsnotify.
While the dnotify fields are gone the inotify_watches and inotify_mutex
fields are still around. I'd really like to see this done before we
move on.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090803162303.GA31058-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2009-08-03 16:55 ` Eric Paris
2009-08-03 18:04 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Eric Paris @ 2009-08-03 16:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: david-gFPdbfVZQbY, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw, pavel-AlSwsSmVLrQ,
alexl-H+wXaHxf7aLQT0dZR+AlfA, jcm-H+wXaHxf7aLQT0dZR+AlfA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
On Mon, 2009-08-03 at 12:23 -0400, Christoph Hellwig wrote:
> Before we get started: You promised to get rid of the dnotify and
> inotify fields in the inode when moving over those two to fsnotify.
> While the dnotify fields are gone the inotify_watches and inotify_mutex
> fields are still around. I'd really like to see this done before we
> move on.
In linux-next you will find that those fields are completely unused and
the kernel can be compiled without them with 0 loss of functionality
(they are still used in Linus's kernel by the audit system).
The only reason I didn't remove them entirely is because the inotify
kernel interface is EXPORT_SYMBOL. I plan to leave it around for a
little bit until I'm sure out of tree users don't exist or at least had
enough time to realize they need to use fsnotify.
For all situations I know of at least, those 2 fields can be safely
compiled out. (aka !CONFIG_INOTIFY)
-Eric
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
2009-08-03 16:55 ` Eric Paris
@ 2009-08-03 18:04 ` Christoph Hellwig
[not found] ` <20090803180437.GA9798-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2009-08-03 18:04 UTC (permalink / raw)
To: Eric Paris
Cc: Christoph Hellwig, linux-kernel, linux-fsdevel, malware-list,
Valdis.Kletnieks, greg, jcm, douglas.leeder, tytso, arjan, david,
jengelh, aviro, mrkafk, alexl, jack, tvrtko.ursulin, a.p.zijlstra,
alan, mmorley, pavel
On Mon, Aug 03, 2009 at 12:55:35PM -0400, Eric Paris wrote:
> In linux-next you will find that those fields are completely unused and
> the kernel can be compiled without them with 0 loss of functionality
> (they are still used in Linus's kernel by the audit system).
>
> The only reason I didn't remove them entirely is because the inotify
> kernel interface is EXPORT_SYMBOL. I plan to leave it around for a
> little bit until I'm sure out of tree users don't exist or at least had
> enough time to realize they need to use fsnotify.
There's absolutely no reason to keep them around, we never guaranteed
API stability and never cared about out of tree modules. And if any
out of tree modules are out there using it they won't notice they need
to stop using it before you removed it. So just nuke them entirely
in the for-2.6.32 queue.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090803180437.GA9798-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2009-08-03 18:13 ` Eric Paris
0 siblings, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-08-03 18:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: david-gFPdbfVZQbY, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
Valdis.Kletnieks-PjAqaU27lzQ,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
malware-list-h+Im9A44IAFcMpApZELgcQ,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
jack-AlSwsSmVLrQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw, alexl-H+wXaHxf7aLQT0dZR+AlfA,
jcm-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
On Mon, 2009-08-03 at 14:04 -0400, Christoph Hellwig wrote:
> On Mon, Aug 03, 2009 at 12:55:35PM -0400, Eric Paris wrote:
> > In linux-next you will find that those fields are completely unused and
> > the kernel can be compiled without them with 0 loss of functionality
> > (they are still used in Linus's kernel by the audit system).
> >
> > The only reason I didn't remove them entirely is because the inotify
> > kernel interface is EXPORT_SYMBOL. I plan to leave it around for a
> > little bit until I'm sure out of tree users don't exist or at least had
> > enough time to realize they need to use fsnotify.
>
> There's absolutely no reason to keep them around, we never guaranteed
> API stability and never cared about out of tree modules. And if any
> out of tree modules are out there using it they won't notice they need
> to stop using it before you removed it. So just nuke them entirely
> in the for-2.6.32 queue.
Will nuke entirely in .32
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
` (5 preceding siblings ...)
2009-08-03 16:23 ` Christoph Hellwig
@ 2009-08-04 16:09 ` Tvrtko Ursulin
[not found] ` <200908041709.51659.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2009-08-04 16:34 ` Tvrtko Ursulin
2009-08-05 2:05 ` Pavel Machek
8 siblings, 1 reply; 63+ messages in thread
From: Tvrtko Ursulin @ 2009-08-04 16:09 UTC (permalink / raw)
To: Eric Paris
Cc: david-gFPdbfVZQbY@public.gmane.org,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Douglas Leeder,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jack-AlSwsSmVLrQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
pavel-AlSwsSmVLrQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
Hi Eric, all,
On Friday 24 July 2009 21:13:49 Eric Paris wrote:
> If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> must send a response before the 5 second timeout. If no response is
> sent before the 5 second timeout the original operation is allowed. If
> this happens too many times (10 in a row) the fanotify group is evicted
> from the kernel and will not get any new events. Sending a response is
Would it make more sense to deny on timeouts and then evict? I am thinking it
would be more secure with no significant drawbacks. Also for usages like HSM
allowing it without data being in place might present wrong content to the
user.
> The only other current interface is the ability to ignore events by
> superblock magic number. This makes it easy to ignore all events
> in /proc which can be difficult to accomplish firing FANOTIFY_SET_MARK
> with ignored_masks over and over as processes are created and destroyed.
Just to double-check, that would also work for any other filesystem and is
controllable from userspace?
Tvrtko
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
[not found] ` <200908041709.51659.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
@ 2009-08-04 16:27 ` Eric Paris
2009-08-04 16:39 ` Tvrtko Ursulin
[not found] ` <1249403268.2361.21.camel-8EcGF3LoIElviLIMxPk1+R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
0 siblings, 2 replies; 63+ messages in thread
From: Eric Paris @ 2009-08-04 16:27 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: david-gFPdbfVZQbY@public.gmane.org,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Douglas Leeder,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jack-AlSwsSmVLrQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
pavel-AlSwsSmVLrQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
On Tue, 2009-08-04 at 17:09 +0100, Tvrtko Ursulin wrote:
> Hi Eric, all,
>
> On Friday 24 July 2009 21:13:49 Eric Paris wrote:
> > If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> > must send a response before the 5 second timeout. If no response is
> > sent before the 5 second timeout the original operation is allowed. If
> > this happens too many times (10 in a row) the fanotify group is evicted
> > from the kernel and will not get any new events. Sending a response is
>
> Would it make more sense to deny on timeouts and then evict? I am thinking it
> would be more secure with no significant drawbacks. Also for usages like HSM
> allowing it without data being in place might present wrong content to the
> user.
I'd be willing to go that route as long as no one else complains.
> > The only other current interface is the ability to ignore events by
> > superblock magic number. This makes it easy to ignore all events
> > in /proc which can be difficult to accomplish firing FANOTIFY_SET_MARK
> > with ignored_masks over and over as processes are created and destroyed.
>
> Just to double-check, that would also work for any other filesystem and is
> controllable from userspace?
Yes, you set these in userspace using setsockopt(). It is based on
superblock magic number as found in linux/magic.h. So one could
exclude procfs, sysfs, selinuxfs, etc. It does not provide a way to
say 'this ext3 filesystem but not that ext3 filesystem' as ext3 has a
single magic number.
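
[Editor's sketch] For readers unfamiliar with superblock magic numbers:
userspace can look one up for any mounted path with statfs(2) and compare
it against the constants in linux/magic.h. The helper names fs_magic() and
is_procfs() are hypothetical, and the setsockopt() option used to pass the
number to fanotify is deliberately not shown because the message does not
name it.

```c
#include <sys/vfs.h>        /* statfs(), struct statfs */
#include <linux/magic.h>    /* PROC_SUPER_MAGIC and friends */

/*
 * Look up the superblock magic number of the filesystem holding path.
 * Returns 0 and stores the magic in *magic, or -1 if statfs() fails.
 */
static int fs_magic(const char *path, long *magic)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return -1;
    *magic = (long)sfs.f_type;
    return 0;
}

/* Non-zero if path lives on procfs. */
static int is_procfs(const char *path)
{
    long magic;

    return fs_magic(path, &magic) == 0 && magic == PROC_SUPER_MAGIC;
}
```

Because f_type identifies the filesystem type and not the mount, two ext3
mounts return the same value, which is exactly the limitation described
above.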
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
` (6 preceding siblings ...)
2009-08-04 16:09 ` Tvrtko Ursulin
@ 2009-08-04 16:34 ` Tvrtko Ursulin
[not found] ` <200908041734.05762.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2009-08-05 10:35 ` Douglas Leeder
2009-08-05 2:05 ` Pavel Machek
8 siblings, 2 replies; 63+ messages in thread
From: Tvrtko Ursulin @ 2009-08-04 16:34 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
malware-list@dmesg.printk.net, Valdis.Kletnieks@vt.edu,
greg@kroah.com, jcm@redhat.com, Douglas Leeder, tytso@mit.edu,
arjan@infradead.org, david@lang.hm, jengelh@medozas.de,
aviro@redhat.com, mrkafk@gmail.com, alexl@redhat.com,
jack@suse.cz, a.p.zijlstra@chello.nl, hch@infradead.org,
alan@lxorguk.ukuu.org.uk, mmorley@hcl.in, pavel@suse.cz
On Friday 24 July 2009 21:13:49 Eric Paris wrote:
> After the socket is bound events are obtained using the read() syscall
> (recv* probably also works; I haven't tested it). This will result in the
> buffer being filled with one or more events like this:
>
> struct fanotify_event_metadata {
> __u32 event_len;
> __s32 fd;
> __u32 mask;
> __u32 f_flags;
> __s32 pid;
> __s32 tgid;
> __u64 cookie;
> } __attribute__((packed));
>
> fd specifies the new file descriptor that was created in the context of
> the listener. (readlink of /proc/self/fd will give you A pathname)
> mask indicates the events type (bitwise OR of the event types listed
> above). f_flags here is the f_flags the ORIGINAL process has the file
> open with. pid and tgid are from the original process. cookie is used
> when the listener needs to allow, deny, or delay the operation.
One more thing. uid and gid (possibly the whole set?) would be useful so we
can tell which user triggered an event without having to look at the
process, which may have disappeared in the meantime. I _think_ uid was in
the original proposal/idea, and I don't remember whether it was left out
deliberately because we cannot get it, or omitted by accident?
Tvrtko
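
[Editor's sketch] The quoted description says one read() may return
several events back to back. A listener could walk such a buffer as
follows. The struct is copied from the quoted proposal; the assumptions
(not stated in the thread) are that event_len is the total size of one
event including this header and that events are packed contiguously.
count_events() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>
#include <linux/types.h>

/* Event layout as quoted in this thread (prototype interface). */
struct fanotify_event_metadata {
    __u32 event_len;
    __s32 fd;
    __u32 mask;
    __u32 f_flags;
    __s32 pid;
    __s32 tgid;
    __u64 cookie;
} __attribute__((packed));

/*
 * Count the complete events in buf[0..len).
 * Stops at the first truncated or malformed event so that a bad
 * event_len can never push the walk outside the buffer.
 */
static size_t count_events(const unsigned char *buf, size_t len)
{
    struct fanotify_event_metadata ev;
    size_t off = 0, n = 0;

    while (len - off >= sizeof(ev)) {
        memcpy(&ev, buf + off, sizeof(ev));   /* avoid unaligned access */
        if (ev.event_len < sizeof(ev) || ev.event_len > len - off)
            break;
        n++;
        off += ev.event_len;
    }
    return n;
}
```

A real listener would handle ev.fd, ev.mask and ev.cookie inside the loop
instead of just counting; the bounds checks are the part worth keeping.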
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
2009-08-04 16:27 ` Eric Paris
@ 2009-08-04 16:39 ` Tvrtko Ursulin
[not found] ` <1249403268.2361.21.camel-8EcGF3LoIElviLIMxPk1+R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
1 sibling, 0 replies; 63+ messages in thread
From: Tvrtko Ursulin @ 2009-08-04 16:39 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
malware-list@dmesg.printk.net, Valdis.Kletnieks@vt.edu,
greg@kroah.com, jcm@redhat.com, Douglas Leeder, tytso@mit.edu,
arjan@infradead.org, david@lang.hm, jengelh@medozas.de,
aviro@redhat.com, mrkafk@gmail.com, alexl@redhat.com,
jack@suse.cz, a.p.zijlstra@chello.nl, hch@infradead.org,
alan@lxorguk.ukuu.org.uk, mmorley@hcl.in, pavel@suse.cz
On Tuesday 04 August 2009 17:27:48 Eric Paris wrote:
> On Tue, 2009-08-04 at 17:09 +0100, Tvrtko Ursulin wrote:
> > Hi Eric, all,
> >
> > On Friday 24 July 2009 21:13:49 Eric Paris wrote:
> > > If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> > > must send a response before the 5 second timeout. If no response is
> > > sent before the 5 second timeout the original operation is allowed. If
> > > this happens too many times (10 in a row) the fanotify group is evicted
> > > from the kernel and will not get any new events. Sending a response is
> >
> > Would it make more sense to deny on timeouts and then evict? I am
> > thinking it would be more secure with no significant drawbacks. Also for
> > usages like HSM allowing it without data being in place might present
> > wrong content to the user.
>
> I'd be willing to go that route as long as no one else complains.
Ok, keep it open then for a while and I guess it is trivial to change this bit
of behaviour.
> > > The only other current interface is the ability to ignore events by
> > > superblock magic number. This makes it easy to ignore all events
> > > in /proc which can be difficult to accomplish firing FANOTIFY_SET_MARK
> > > with ignored_masks over and over as processes are created and
> > > destroyed.
> >
> > Just to double-check, that would also work for any other filesystem and
> > is controllable from userspace?
>
> Yes, you set these in userspace using setsockopt(). It is based on
> superblock magic number as found in linux/magic.h. So one could
> exclude procfs, sysfs, selinuxfs, etc. It does not provide a way to
> say 'this ext3 filesystem but not that ext3 filesystem' as ext3 has a
> single magic number.
This is probably good enough. Subtree and mount point exclusions would be even
better (in addition to superblock magic exclusions - I would not get rid of
them) but I have no idea how realistic this requirement is, or whether it is
possible to do it more efficiently in kernel space at all.
Tvrtko
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
[not found] ` <1249403268.2361.21.camel-8EcGF3LoIElviLIMxPk1+R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
@ 2009-08-04 17:22 ` Valdis.Kletnieks-PjAqaU27lzQ
[not found] ` <19585.1249406551-+bZmOdGhbsPr6rcHtW+onFJE71vCis6O@public.gmane.org>
0 siblings, 1 reply; 63+ messages in thread
From: Valdis.Kletnieks-PjAqaU27lzQ @ 2009-08-04 17:22 UTC (permalink / raw)
To: Eric Paris
Cc: david-gFPdbfVZQbY@public.gmane.org,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org,
Douglas Leeder,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
pavel-AlSwsSmVLrQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jack-AlSwsSmVLrQ@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
On Tue, 04 Aug 2009 12:27:48 EDT, Eric Paris said:
> On Tue, 2009-08-04 at 17:09 +0100, Tvrtko Ursulin wrote:
> > Would it make more sense to deny on timeouts and then evict? I am thinking it
> > would be more secure with no significant drawbacks. Also for usages like HSM
> > allowing it without data being in place might present wrong content to the
> > user.
>
> I'd be willing to go that route as long as no one else complains.
Yes, in my world, "deny on timeout and evict" is the better design decision.
For an HSM, you'd rather have a quick-and-ugly death on a failed file open than
an app accidentally reading the HSM's stub data thinking it's the original data.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
[not found] ` <19585.1249406551-+bZmOdGhbsPr6rcHtW+onFJE71vCis6O@public.gmane.org>
@ 2009-08-04 18:20 ` John Stoffel
[not found] ` <19064.31705.491774.122207-HgN6juyGXH5AfugRpC6u6w@public.gmane.org>
2009-08-05 9:32 ` Tvrtko Ursulin
0 siblings, 2 replies; 63+ messages in thread
From: John Stoffel @ 2009-08-04 18:20 UTC (permalink / raw)
To: Valdis.Kletnieks-PjAqaU27lzQ
Cc: david-gFPdbfVZQbY@public.gmane.org,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org,
Douglas Leeder,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
pavel-AlSwsSmVLrQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jack-AlSwsSmVLrQ@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
>>>>> "Valdis" == Valdis Kletnieks <Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org> writes:
Valdis> On Tue, 04 Aug 2009 12:27:48 EDT, Eric Paris said:
>> On Tue, 2009-08-04 at 17:09 +0100, Tvrtko Ursulin wrote:
>> > Would it make more sense to deny on timeouts and then evict? I am thinking it
>> > would be more secure with no significant drawbacks. Also for usages like HSM
>> > allowing it without data being in place might present wrong content to the
>> > user.
>>
>> I'd be willing to go that route as long as no one else complains.
Valdis> Yes, in my world, "deny on timeout and evict" is the better
Valdis> design decision. For an HSM, you'd rather have a
Valdis> quick-and-ugly death on a failed file open than an app
Valdis> accidentally reading the HSM's stub data thinking it's the
Valdis> original data.
Speaking as someone who is working slowly to deploy an HSM service, one
thing to note is that when you *do* see the stub file contents, you
know that your HSM is busted somehow.
How will fanotify deal with this issue? Sorry, I haven't paid enough
attention to this thread though I know I should since it's up my $WORK
alley.
John
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
[not found] ` <19064.31705.491774.122207-HgN6juyGXH5AfugRpC6u6w@public.gmane.org>
@ 2009-08-04 18:50 ` Eric Paris
0 siblings, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-08-04 18:50 UTC (permalink / raw)
To: John Stoffel
Cc: david-gFPdbfVZQbY@public.gmane.org,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ, Douglas Leeder,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jack-AlSwsSmVLrQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
pavel-AlSwsSmVLrQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
On Tue, 2009-08-04 at 14:20 -0400, John Stoffel wrote:
> >>>>> "Valdis" == Valdis Kletnieks <Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org> writes:
>
> Valdis> On Tue, 04 Aug 2009 12:27:48 EDT, Eric Paris said:
> >> On Tue, 2009-08-04 at 17:09 +0100, Tvrtko Ursulin wrote:
> >> > Would it make more sense to deny on timeouts and then evict? I am thinking it
> >> > would be more secure with no significant drawbacks. Also for usages like HSM
> >> > allowing it without data being in place might present wrong content to the
> >> > user.
> >>
> >> I'd be willing to go that route as long as no one else complains.
>
> Valdis> Yes, in my world, "deny on timeout and evict" is the better
> Valdis> design decision. For an HSM, you'd rather have a
> Valdis> quick-and-ugly death on a failed file open than an app
> Valdis> accidentally reading the HSM's stub data thinking it's the
> Valdis> original data.
>
> Speaking as someone who is working slowly to deploy an HSM service, one
> thing to note is that when you *do* see the stub file contents, you
> know that your HSM is busted somehow.
>
> How will fanotify deal with this issue? Sorry, I haven't paid enough
> attention to this thread though I know I should since it's up my $WORK
> alley.
fanotify doesn't explicitly deal with it at all. If the HSM implemented
as a fanotify listener starts to misbehave and not respond, processes
will start to get EACCES. If it misses 10 in a row it'll be evicted and
processes will start to see the stubs.
-Eric
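
[Editor's sketch] The behaviour described in this reply (deny on timeout,
evict after 10 consecutive misses, after which operations go through
unchecked) can be modelled as a tiny state machine. This is a userspace
model of the policy under discussion, not kernel code; the ex_ names are
hypothetical, and whether the 10th timed-out operation is itself denied is
an assumption.

```c
#include <errno.h>
#include <stdbool.h>

#define EX_MAX_MISSES 10    /* "10 in a row" from the proposal */

struct ex_group {
    int  misses;            /* consecutive timeouts so far */
    bool evicted;           /* listener no longer receives events */
};

/*
 * Decide the outcome of one permission event.
 *   responded: the listener answered before the timeout
 *   allow:     the listener's answer (ignored when !responded)
 * Returns 0 if the original operation may proceed, -EACCES otherwise.
 */
static int ex_permission_result(struct ex_group *g, bool responded, bool allow)
{
    if (g->evicted)
        return 0;                   /* nobody is checking any more */

    if (responded) {
        g->misses = 0;              /* the streak is broken */
        return allow ? 0 : -EACCES;
    }

    if (++g->misses >= EX_MAX_MISSES)
        g->evicted = true;
    return -EACCES;                 /* deny on timeout */
}
```

For the HSM case this reproduces the sequence above: a stalled listener
first causes EACCES failures, and once evicted, applications read whatever
is on disk, i.e. the stubs.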
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
` (7 preceding siblings ...)
2009-08-04 16:34 ` Tvrtko Ursulin
@ 2009-08-05 2:05 ` Pavel Machek
2009-08-05 16:46 ` Tvrtko Ursulin
8 siblings, 1 reply; 63+ messages in thread
From: Pavel Machek @ 2009-08-05 2:05 UTC (permalink / raw)
To: Eric Paris
Cc: linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, douglas.leeder, tytso, arjan, david, jengelh, aviro, mrkafk,
alexl, jack, tvrtko.ursulin, a.p.zijlstra, hch, alan, mmorley
BTW my -@suse.cz address no longer works. pavel@ucw.cz should be ok.
> If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> must send a response before the 5 second timeout. If no response is
> sent before the 5 second timeout the original operation is allowed. If
> this happens too many times (10 in a row) the fanotify group is evicted
> from the kernel and will not get any new events. Sending a response is
> done using the setsockopt() call with the socket options set to
> FANOTIFY_ACCESS_RESPONSE. The buffer should contain a structure like:
The timeout part of the interface is very ugly. Will fanotify users have
to be realtime/mlocked?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: fanotify - overall design before I start sending patches
2009-08-04 18:20 ` John Stoffel
[not found] ` <19064.31705.491774.122207-HgN6juyGXH5AfugRpC6u6w@public.gmane.org>
@ 2009-08-05 9:32 ` Tvrtko Ursulin
1 sibling, 0 replies; 63+ messages in thread
From: Tvrtko Ursulin @ 2009-08-05 9:32 UTC (permalink / raw)
To: John Stoffel
Cc: Valdis.Kletnieks@vt.edu, Eric Paris, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, malware-list@dmesg.printk.net,
greg@kroah.com, jcm@redhat.com, Douglas Leeder, tytso@mit.edu,
arjan@infradead.org, david@lang.hm, jengelh@medozas.de,
aviro@redhat.com, mrkafk@gmail.com, alexl@redhat.com,
jack@suse.cz, a.p.zijlstra@chello.nl, hch@infradead.org,
alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
On Tuesday 04 August 2009 19:20:09 John Stoffel wrote:
> >>>>> "Valdis" == Valdis Kletnieks <Valdis.Kletnieks@vt.edu> writes:
>
> Valdis> On Tue, 04 Aug 2009 12:27:48 EDT, Eric Paris said:
> >> On Tue, 2009-08-04 at 17:09 +0100, Tvrtko Ursulin wrote:
> >> > Would it make more sense to deny on timeouts and then evict? I am
> >> > thinking it would be more secure with no significant drawbacks. Also
> >> > for usages like HSM allowing it without data being in place might
> >> > present wrong content to the user.
> >>
> >> I'd be willing to go that route as long as no one else complains.
>
> Valdis> Yes, in my world, "deny on timeout and evict" is the better
> Valdis> design decision. For an HSM, you'd rather have a
> Valdis> quick-and-ugly death on a failed file open than an app
> Valdis> accidentally reading the HSM's stub data thinking it's the
> Valdis> original data.
>
> Speaking as someone who is working slowly to deploy an HSM service, one
> thing to note is that when you *do* see the stub file contents, you
> know that your HSM is busted somehow.
>
> How will fanotify deal with this issue? Sorry, I haven't paid enough
> attention to this thread though I know I should since it's up my $WORK
> alley.
Would it make sense to allow the listener to pass in (on registration, maybe
also on response) the error code which will be received by userspace? That
way an HSM could set it to -ENODATA (or something), malware scanning to -EACCES,
etc., which would give userspace a clearer indication of what went wrong.
Tvrtko
* Re: fanotify - overall design before I start sending patches
[not found] ` <200908041734.05762.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
@ 2009-08-05 10:12 ` Douglas Leeder
0 siblings, 0 replies; 63+ messages in thread
From: Douglas Leeder @ 2009-08-05 10:12 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: david-gFPdbfVZQbY@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jack-AlSwsSmVLrQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
pavel-AlSwsSmVLrQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
Tvrtko Ursulin wrote:
> On Friday 24 July 2009 21:13:49 Eric Paris wrote:
>> After the socket is bound events are attained using the read() syscall
>> (recv* probably also works haven't tested). This will result in the
>> buffer being filled with one or more events like this:
>>
>> struct fanotify_event_metadata {
>> __u32 event_len;
>> __s32 fd;
>> __u32 mask;
>> __u32 f_flags;
>> __s32 pid;
>> __s32 tgid;
>> __u64 cookie;
>> } __attribute__((packed));
>>
>> fd specifies the new file descriptor that was created in the context of
>> the listener. (readlink of /proc/self/fd will give you A pathname)
>> mask indicates the events type (bitwise OR of the event types listed
>> above). f_flags here is the f_flags the ORIGINAL process has the file
>> open with. pid and tgid are from the original process. cookie is used
>> when the listener needs to allow, deny, or delay the operation.
>
> One more thing. uid and gid (possibly whole set?) would be useful so we can
> tell which user triggered an event without having to look at the process
> which has maybe disappeared in the mean time. I _think_ uid was in the
> original proposal/idea and don't remember if it was decided we cannot get it
> deliberately, or it was omitted by accident?
Maybe it would be good to include in the documentation how to extract
extra information that listeners might want?
e.g.
To get the path (or an approximation), do readlink on the fd.
To get the UID/GID of the originator process, look in /proc/<PID>/????
etc.
This would provide easier answers to the questions on including extra
info in the fanotify events.
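A sketch of what such documentation might show, assuming a Linux /proc. The
helper names fd_to_path() and pid_to_uid() are made up for illustration. The
UID lookup uses the ownership of the /proc/<PID> directory, which is only one
possible approach: it normally reflects the effective UID of the process, but
it is racy (the process may already be gone) and reports root for
non-dumpable processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Resolve an open descriptor to A pathname via /proc/self/fd/<fd>.
 * Returns the path length, or -1 on error. The result may be silently
 * truncated if buf is too small.
 */
static ssize_t fd_to_path(int fd, char *buf, size_t buflen)
{
	char link[64];
	ssize_t n;

	assert(buf != NULL && buflen > 1);
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, buf, buflen - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';		/* readlink() does not NUL-terminate */
	return n;
}

/*
 * Look up the owner of /proc/<pid>. Returns 0 and fills *uid on success,
 * -1 if the process has already exited or /proc is unavailable.
 */
static int pid_to_uid(pid_t pid, uid_t *uid)
{
	char dir[64];
	struct stat st;

	assert(uid != NULL);
	snprintf(dir, sizeof(dir), "/proc/%ld", (long)pid);
	if (stat(dir, &st) != 0)
		return -1;
	*uid = st.st_uid;
	return 0;
}
```

A listener would call fd_to_path() on the fd from the event and pid_to_uid()
on the pid from the event, and must be prepared for either lookup to fail.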
--
Douglas Leeder
* Re: fanotify - overall design before I start sending patches
2009-08-04 16:34 ` Tvrtko Ursulin
[not found] ` <200908041734.05762.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
@ 2009-08-05 10:35 ` Douglas Leeder
1 sibling, 0 replies; 63+ messages in thread
From: Douglas Leeder @ 2009-08-05 10:35 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: Eric Paris, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, malware-list@dmesg.printk.net,
Valdis.Kletnieks@vt.edu, greg@kroah.com, jcm@redhat.com,
tytso@mit.edu, arjan@infradead.org, david@lang.hm,
jengelh@medozas.de, aviro@redhat.com, mrkafk@gmail.com,
alexl@redhat.com, a.p.zijlstra@chello.nl, hch@infradead.org,
alan@lxorguk.ukuu.org.uk, mmorley@hcl.in, pavel@suse.cz
> On Friday 24 July 2009 21:13:49 Eric Paris wrote:
>> After the socket is bound events are attained using the read() syscall
>> (recv* probably also works haven't tested). This will result in the
>> buffer being filled with one or more events like this:
>>
>> struct fanotify_event_metadata {
>> __u32 event_len;
>> __s32 fd;
>> __u32 mask;
>> __u32 f_flags;
>> __s32 pid;
>> __s32 tgid;
>> __u64 cookie;
>> } __attribute__((packed));
>>
>> fd specifies the new file descriptor that was created in the context of
>> the listener. (readlink of /proc/self/fd will give you A pathname)
>> mask indicates the events type (bitwise OR of the event types listed
>> above). f_flags here is the f_flags the ORIGINAL process has the file
>> open with. pid and tgid are from the original process. cookie is used
>> when the listener needs to allow, deny, or delay the operation.
>
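For reference, a listener could walk a buffer of such events roughly as
follows. This is a sketch against the struct layout quoted above, not against
any merged interface. It assumes that event_len is the total size of one
event and is what the reader uses to step to the next one; the quoted text
does not spell that out. next_event() is a hypothetical helper, and the
stdint types stand in for the kernel's __u32/__s32/__u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout as quoted in this thread; fixed-width types replace __u32 etc. */
struct fanotify_event_metadata {
	uint32_t event_len;
	int32_t  fd;
	uint32_t mask;
	uint32_t f_flags;
	int32_t  pid;
	int32_t  tgid;
	uint64_t cookie;
} __attribute__((packed));

/*
 * Copy the event at *off out of buf[0..len) into *ev and advance *off.
 * Returns 1 if an event was produced, 0 if the rest of the buffer is
 * empty, truncated or malformed.
 */
static int next_event(const unsigned char *buf, size_t len, size_t *off,
		      struct fanotify_event_metadata *ev)
{
	assert(buf != NULL && off != NULL && ev != NULL);
	if (*off > len || len - *off < sizeof(*ev))
		return 0;
	memcpy(ev, buf + *off, sizeof(*ev));	/* buf may be unaligned */
	if (ev->event_len < sizeof(*ev) || ev->event_len > len - *off)
		return 0;
	*off += ev->event_len;
	return 1;
}
```

The caller would fill buf with read() on the fanotify socket, loop on
next_event() until it returns 0, and close each ev.fd when done with it.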
One thing that's come up recently that we can't detect with Talpa:
Can fanotify differentiate between an execute and a normal open for reading?
If it can differentiate, could it send that information in the
event_metadata?
--
Douglas Leeder
* Re: fanotify - overall design before I start sending patches
2009-08-05 2:05 ` Pavel Machek
@ 2009-08-05 16:46 ` Tvrtko Ursulin
[not found] ` <200908051746.17903.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 63+ messages in thread
From: Tvrtko Ursulin @ 2009-08-05 16:46 UTC (permalink / raw)
To: Pavel Machek
Cc: Eric Paris, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, malware-list@dmesg.printk.net,
Valdis.Kletnieks@vt.edu, greg@kroah.com, jcm@redhat.com,
Douglas Leeder, tytso@mit.edu, arjan@infradead.org, david@lang.hm,
jengelh@medozas.de, aviro@redhat.com, mrkafk@gmail.com,
alexl@redhat.com, jack@suse.cz, a.p.zijlstra@chello.nl,
hch@infradead.org, alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
On Wednesday 05 August 2009 03:05:34 Pavel Machek wrote:
> BTW my -@suse.cz address no longer works. pavel@ucw.cz should be ok.
>
> > If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> > must send a response before the 5 second timeout. If no response is
> > sent before the 5 second timeout the original operation is allowed. If
> > this happens too many times (10 in a row) the fanotify group is evicted
> > from the kernel and will not get any new events. Sending a response is
> > done using the setsockopt() call with the socket options set to
> > FANOTIFY_ACCESS_RESPONSE. The buffer should contain a structure like:
>
> The timeout part of interface is very ugly. Will fanotify users have
> to be realtime/mlocked?
Why do you think it is very ugly?
Just to make sure you haven't missed this - it is not that they have to
complete the whole operation before the timeout period (since you mention
realtime/mlock I suspect this is what you think?), but _during_ the operation
they have to show that they are active by sending something like keep-alive
messages.
Or are you worried about failing to meet even that on a loaded system? There
has to be something like this, otherwise a hung userspace client would kill the
whole system.
Tvrtko
* Re: fanotify - overall design before I start sending patches
[not found] ` <200908051746.17903.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
@ 2009-08-06 10:10 ` Pavel Machek
2009-08-06 10:20 ` Tvrtko Ursulin
[not found] ` <20090806101059.GD31370-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
0 siblings, 2 replies; 63+ messages in thread
From: Pavel Machek @ 2009-08-06 10:10 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: david-gFPdbfVZQbY@public.gmane.org,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Douglas Leeder,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jack-AlSwsSmVLrQ@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
On Wed 2009-08-05 17:46:16, Tvrtko Ursulin wrote:
> On Wednesday 05 August 2009 03:05:34 Pavel Machek wrote:
> > BTW my -@suse.cz address no longer works. pavel@ucw.cz should be ok.
> >
> > > If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> > > must send a response before the 5 second timeout. If no response is
> > > sent before the 5 second timeout the original operation is allowed. If
> > > this happens too many times (10 in a row) the fanotify group is evicted
> > > from the kernel and will not get any new events. Sending a response is
> > > done using the setsockopt() call with the socket options set to
> > > FANOTIFY_ACCESS_RESPONSE. The buffer should contain a structure like:
> >
> > The timeout part of interface is very ugly. Will fanotify users have
> > to be realtime/mlocked?
>
> Why do you think it is very ugly?
Do I need to explain?
> Just to make sure you haven't missed this - it is not that they have to
> complete the whole operation before the timeout period (since you mention
> realtime/mlock I suspect this is what you think?), but _during_ the operation
> they have to show that they are active by sending something like keep alive
> messages.
>
> Or you are worried about failing to meet even that on a loaded system? There
> has to be something like this otherwise hung userspace client would kill the
> whole system.
Of course, I'm worried about failing to meet this on a loaded
system. And the fact that I _have_ to worry about that means that
the interface is ugly/broken.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: fanotify - overall design before I start sending patches
2009-08-06 10:10 ` Pavel Machek
@ 2009-08-06 10:20 ` Tvrtko Ursulin
2009-08-06 10:24 ` Pavel Machek
[not found] ` <20090806101059.GD31370-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
1 sibling, 1 reply; 63+ messages in thread
From: Tvrtko Ursulin @ 2009-08-06 10:20 UTC (permalink / raw)
To: Pavel Machek
Cc: Eric Paris, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, malware-list@dmesg.printk.net,
Valdis.Kletnieks@vt.edu, greg@kroah.com, jcm@redhat.com,
Douglas Leeder, tytso@mit.edu, arjan@infradead.org, david@lang.hm,
jengelh@medozas.de, aviro@redhat.com, mrkafk@gmail.com,
alexl@redhat.com, jack@suse.cz, a.p.zijlstra@chello.nl,
hch@infradead.org, alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
On Thursday 06 August 2009 11:10:59 Pavel Machek wrote:
> On Wed 2009-08-05 17:46:16, Tvrtko Ursulin wrote:
> > Just to make sure you haven't missed this - it is not that they have to
> > complete the whole operation before the timeout period (since you mention
> > realtime/mlock I suspect this is what you think?), but _during_ the
> > operation they have to show that they are active by sending something
> > like keep alive messages.
> >
> > Or you are worried about failing to meet even that on a loaded system?
> > There has to be something like this otherwise hung userspace client would
> > kill the whole system.
>
> Of course, I'm worried about failing to meet this on loaded
> system. And the fact that I _have_ to worry about that means that
> interface is ugly/broken.
Would you prefer an infinite timeout instead? Maybe Eric could make it
configurable. Or do you have some other ideas?
Tvrtko
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090806101059.GD31370-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
@ 2009-08-06 10:20 ` Douglas Leeder
2009-08-06 10:22 ` Pavel Machek
2009-08-06 10:29 ` Peter Zijlstra
0 siblings, 2 replies; 63+ messages in thread
From: Douglas Leeder @ 2009-08-06 10:20 UTC (permalink / raw)
To: Pavel Machek
Cc: david-gFPdbfVZQbY@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
Pavel Machek wrote:
> On Wed 2009-08-05 17:46:16, Tvrtko Ursulin wrote:
>> On Wednesday 05 August 2009 03:05:34 Pavel Machek wrote:
>> Just to make sure you haven't missed this - it is not that they have to
>> complete the whole operation before the timeout period (since you mention
>> realtime/mlock I suspect this is what you think?), but _during_ the operation
>> they have to show that they are active by sending something like keep alive
>> messages.
>>
>> Or you are worried about failing to meet even that on a loaded system? There
>> has to be something like this otherwise hung userspace client would kill the
>> whole system.
>
> Of course, I'm worried about failing to meet this on loaded
> system. And the fact that I _have_ to worry about that means that
> interface is ugly/broken.
You mean that in 5 seconds, you won't have any point when you can tell
the kernel, "I'm still working"?
--
Douglas Leeder
* Re: fanotify - overall design before I start sending patches
2009-08-06 10:20 ` Douglas Leeder
@ 2009-08-06 10:22 ` Pavel Machek
2009-08-07 8:59 ` Jamie Lokier
2009-08-06 10:29 ` Peter Zijlstra
1 sibling, 1 reply; 63+ messages in thread
From: Pavel Machek @ 2009-08-06 10:22 UTC (permalink / raw)
To: Douglas Leeder
Cc: Tvrtko Ursulin, Eric Paris, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, malware-list@dmesg.printk.net,
Valdis.Kletnieks@vt.edu, greg@kroah.com, jcm@redhat.com,
tytso@mit.edu, arjan@infradead.org, david@lang.hm,
jengelh@medozas.de, aviro@redhat.com, mrkafk@gmail.com,
alexl@redhat.com, a.p.zijlstra@chello.nl, hch@infradead.org,
alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
On Thu 2009-08-06 11:20:57, Douglas Leeder wrote:
> Pavel Machek wrote:
> > On Wed 2009-08-05 17:46:16, Tvrtko Ursulin wrote:
> >> On Wednesday 05 August 2009 03:05:34 Pavel Machek wrote:
>
> >> Just to make sure you haven't missed this - it is not that they have to
> >> complete the whole operation before the timeout period (since you mention
> >> realtime/mlock I suspect this is what you think?), but _during_ the operation
> >> they have to show that they are active by sending something like keep alive
> >> messages.
> >>
> >> Or you are worried about failing to meet even that on a loaded system? There
> >> has to be something like this otherwise hung userspace client would kill the
> >> whole system.
> >
> > Of course, I'm worried about failing to meet this on loaded
> > system. And the fact that I _have_ to worry about that means that
> > interface is ugly/broken.
>
> You mean that in 5 seconds, you won't have any point when you can tell
> the kernel, "I'm still working"?
Yes. Try make -j on your machine one day.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: fanotify - overall design before I start sending patches
2009-08-06 10:20 ` Tvrtko Ursulin
@ 2009-08-06 10:24 ` Pavel Machek
0 siblings, 0 replies; 63+ messages in thread
From: Pavel Machek @ 2009-08-06 10:24 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: Eric Paris, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, malware-list@dmesg.printk.net,
Valdis.Kletnieks@vt.edu, greg@kroah.com, jcm@redhat.com,
Douglas Leeder, tytso@mit.edu, arjan@infradead.org, david@lang.hm,
jengelh@medozas.de, aviro@redhat.com, mrkafk@gmail.com,
alexl@redhat.com, jack@suse.cz, a.p.zijlstra@chello.nl,
hch@infradead.org, alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
On Thu 2009-08-06 11:20:25, Tvrtko Ursulin wrote:
> On Thursday 06 August 2009 11:10:59 Pavel Machek wrote:
> > On Wed 2009-08-05 17:46:16, Tvrtko Ursulin wrote:
> > > Just to make sure you haven't missed this - it is not that they have to
> > > complete the whole operation before the timeout period (since you mention
> > > realtime/mlock I suspect this is what you think?), but _during_ the
> > > operation they have to show that they are active by sending something
> > > like keep alive messages.
> > >
> > > Or you are worried about failing to meet even that on a loaded system?
> > > There has to be something like this otherwise hung userspace client would
> > > kill the whole system.
> >
> > Of course, I'm worried about failing to meet this on loaded
> > system. And the fact that I _have_ to worry about that means that
> > interface is ugly/broken.
>
> Would you prefer an infinite timeout instead? Maybe Eric could make it
> configurable. Or you have some other alternative ideas?
Infinite timeout would be less ugly, yes.
Having it configurable would still be ugly, but at least it would be
the "administrator's fault" at that point.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: fanotify - overall design before I start sending patches
2009-08-06 10:20 ` Douglas Leeder
2009-08-06 10:22 ` Pavel Machek
@ 2009-08-06 10:29 ` Peter Zijlstra
2009-08-06 10:59 ` Tvrtko Ursulin
1 sibling, 1 reply; 63+ messages in thread
From: Peter Zijlstra @ 2009-08-06 10:29 UTC (permalink / raw)
To: Douglas Leeder
Cc: Pavel Machek, Tvrtko Ursulin, Eric Paris,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
malware-list@dmesg.printk.net, Valdis.Kletnieks@vt.edu,
greg@kroah.com, jcm@redhat.com, tytso@mit.edu,
arjan@infradead.org, david@lang.hm, jengelh@medozas.de,
aviro@redhat.com, mrkafk@gmail.com, alexl@redhat.com,
hch@infradead.org, alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
On Thu, 2009-08-06 at 11:20 +0100, Douglas Leeder wrote:
> Pavel Machek wrote:
> > On Wed 2009-08-05 17:46:16, Tvrtko Ursulin wrote:
> >> On Wednesday 05 August 2009 03:05:34 Pavel Machek wrote:
>
> >> Just to make sure you haven't missed this - it is not that they have to
> >> complete the whole operation before the timeout period (since you mention
> >> realtime/mlock I suspect this is what you think?), but _during_ the operation
> >> they have to show that they are active by sending something like keep alive
> >> messages.
> >>
> >> Or you are worried about failing to meet even that on a loaded system? There
> >> has to be something like this otherwise hung userspace client would kill the
> >> whole system.
> >
> > Of course, I'm worried about failing to meet this on loaded
> > system. And the fact that I _have_ to worry about that means that
> > interface is ugly/broken.
>
> You mean that in 5 seconds, you won't have any point when you can tell
> the kernel, "I'm still working"?
I have to agree with Pavel here, either you demand the monitor process
is RT/mlock and can respond in time, in which case the interface doesn't
need a 5 second timeout, or you cannot and you have a hole somewhere.
Now having the kernel depend on any user task to guarantee progress is of
course utterly insane too.
Sounds like a bad place to be, and I'd rather not have it.
If you really need the intermediate you might as well use a FUSE
filesystem, but I suspect there's plenty of problems there as well.
It all reeks of ugly though..
/me crawls back from whence he came.
* Re: fanotify - overall design before I start sending patches
2009-08-06 10:29 ` Peter Zijlstra
@ 2009-08-06 10:59 ` Tvrtko Ursulin
2009-08-06 11:23 ` Peter Zijlstra
[not found] ` <200908061159.45550.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
0 siblings, 2 replies; 63+ messages in thread
From: Tvrtko Ursulin @ 2009-08-06 10:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: david-gFPdbfVZQbY@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Douglas Leeder,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Pavel Machek,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
On Thursday 06 August 2009 11:29:08 Peter Zijlstra wrote:
> On Thu, 2009-08-06 at 11:20 +0100, Douglas Leeder wrote:
> > Pavel Machek wrote:
> > > On Wed 2009-08-05 17:46:16, Tvrtko Ursulin wrote:
> > >> On Wednesday 05 August 2009 03:05:34 Pavel Machek wrote:
> > >>
> > >> Just to make sure you haven't missed this - it is not that they have
> > >> to complete the whole operation before the timeout period (since you
> > >> mention realtime/mlock I suspect this is what you think?), but
> > >> _during_ the operation they have to show that they are active by
> > >> sending something like keep alive messages.
> > >>
> > >> Or you are worried about failing to meet even that on a loaded system?
> > >> There has to be something like this otherwise hung userspace client
> > >> would kill the whole system.
> > >
> > > Of course, I'm worried about failing to meet this on loaded
> > > system. And the fact that I _have_ to worry about that means that
> > > interface is ugly/broken.
> >
> > You mean that in 5 seconds, you won't have any point when you can tell
> > the kernel, "I'm still working"?
>
> I have to agree with Pavel here, either you demand the monitor process
> is RT/mlock and can respond in time, in which case the interface doesn't
> need a 5 second timeout, or you cannot and you have a hole somewhere.
>
> Now having the kernel depend on any user task to guarantee process is of
> course utterly insane too.
>
> Sounds like a bad place to be, and I'd rather not have it.
>
> If you really need the intermediate you might as well use a FUSE
> filesystem, but I suspect there's plenty of problems there as well.
So you mount FUSE on top of everything if you want to have systemwide
monitoring and then you _again_ depend on _userspace_, no? By this logic
everything has to be in the kernel. But even if it were, and the CPUs are so
overloaded that a userspace thread does not get to run at all for X seconds,
are kernel threads scheduled differently, e.g. with a priority other than nice
levels?
Also, it is not as if the kernel will hang when the timeout expires.
Rather, some application would get an error from open(2). Note that this only
happens on systems where the admin has made a _deliberate_ decision to
install software which can cause this behaviour.
You can have an RT/mlocked client, but what if it fails (let's say it busy loops)?
That is also something the timeout mechanism guards against.
I really think if we want to have this functionality there is no way around
the fact that any userspace can fail. Kernel should handle it of course, and
Eric's design does it by kicking repeatedly misbehaving clients out.
If the timeout is made configurable I think this is the best that can be done
here. I don't think the problem is as huge as you are presenting it.
Tvrtko
* Re: fanotify - overall design before I start sending patches
2009-08-06 10:59 ` Tvrtko Ursulin
@ 2009-08-06 11:23 ` Peter Zijlstra
2009-08-06 12:48 ` Tvrtko Ursulin
` (3 more replies)
[not found] ` <200908061159.45550.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
1 sibling, 4 replies; 63+ messages in thread
From: Peter Zijlstra @ 2009-08-06 11:23 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: Douglas Leeder, Pavel Machek, Eric Paris,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
malware-list@dmesg.printk.net, Valdis.Kletnieks@vt.edu,
greg@kroah.com, jcm@redhat.com, tytso@mit.edu,
arjan@infradead.org, david@lang.hm, jengelh@medozas.de,
aviro@redhat.com, mrkafk@gmail.com, alexl@redhat.com,
hch@infradead.org, alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
On Thu, 2009-08-06 at 11:59 +0100, Tvrtko Ursulin wrote:
> > I have to agree with Pavel here, either you demand the monitor process
> > is RT/mlock and can respond in time, in which case the interface doesn't
> > need a 5 second timeout, or you cannot and you have a hole somewhere.
> >
> > Now having the kernel depend on any user task to guarantee process is of
> > course utterly insane too.
> >
> > Sounds like a bad place to be, and I'd rather not have it.
> >
> > If you really need the intermediate you might as well use a FUSE
> > filesystem, but I suspect there's plenty of problems there as well.
>
> So you mount FUSE on top of everything if you want to have systemwide
> monitoring and then you _again_ depend on _userspace_, no? By this logic
> everything has to be in kernel.
I was assuming there was an unprotected region on the system, otherwise
you cannot bootstrap this, nor maintain it -- see the "daemon dies, can't
start a new one" problem.
But yes, if it's so invasive to the filesystem as to make it unusable I'd
argue it should be part of the filesystem. We do filesystem encryption in
the filesystem, so why should we do such invasive scanning outside of
it?
We are talking about the kind of fanotify client that says: No, you cannot
open/read/write/mmap/etc. this file until I say you can, right?
I'd call that deeply invasive.
> But even if it was, and the CPUs are so
> overloaded that an userspace thread does not get to run at all for X seconds,
> are kernel threads scheduled differently eg. with priority other than nice
> levels?
No, except that some are run as RT processes, but other than that
they're simply yet another task.
Thing is, they don't do random things after a timeout. It's not like we
simply give up on a BIO if it's been in the queue for a second. No, we see it
through.
> Also, it is not like that when the timeout expires the kernel will hang.
> Rather, some application would get an error from open(2). Note how that is by
> system configuration where the admin has made a _deliberate_ decision to
> install such software which can cause this behaviour.
>
> You can have a RT/mlocked client but what if it crashes (lets say busy loops)?
> Which is also something timeout mechanism is guarding against.
By the above you're hosed anyway since starting a new one will fail due
to there being no daemon, right? Might as well forfeit all security
measures once the daemon dies. That is, let security depend on there
being a daemon connected.
And once you do that, mandating the daemon to be a Real-Time process and
have everything mlocked to avoid it being DoS'd seems like a minimum
requirement.
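For what that requirement would look like from the daemon's side, here is a
minimal sketch using mlockall(2) and sched_setscheduler(2). The function name
harden_listener() is made up, SCHED_FIFO is just one of the real-time
policies, and both calls normally need privileges (or suitable rlimits), so
the sketch reports the failure instead of assuming success.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mman.h>

/*
 * Lock all current and future pages into RAM and switch the calling
 * process to the SCHED_FIFO real-time class at the given priority.
 * Returns 0 on success or a negative errno value on failure.
 */
static int harden_listener(int rt_priority)
{
	struct sched_param sp = { .sched_priority = rt_priority };

	assert(rt_priority >= 1);
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
		return -errno;	/* e.g. -EPERM or -ENOMEM */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		return -errno;	/* e.g. -EPERM */
	return 0;
}
```

A daemon would call this once at startup, before connecting as a listener,
and refuse to run if it fails. It does not help with the busy-looping daemon
case raised above.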
> I really think if we want to have this functionality there is no way around
> the fact that any userspace can fail. Kernel should handle it of course, and
> Eric's design does it by kicking repeatedly misbehaving clients out.
Seems like a weird thing to me. Suppose you DoS the system on purpose
and all clients start getting wonky, you kill them all, and are left
with none; then you cannot access any of your files anymore and
everything grinds to a halt?
> If the timeout is made configurable I think this is the best that can be done
> here. I don't think the problem is so huge as you are presenting it.
I think having a timeout is simply asking for trouble. Either you do or
you don't; having a timeout is like having a random number generator for
a security policy.
As I said, having the filesystem block actions based on external
processes seems to be just asking for trouble.
* Re: fanotify - overall design before I start sending patches
[not found] ` <200908061159.45550.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
@ 2009-08-06 11:24 ` Pavel Machek
0 siblings, 0 replies; 63+ messages in thread
From: Pavel Machek @ 2009-08-06 11:24 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: david-gFPdbfVZQbY@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Peter Zijlstra,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Douglas Leeder,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
> > > > Of course, I'm worried about failing to meet this on loaded
> > > > system. And the fact that I _have_ to worry about that means that
> > > > interface is ugly/broken.
> > >
> > > You mean that in 5 seconds, you won't have any point when you can tell
> > > the kernel, "I'm still working"?
> >
> > I have to agree with Pavel here, either you demand the monitor process
> > is RT/mlock and can respond in time, in which case the interface doesn't
> > need a 5 second timeout, or you cannot and you have a hole somewhere.
> >
> > Now having the kernel depend on any user task to guarantee process is of
> > course utterly insane too.
> >
> > Sounds like a bad place to be, and I'd rather not have it.
> >
> > If you really need the intermediate you might as well use a FUSE
> > filesystem, but I suspect there's plenty of problems there as well.
>
> So you mount FUSE on top of everything if you want to have systemwide
> monitoring and then you _again_ depend on _userspace_, no? By this logic
> everything has to be in kernel. But even if it was, and the CPUs are so
> overloaded that an userspace thread does not get to run at all for X seconds,
> are kernel threads scheduled differently eg. with priority other than nice
> levels?
Userspace app not running for 5 seconds != CPU not being available for
5 seconds. I'd worry about swap behaviour here.
> If the timeout is made configurable I think this is the best that can be done
> here. I don't think the problem is so huge as you are presenting it.
It is ugly. FUSE is more elegant, being already in the kernel. Of course,
if your userspace daemon fails, you are doomed, but...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: fanotify - overall design before I start sending patches
2009-08-06 11:23 ` Peter Zijlstra
@ 2009-08-06 12:48 ` Tvrtko Ursulin
[not found] ` <200908061348.43625.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2009-08-06 13:50 ` Kernel Event Notification Subsystem (was: fanotify - overall design before I start sending patches) Al Boldi
` (2 subsequent siblings)
3 siblings, 1 reply; 63+ messages in thread
From: Tvrtko Ursulin @ 2009-08-06 12:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Douglas Leeder, Pavel Machek, Eric Paris,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
malware-list@dmesg.printk.net, Valdis.Kletnieks@vt.edu,
greg@kroah.com, jcm@redhat.com, tytso@mit.edu,
arjan@infradead.org, david@lang.hm, jengelh@medozas.de,
aviro@redhat.com, mrkafk@gmail.com, alexl@redhat.com,
hch@infradead.org, alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
On Thursday 06 August 2009 12:23:51 Peter Zijlstra wrote:
> On Thu, 2009-08-06 at 11:59 +0100, Tvrtko Ursulin wrote:
> > > I have to agree with Pavel here, either you demand the monitor process
> > > is RT/mlock and can respond in time, in which case the interface
> > > doesn't need a 5 second timeout, or you cannot and you have a hole
> > > somewhere.
> > >
> > > Now having the kernel depend on any user task to guarantee progress is
> > > of course utterly insane too.
> > >
> > > Sounds like a bad place to be, and I'd rather not have it.
> > >
> > > If you really need the intermediate you might as well use a FUSE
> > > filesystem, but I suspect there's plenty of problems there as well.
> >
> > So you mount FUSE on top of everything if you want to have systemwide
> > monitoring and then you _again_ depend on _userspace_, no? By this logic
> > everything has to be in kernel.
>
> I was assuming there was an unprotected region on the system, otherwise
> you cannot bootstrap this, nor maintain it -- see the daemon dies can't
> start a new one problem.
There should be no unprotected areas unless configured so. When no daemons
are connected, operations are not blocked.
> But yes, if its so invasive to the filesystem as to make it unusable I'd
> argue it to be part of the filesystem, we do filesystem encryption in
> the filesystem, so why should we do such invasive scanning outside of
> it?
:) It is hard to satisfy everyone. When I initially posted a proposed patch
(no longer related to Eric's work) it had more in kernel space, which made
people unhappy. Now you are suggesting even more. I don't think it is
realistic to put all the code for the different fanotify use cases in the kernel.
Certainly on the malware scanning side we are talking about hourly updates, so
at least something has to be in userspace. For HSMs I guess it is similarly
complex, with triggering and waiting for media changes, where things can also
fail in a huge number of ways.
> We are talking about the kind of fanotify client that says: No you cannot
> open/read/write/mmap/etc.. this file until I say you can, right?
Yes and no, it would be more accurate to say "this open takes long while we do
something else in the background".
> > But even if it was, and the CPUs are so
> > overloaded that an userspace thread does not get to run at all for X
> > seconds, are kernel threads scheduled differently eg. with priority other
> > than nice levels?
>
> No, except that some are run as RT processes, but other than that
> they're simply yet another task.
>
> Thing is, they don't do random things after a timeout. Its not like we
> simply give up a BIO if its been in the queue for a second. No we see it
> through.
I don't think this analogy is correct. IO can also time out when a controller
or disk is not responding, in which case you can't see it through; you need to
fail and propagate the error rather than wait indefinitely.
> > Also, it is not like that when the timeout expires the kernel will hang.
> > Rather, some application would get an error from open(2). Note how that
> > is by system configuration where the admin has made a _deliberate_
> > decision to install such software which can cause this behaviour.
> >
> > You can have a RT/mlocked client but what if it crashes (lets say busy
> > loops)? Which is also something timeout mechanism is guarding against.
>
> By the above you're hosed anyway since starting a new one will fail due
> to there being no daemon, right? Might as well forfeit all security
> measures once the daemon dies. That is let security depend on there
> being a daemon connected.
No to the first part (explained it earlier), yes to the second.
> And once you do that, mandating the daemon to be a Real-Time process and
> have everything mlocked to avoid it being DoS'd seems like a minimum
> requirement.
And what if there is a bug in the daemon and it enters a busy loop? And what
if the client needs to do some IO in order to make the decision? RT and mlock
are not enough to guarantee anything then, right?
> > I really think if we want to have this functionality there is no way
> > around the fact that any userspace can fail. Kernel should handle it of
> > course, and Eric's design does it by kicking repeatedly misbehaving
> > clients out.
>
> Seems like a weird thing to me, suppose you DoS the system on purpose
> and all clients start getting wonky, you kill them all, and are left
> with none, then you cannot access any of your files anymore and
> everything grinds to a halt?
Again, when there are no clients, accesses are not blocked. And you can DoS a
box today in different ways even without fanotify. I don't think fanotify
makes it any worse, especially since fanotify is generic, i.e. it doesn't
address a particular threat model on its own.
> > If the timeout is made configurable I think this is the best that can be
> > done here. I don't think the problem is so huge as you are presenting it.
>
> I think having a timeout is simply asking for trouble - either you do or
> you don't having a timeout is like having a random number generator for
> a security policy.
I don't think that the timeout duration defines the security policy to the
extent you are suggesting. Please keep in mind that fanotify is just an
interface which can be used in many ways. Only one of them is blocking, and
that is used only if you knowingly configure your system with software
that uses it in such a way.
Tvrtko
* Re: fanotify - overall design before I start sending patches
[not found] ` <200908061348.43625.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
@ 2009-08-06 12:58 ` Alan Cox
[not found] ` <20090806135800.7ccb7787-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
0 siblings, 1 reply; 63+ messages in thread
From: Alan Cox @ 2009-08-06 12:58 UTC (permalink / raw)
To: Tvrtko Ursulin
Cc: david-gFPdbfVZQbY@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Peter Zijlstra,
Douglas Leeder, mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Pavel Machek,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
> > We are talking about the kind of fanotify client that says: No you cannot
> > open/read/write/mmap/etc.. this file until I say you can, right?
>
> Yes and no, it would be more accurate to say "this open takes long while we do
> something else in the background".
There are two or three ways to handle this
1. Block the open until the daemon dies or responds
2. Have a timeout (which would need to be connection configurable)
3. Require the daemon responds with "in progress" now and then.
For a superuser-managed service it's no different to an NFS mount which
can go wonky, so the only real question is what fanotify should allow
non-privileged users to do. The answer would appear anyway to be: not use
this aspect of such a facility.
For the superuser case, the fact that the daemon can be killed, thus releasing
anything stuck, is analogous to umount -f of a stuck NFS mount, which seems
perfectly good for NFS.
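The three options listed above can be compared with a small model. This is only a sketch in Python, not kernel code: the event names, the `resolve_open` function and the 5 second default are assumptions made for illustration, and none of them come from the fanotify patches. It replays a list of timestamped listener events against each policy and reports what would happen to the blocked open.

```python
from typing import List, Tuple

# Hypothetical event kinds a listener might emit; these names are
# illustrative only and are not part of any fanotify interface.
ALLOW, DENY, IN_PROGRESS, DIED = "allow", "deny", "in_progress", "died"


def resolve_open(policy: str,
                 events: List[Tuple[float, str]],
                 timeout: float = 5.0) -> str:
    """Return the fate of a blocked open() under one of three policies.

    policy: "block"     (option 1: wait until the daemon answers or dies),
            "timeout"   (option 2: fixed deadline from the start of the open),
            "keepalive" (option 3: each "in progress" message extends the deadline).
    events: (seconds_since_open, kind) pairs, sorted by time.
    """
    deadline = timeout
    for when, kind in events:
        # Options 2 and 3 give up once the current deadline has passed.
        if policy != "block" and when > deadline:
            return "timed_out"
        if kind in (ALLOW, DENY):
            return kind
        if kind == DIED:
            # Killing the daemon releases whatever was waiting on it.
            return "released"
        if kind == IN_PROGRESS and policy == "keepalive":
            deadline = when + timeout
    # No final answer ever arrived.
    return "blocked_forever" if policy == "block" else "timed_out"
```

With a listener that sends "in progress" at 4 s and 8 s and finally allows at 11 s, the keepalive policy yields "allow" while the fixed-timeout policy yields "timed_out". What the kernel does on "timed_out" (fail the open, or evict the listener and allow) is a separate policy choice that this model leaves open.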
Alan
* Kernel Event Notification Subsystem (was: fanotify - overall design before I start sending patches)
2009-08-06 11:23 ` Peter Zijlstra
2009-08-06 12:48 ` Tvrtko Ursulin
2009-08-06 13:50 ` Kernel Event Notification Subsystem (was: fanotify - overall design before I start sending patches) Al Boldi
@ 2009-08-06 13:50 ` Al Boldi
2009-08-06 18:18 ` fanotify - overall design before I start sending patches Eric Paris
3 siblings, 0 replies; 63+ messages in thread
From: Al Boldi @ 2009-08-06 13:50 UTC (permalink / raw)
To: Peter Zijlstra, Tvrtko Ursulin
Cc: Douglas Leeder, Pavel Machek, Eric Paris,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
malware-list@dmesg.printk.net, Valdis.Kletnieks@vt.edu,
greg@kroah.com, jcm@redhat.com, tytso@mit.edu,
arjan@infradead.org, david@lang.hm, jengelh@medozas.de,
aviro@redhat.com, mrkafk@gmail.com, alexl@redhat.com,
hch@infradead.org, alan@lxorguk.ukuu.org.uk, mmorley@hcl.in
Peter Zijlstra wrote:
> Like said, having the filesystem block actions based on external
> processes seems just asking for trouble.
I can't see anything wrong with that. In fact, moving policy into userland
seems like the correct approach anyway. IOW, the fact that the kernel is
dealing with policy all over the place is completely flawed.
What we really need, is to take the hard-coded policy burden out of the
kernel, and pipe all requests into dynamically configurable userland.
Which implies a "Kernel Event Notification Subsystem" that would by default
allow requests for root, and deny them for others, unless there is a userland
daemon that would respond differently within a timeout period.
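The default policy described above is small enough to write down directly. The following Python sketch assumes a hypothetical `kens_decision` helper where `daemon_answer` is `None` when no userland daemon responded within the timeout; both names are invented for illustration.

```python
from typing import Optional


def kens_decision(uid: int, daemon_answer: Optional[bool] = None) -> bool:
    """Decide a request under the proposed default policy.

    daemon_answer is the userland daemon's verdict, or None if no daemon
    answered within the timeout period (hypothetical convention).
    """
    if daemon_answer is not None:
        # A daemon responded in time, so its policy wins.
        return daemon_answer
    # No answer: allow requests for root (uid 0), deny them for others.
    return uid == 0
```

So `kens_decision(0)` allows, `kens_decision(1000)` denies, and `kens_decision(1000, True)` allows because the daemon said so.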
Thanks!
--
Al
* Re: fanotify - overall design before I start sending patches
[not found] ` <20090806135800.7ccb7787-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
@ 2009-08-06 18:18 ` Eric Paris
0 siblings, 0 replies; 63+ messages in thread
From: Eric Paris @ 2009-08-06 18:18 UTC (permalink / raw)
To: Alan Cox
Cc: david-gFPdbfVZQbY@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Peter Zijlstra,
Douglas Leeder, mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Pavel Machek,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
On Thu, 2009-08-06 at 13:58 +0100, Alan Cox wrote:
> > > We are talking about the kind of fanotify client that says: No you cannot
> > > open/read/write/mmap/etc.. this file until I say you can, right?
> >
> > Yes and no, it would be more accurate to say "this open takes long while we do
> > something else in the background".
>
> There are two or three ways to handle this
>
> 1. Block the open until the daemon dies or responds
> 2. Have a timeout (which would need to be connection configurable)
> 3. Require the daemon responds with "in progress" now and then.
I've taken option #3. I don't see option #2 as viable, although in off-list
discussion the clamav people have said they believe they are
interested in #2 rather than #3.
> For a superuser managed service its no different to an NFS mount which
> can go wonky so the only real question is what should fanotify allow non
> privileged users to do. The answer would appear anyway to be: not use
> this aspect of such a facility.
That's the approach taken thus far, although non-blocking/access
notification will be opened up to normal users (currently even
notification is root only).
> For the superuser case the fact the daemon can be killed thus releasing
> anything stuck is analogous to umount -f of a stuck NFS mount which seems
> perfectly good for NFS.
It does work for NFS (which I would call case #1). I claim that it
doesn't work for this case, since a stuck global listener would stop you
from running kill() since it wouldn't be able to get permission to open
it....
* Re: fanotify - overall design before I start sending patches
2009-08-06 11:23 ` Peter Zijlstra
` (2 preceding siblings ...)
2009-08-06 13:50 ` Al Boldi
@ 2009-08-06 18:18 ` Eric Paris
2009-08-07 16:36 ` Miklos Szeredi
[not found] ` <1249582695.20644.35.camel-8EcGF3LoIElviLIMxPk1+R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
3 siblings, 2 replies; 63+ messages in thread
From: Eric Paris @ 2009-08-06 18:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: david-gFPdbfVZQbY@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Douglas Leeder,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Pavel Machek,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
On Thu, 2009-08-06 at 13:23 +0200, Peter Zijlstra wrote:
> But yes, if its so invasive to the filesystem as to make it unusable I'd
> argue it to be part of the filesystem, we do filesystem encryption in
> the filesystem, so why should we do such invasive scanning outside of
> it?
In kernel? Yes. In filesystem? No. This isn't filesystem-specific,
it's system-wide. Didn't we have the discussion about FUSE not being as
all-encompassing as a number of people wanted? Think nfsd. Dazuko is, I
believe, working on generic stackable filesystems which might eventually
be able to implement a better set of hooks, but that is a LONG way from
a solved problem, last I heard.
>
> We are talking about the kind of fanotify client that says: No you cannot
> open/read/write/mmap/etc.. this file until I say you can, right?
Only open and read (or first mmap for read). Nothing available for
write. No one purports this to be an LSM.
> By the above you're hosed anyway since starting a new one will fail due
> to there being no daemon, right? Might as well forfeit all security
> measures once the daemon dies. That is let security depend on there
> being a daemon connected.
Nope. You should stop calling and thinking of it as a security system.
As I've said multiple times it is at best an indexing and integrity
checking system. We fail open. We don't prevent or care about
malicious local attacks. When a group is evicted everything is going to
just work. The whole reason for the timeout is because I don't trust
userspace not to get it wrong and I'd rather not lose my box because of
it. Yes, the reset I'm proposing allows userspace to screw the system
forever, but at least that is an active operation, not an accidental
segfault bringing down the whole system.
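The fail-open and eviction behaviour described above can be modelled in a few lines. This is a sketch under stated assumptions, not the fanotify implementation: `PermissionGate`, `register`, `open_allowed` and the `max_misses` threshold are hypothetical names, and a responder returning `None` stands in for "no answer before the deadline".

```python
from typing import Callable, Dict, Optional

# A responder returns True (allow), False (deny), or None to model a
# listener that failed to answer in time. Hypothetical convention.
Responder = Callable[[str], Optional[bool]]


class PermissionGate:
    """Toy model of listener groups gating open/read requests."""

    def __init__(self, max_misses: int = 1) -> None:
        self.groups: Dict[str, Responder] = {}
        self.misses: Dict[str, int] = {}
        self.max_misses = max_misses  # assumed eviction threshold

    def register(self, name: str, responder: Responder) -> None:
        self.groups[name] = responder
        self.misses[name] = 0

    def open_allowed(self, path: str) -> bool:
        # With no groups registered the loop is empty and access is allowed.
        for name in list(self.groups):
            answer = self.groups[name](path)
            if answer is None:
                # Missed deadline: fail open for this group, and evict it
                # once it has misbehaved often enough.
                self.misses[name] += 1
                if self.misses[name] >= self.max_misses:
                    del self.groups[name]
                    del self.misses[name]
                continue
            if answer is False:
                return False
        return True
```

A group that never answers is evicted on its first miss with the default threshold, after which later opens no longer consult it. A group that answers with a denial still blocks the access, so eviction only affects listeners that stop responding.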
> Seems like a weird thing to me, suppose you DoS the system on purpose
> and all clients start getting wonky, you kill them all, and are left
> with none, then you cannot access any of your files anymore and
> everything grinds to a halt?
Nope, you DoS the system on purpose, all the listeners get evicted, and now
everything else will be able to open/read data without those listeners
paying attention. When a group is evicted it's evicted. It no longer
needs to say yes or no.
> Like said, having the filesystem block actions based on external
> processes seems just asking for trouble.
Or it seems like exactly what hierarchical storage management systems
want....
-Eric
* Re: fanotify - overall design before I start sending patches
2009-08-06 10:22 ` Pavel Machek
@ 2009-08-07 8:59 ` Jamie Lokier
0 siblings, 0 replies; 63+ messages in thread
From: Jamie Lokier @ 2009-08-07 8:59 UTC (permalink / raw)
To: Pavel Machek
Cc: Douglas Leeder, Tvrtko Ursulin, Eric Paris,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
malware-list@dmesg.printk.net, Valdis.Kletnieks@vt.edu,
greg@kroah.com, jcm@redhat.com, tytso@mit.edu,
arjan@infradead.org, david@lang.hm, jengelh@medozas.de,
aviro@redhat.com, mrkafk@gmail.com, alexl@redhat.com,
a.p.zijlstra@chello.nl, hch@infradead.org,
alan@lxorguk.ukuu.org.uk, "mmorley@hcl.in" <mmorle
Pavel Machek wrote:
> On Thu 2009-08-06 11:20:57, Douglas Leeder wrote:
> > You mean that in 5 seconds, you won't have any point when you can tell
> > the kernel, "I'm still working"?
>
> Yes. Try make -j on your machine one day.
Ironically, some users report that Tracker (which uses inotify, hence
the irony) can make their system unresponsive for more than 5 seconds. :-)
-- Jamie
* Re: fanotify - overall design before I start sending patches
2009-08-06 18:18 ` fanotify - overall design before I start sending patches Eric Paris
@ 2009-08-07 16:36 ` Miklos Szeredi
[not found] ` <E1MZSQZ-0002as-FA-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
[not found] ` <1249582695.20644.35.camel-8EcGF3LoIElviLIMxPk1+R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
1 sibling, 1 reply; 63+ messages in thread
From: Miklos Szeredi @ 2009-08-07 16:36 UTC (permalink / raw)
To: eparis
Cc: a.p.zijlstra, tvrtko.ursulin, douglas.leeder, pavel, linux-kernel,
linux-fsdevel, malware-list, Valdis.Kletnieks, greg, jcm, tytso,
arjan, david, jengelh, aviro, mrkafk, alexl, hch, alan, mmorley
On Thu, 06 Aug 2009, Eric Paris wrote:
> just work. The whole reason for the timeout is because I don't trust
> userspace not to get it wrong and I'd rather not lose my box because of
> it.
IMO this has nothing to do with userspace(*) and everything to do with
complexity. Virus scanning is complex and any such code, whether
running in userspace or not, can easily screw up and freeze the system.
The way to solve that is not to implement hacks on the kernel
interface, but rather by separating the complex parts and implementing
a simple watchdog layer on top of that, that makes sure things don't
go wrong.
(*) We _must_ trust privileged userspace not to screw up. System
scripts can easily do much worse damage than freezing filesystem
operations. Just think "rm -rf /".
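One way to picture the watchdog layer suggested above is a thin wrapper that hands the complex scan to a worker and enforces a deadline itself. The Python sketch below is an assumption-laden illustration: `guarded_verdict` and `slow_scan` are hypothetical names, the wrapper fails open on timeout or crash (the message above does not say which default to pick), and it uses a thread for brevity where a real watchdog would more likely supervise a separate process that it can kill.

```python
import concurrent.futures
import time


def guarded_verdict(scan, path, deadline_s, default=True):
    """Run scan(path) in a worker and return its verdict, or `default`
    if the scan raises or does not finish within deadline_s seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(scan, path)
    try:
        return bool(future.result(timeout=deadline_s))
    except concurrent.futures.TimeoutError:
        # The complex part is stuck; the simple layer answers for it.
        return default
    except Exception:
        # The complex part crashed; same fallback.
        return default
    finally:
        # Do not wait for a stuck worker before answering.
        pool.shutdown(wait=False)


def slow_scan(path):
    """Stand-in for a scanner that takes too long (hypothetical)."""
    time.sleep(0.3)
    return False
```

Calling `guarded_verdict(slow_scan, "/some/file", 0.05)` returns the default verdict instead of waiting for the scanner, while a scan that answers promptly has its own verdict passed through unchanged.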
Thanks,
Miklos
* Re: fanotify - overall design before I start sending patches
[not found] ` <E1MZSQZ-0002as-FA-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
@ 2009-08-07 17:43 ` Eric Paris
2009-08-08 10:36 ` Pavel Machek
2009-08-10 10:03 ` Miklos Szeredi
0 siblings, 2 replies; 63+ messages in thread
From: Eric Paris @ 2009-08-07 17:43 UTC (permalink / raw)
To: Miklos Szeredi
Cc: david-gFPdbfVZQbY, Valdis.Kletnieks-PjAqaU27lzQ,
a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
douglas.leeder-j34lQMj1tz/QT0dZR+AlfA,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w, aviro-H+wXaHxf7aLQT0dZR+AlfA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
malware-list-h+Im9A44IAFcMpApZELgcQ, hch-wEGCiKHe2LqWVfeAwA7xHQ,
alexl-H+wXaHxf7aLQT0dZR+AlfA, pavel-+ZI9xUNit7I,
jcm-H+wXaHxf7aLQT0dZR+AlfA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
jengelh-nopoi9nDyk+ELgA04lAiVw,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io,
arjan-wEGCiKHe2LqWVfeAwA7xHQ
On Fri, 2009-08-07 at 18:36 +0200, Miklos Szeredi wrote:
> On Thu, 06 Aug 2009, Eric Paris wrote:
> > just work. The whole reason for the timeout is because I don't trust
> > userspace not to get it wrong and I'd rather not lose my box because of
> > it.
>
> IMO this has nothing to do with userspace(*) and everything to do with
> complexity. Virus scanning is complex and any such code, whether
> running in userspace or not, can easily screw up and freeze the system.
I agree, 'userspace' was not the best term. Let me rephrase:
"The whole reason for the timeout is because I don't trust anything not
to get it wrong and I'd rather not lose my box because of it."
> The way to solve that is not to implement hacks on the kernel
> interface, but rather by separating the complex parts and implementing
> a simple watchdog layer on top of that, that makes sure things don't
> go wrong.
So you would argue that every fanotify listener implement their own
watchdog layer that may or may not be correct rather than do a single
watchdog layer for everyone? And that's better?
-Eric
* Re: fanotify - overall design before I start sending patches
[not found] ` <1249582695.20644.35.camel-8EcGF3LoIElviLIMxPk1+R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
@ 2009-08-08 10:34 ` Pavel Machek
0 siblings, 0 replies; 63+ messages in thread
From: Pavel Machek @ 2009-08-08 10:34 UTC (permalink / raw)
To: Eric Paris
Cc: david-gFPdbfVZQbY@public.gmane.org,
Valdis.Kletnieks-PjAqaU27lzQ@public.gmane.org, Peter Zijlstra,
malware-list-h+Im9A44IAFcMpApZELgcQ@public.gmane.org,
mrkafk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
aviro-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jengelh-nopoi9nDyk+ELgA04lAiVw@public.gmane.org,
hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
alexl-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
jcm-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Douglas Leeder,
alan-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org,
arjan-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org
Hi!
> > By the above you're hosed anyway since starting a new one will fail due
> > to there being no daemon, right? Might as well forfeit all security
> > measures once the daemon dies. That is let security depend on there
> > being a daemon connected.
>
> Nope. You should stop calling and thinking of it as a security system.
> As I've said multiple times it is at best an indexing and integrity
> checking system. We fail open. We don't prevent or care about
> malicious local attacks. When a group is evicted everything is going to
> just work. The whole reason for the timeout is because I don't trust
> userspace not to get it wrong and I'd rather not lose my box because of
> it. Yes, the reset I'm proposing allows userspace to screw the
Well, if you are using this for hierarchical storage, then this daemon
will bring the system down.
Face it, you _are_ developing a security system; otherwise the features of
fanotify do not make sense. (And you are developing a _bad_ security
system.)
So... what about just scrapping the open vetoing -- at least from
initial version?
> > Seems like a weird thing to me, suppose you DoS the system on purpose
> > and all clients start getting wonky, you kill them all, and are left
> > with none, then you cannot access any of your files anymore and
> > everything grinds to a halt?
>
> Nope, you DoS the system on purpose all the listeners get evicted, now
> everything else will be able to open/read data without those listeners
> paying attention. When a group is evicted it's evicted. It no longer
> needs to say yes or no.
Yes, so I hammer your web server and you lose your antivirus
protection.
> > Like said, having the filesystem block actions based on external
> > processes seems just asking for trouble.
>
> Or it seems like exactly what hierarchical storage management systems
> want....
Timeout does not make sense for hierarchical storage.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: fanotify - overall design before I start sending patches
2009-08-07 17:43 ` Eric Paris
@ 2009-08-08 10:36 ` Pavel Machek
2009-08-10 10:03 ` Miklos Szeredi
1 sibling, 0 replies; 63+ messages in thread
From: Pavel Machek @ 2009-08-08 10:36 UTC (permalink / raw)
To: Eric Paris
Cc: Miklos Szeredi, a.p.zijlstra, tvrtko.ursulin, douglas.leeder,
linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, tytso, arjan, david, jengelh, aviro, mrkafk, alexl, hch,
alan, mmorley
On Fri 2009-08-07 13:43:10, Eric Paris wrote:
> On Fri, 2009-08-07 at 18:36 +0200, Miklos Szeredi wrote:
> > On Thu, 06 Aug 2009, Eric Paris wrote:
> > > just work. The whole reason for the timeout is because I don't trust
> > > userspace not to get it wrong and I'd rather not lose my box because of
> > > it.
> >
> > IMO this has nothing to do with userspace(*) and everything to do with
> > complexity. Virus scanning is complex and any such code, whether
> > running in userspace or not, can easily screw up and freeze the system.
>
> I agree, 'userspace' was not the best term. Let me rephrase:
>
> "The whole reason for the timeout is because I don't trust anything not
> to get it wrong and I'd rather not lose my box because of it."
>
> > The way to solve that is not to implement hacks on the kernel
> > interface, but rather by separating the complex parts and implementing
> > a simple watchdog layer on top of that, that makes sure things don't
> > go wrong.
>
> So you would argue that every fanotify listener implement their own
> watchdog layer that may or may not be correct rather than do a single
> watchdog layer for everyone? And that's better?
Yes.
(You can do a library, and maybe you can just make the fanotify listener
simple enough. Or you can just scrap the open vetoing [mis]feature.)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: fanotify - overall design before I start sending patches
2009-08-07 17:43 ` Eric Paris
2009-08-08 10:36 ` Pavel Machek
@ 2009-08-10 10:03 ` Miklos Szeredi
1 sibling, 0 replies; 63+ messages in thread
From: Miklos Szeredi @ 2009-08-10 10:03 UTC (permalink / raw)
To: eparis
Cc: miklos, a.p.zijlstra, tvrtko.ursulin, douglas.leeder, pavel,
linux-kernel, linux-fsdevel, malware-list, Valdis.Kletnieks, greg,
jcm, tytso, arjan, david, jengelh, aviro, mrkafk, alexl, hch,
alan, mmorley
On Fri, 07 Aug 2009, Eric Paris wrote:
> On Fri, 2009-08-07 at 18:36 +0200, Miklos Szeredi wrote:
> > On Thu, 06 Aug 2009, Eric Paris wrote:
> > > just work. The whole reason for the timeout is because I don't trust
> > > userspace not to get it wrong and I'd rather not lose my box because of
> > > it.
> >
> > IMO this has nothing to do with userspace(*) and everything to do with
> > complexity. Virus scanning is complex and any such code, whether
> > running in userspace or not, can easily screw up and freeze the system.
>
> I agree, 'userspace' was not the best term. Let me rephrase:
>
> "The whole reason for the timeout is because I don't trust anything not
> to get it wrong and I'd rather not lose my box because of it."
That's clearly not true. We don't have timers watching filesystems or
security modules to make sure they complete an operation within a
given amount of time.
So there's something else why you think the fanotify interface is
special, and the only reason it's special is that it's a userspace
API.
> > The way to solve that is not to implement hacks on the kernel
> > interface, but rather by separating the complex parts and implementing
> > a simple watchdog layer on top of that, that makes sure things don't
> > go wrong.
>
> So you would argue that every fanotify listener implement their own
> watchdog layer that may or may not be correct rather than do a single
> watchdog layer for everyone? And that's better?
As Pavel said, hopefully most fanotify listeners will _not_ need a
watchdog layer. Maybe virus scanners will need one, but that will be
the least of their worries, probably.
Thanks,
Miklos