linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@sun.com>
To: Eric Paris <eparis@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	malware-list@dmesg.printk.net, Valdis.Kletnieks@vt.edu,
	greg@kroah.com, jcm@redhat.com, douglas.leeder@sophos.com,
	tytso@mit.edu, arjan@infradead.org, david@lang.hm,
	jengelh@medozas.de, aviro@redhat.com, mrkafk@gmail.com,
	alexl@redhat.com, jack@suse.cz, tvrtko.ursulin@sophos.com,
	a.p.zijlstra@chello.nl, hch@infradead.org,
	alan@lxorguk.ukuu.org.uk, mmorley@hcl.in, pavel@suse.cz
Subject: Re: fanotify - overall design before I start sending patches
Date: Fri, 24 Jul 2009 16:42:23 -0600	[thread overview]
Message-ID: <20090724224223.GH4231@webber.adilger.int> (raw)
In-Reply-To: <1248470485.3567.106.camel@localhost>

On Jul 24, 2009  17:21 -0400, Eric Paris wrote:
> On Fri, 2009-07-24 at 15:00 -0600, Andreas Dilger wrote:
> > On Jul 24, 2009  16:13 -0400, Eric Paris wrote:
> > It seems like a 32-bit mask might not be enough, it wouldn't be hard
> > at this stage to add a 64-bit mask.  Lustre has a similar mechanism
> > (changelog) that allows tracking all different kinds of filesystem
> > events (create/unlink/symlink/link/rename/mkdir/setxattr/etc), instead
> > of just open/close, also use by HSM, enhanced rsync, etc.
> 
> I had a 64 bit mask, but Al Viro ask me to go back to a 32 bit mask
> because of i386 register pressure.  The bitmask operations are on VERY
> hot paths inside the kernel.

How about adding a spare "__u32 mask_hi" for future use, so that it can
be changed directly into a __u64 on LE machines?  That preserves the
extensibility for the future, without hitting performance on 32-bit
machines before it is needed.

> > > struct fanotify_event_metadata {
> > >         __u32 event_len;
> > >         __s32 fd;
> > >         __u32 mask;
> > >         __u32 f_flags;
> > >         __s32 pid;
> > >         __s32 tgid;
> > >         __u64 cookie;
> > > }  __attribute__((packed));
> > 
> > Getting the attributes that have changed into this message is also
> > useful, as it avoids a continual stream of "stat" calls on the inodes.
> 
> Hmmm, I'll take a look.  Do you have a good example of what you would
> want to see?  I don't think we know in the notification hooks what
> actually is being changed  :(

Well, I'm thinking there will be a lot of events that some applications
will not care about (e.g. PERM checks where the user is only changing
the file mode, vs. PERM checks where the owner of the file is changing).
Even if the old attributes are not available, having a mask of which
fields in the inode changed, and struct stat64 would be very useful.

> > The other thing that is important for HSM is that this log is atomic
> > and persistent, otherwise there may be files that are missed if the
> > node crashes.  This involves creating atomic update records as part
> > of the filesystem operation, and then userspace consumes them and
> > tells the kernel that it is finished with records up to X.  Otherwise
> > you risk inconsistencies between rsync/HSM/updatedb for files that
> > are updated just before a crash.
> 
> Uhhh, persistent across a crash?  Nope, don't have that.  Notification
> is all in memory.  Can't I just put the onus on userspace to recheck
> things maybe?  Sounds like a user for i_version....

Well, if new files are created then userspace won't have any idea which
inodes need to be checked, and it will also need to keep a persistent
database of all file i_version values.  If you are trying to hook a
backup tool onto such an interface and files created persistently on
disk before a crash are not handled, then they may never be backed up.

Tools like inotify are fine for desktop window refresh and similar uses,
but for applications which require robust handling they also need to
work over a crash.

The other issue is that you might get quite a large queue of operations
in memory, and if this can't be saved to the filesystem then it might
result in OOMing itself.

> > > If a FAN_ACCESS_PERM or FAN_OPEN_PERM event is received the listener
> > > must send a response before the 5 second timeout.  If no response is
> > > sent before the 5 second timeout the original operation is allowed.  If
> > > this happens too many times (10 in a row) the fanotify group is evicted
> > > from the kernel and will not get any new events.
> > 
> > This should be a tunable, since if the intent is to monitor PERM checks
> > it would be possible for users to DOS the machine and delay the userspace
> > programs and access files they shouldn't be able to.
> 
> At the moment I cheat and say root only to bind.  I do plan to open it
> up to non-root users after it's in and working, but I'm seriously
> considering leaving _PERM events as root only.  It's hard to map the
> original to listener security implications.  So making sure the listener
> is always root is easy   :)

My comment has nothing to do with non-root access.  It has to do with
how long the userspace watcher has to handle an event.  If a regular user
is running a 50-thread iozone with a 1M file directory you can imagine it
will create a lot of events to watch, along with a lot of seeking to slow
down the processing of events.  If the user then does "open(secretfile)"
(where your _PERM check is doing something useful) it is possible
that the userspace listener will time out and miss some events.

> Userspace would never be able to access a file it shouldn't be allowed
> to (the new fd is created in the context of the listener and EPERM is
> possible.)

Ah, so the _PERM check is only intended to grant extra access, instead
of restricting it?  That should be made clear in the documentation that
doing the opposite is an easily-bypassed security vulnerability.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


  reply	other threads:[~2009-07-24 22:42 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-24 20:13 fanotify - overall design before I start sending patches Eric Paris
2009-07-24 20:48 ` david-gFPdbfVZQbY
     [not found]   ` <alpine.DEB.1.10.0907241340580.28013-Z4YwzcCRHZnr5h6Zg1Auow@public.gmane.org>
2009-07-24 21:01     ` Eric Paris
2009-07-24 21:44       ` Jamie Lokier
2009-07-27 17:52         ` Evgeniy Polyakov
2009-07-29 20:11           ` Eric Paris
2009-07-24 21:00 ` Andreas Dilger
2009-07-24 21:21   ` Eric Paris
2009-07-24 22:42     ` Andreas Dilger [this message]
2009-07-24 23:01       ` Jamie Lokier
2009-07-24 22:48 ` Jamie Lokier
     [not found]   ` <20090724224813.GK27755-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2009-07-24 23:25     ` Eric Paris
2009-07-24 23:46       ` Jamie Lokier
2009-07-24 23:49     ` Eric Paris
2009-07-25  0:29       ` Jamie Lokier
2009-07-27 18:33         ` Andreas Dilger
2009-07-27 19:23           ` Jamie Lokier
2009-07-28 17:59             ` Andreas Dilger
     [not found]             ` <20090727192342.GA27895-yetKDKU6eevNLxjTenLetw@public.gmane.org>
2009-07-29 20:14               ` Eric Paris
     [not found]           ` <20090727183354.GM4231-RIaA196FMs1uuQVovAj/GogTZbYi8/ss@public.gmane.org>
2009-07-29 20:12             ` Eric Paris
2009-07-29 20:07         ` Eric Paris
2009-07-27 16:54     ` Jan Kara
2009-07-25 14:22 ` Niraj kumar
2009-07-29 20:08   ` Eric Paris
2009-07-28 11:48 ` Jon Masters
     [not found]   ` <1248781708.14145.21.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2009-07-29 20:20     ` Eric Paris
2009-08-03 16:23 ` Christoph Hellwig
     [not found]   ` <20090803162303.GA31058-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2009-08-03 16:55     ` Eric Paris
2009-08-03 18:04       ` Christoph Hellwig
     [not found]         ` <20090803180437.GA9798-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2009-08-03 18:13           ` Eric Paris
2009-08-04 16:09 ` Tvrtko Ursulin
     [not found]   ` <200908041709.51659.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2009-08-04 16:27     ` Eric Paris
2009-08-04 16:39       ` Tvrtko Ursulin
     [not found]       ` <1249403268.2361.21.camel-8EcGF3LoIElviLIMxPk1+R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2009-08-04 17:22         ` Valdis.Kletnieks-PjAqaU27lzQ
     [not found]           ` <19585.1249406551-+bZmOdGhbsPr6rcHtW+onFJE71vCis6O@public.gmane.org>
2009-08-04 18:20             ` John Stoffel
     [not found]               ` <19064.31705.491774.122207-HgN6juyGXH5AfugRpC6u6w@public.gmane.org>
2009-08-04 18:50                 ` Eric Paris
2009-08-05  9:32               ` Tvrtko Ursulin
2009-08-04 16:34 ` Tvrtko Ursulin
     [not found]   ` <200908041734.05762.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2009-08-05 10:12     ` Douglas Leeder
2009-08-05 10:35   ` Douglas Leeder
2009-08-05  2:05 ` Pavel Machek
2009-08-05 16:46   ` Tvrtko Ursulin
     [not found]     ` <200908051746.17903.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2009-08-06 10:10       ` Pavel Machek
2009-08-06 10:20         ` Tvrtko Ursulin
2009-08-06 10:24           ` Pavel Machek
     [not found]         ` <20090806101059.GD31370-I/5MKhXcvmPrBKCeMvbIDA@public.gmane.org>
2009-08-06 10:20           ` Douglas Leeder
2009-08-06 10:22             ` Pavel Machek
2009-08-07  8:59               ` Jamie Lokier
2009-08-06 10:29             ` Peter Zijlstra
2009-08-06 10:59               ` Tvrtko Ursulin
2009-08-06 11:23                 ` Peter Zijlstra
2009-08-06 12:48                   ` Tvrtko Ursulin
     [not found]                     ` <200908061348.43625.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2009-08-06 12:58                       ` Alan Cox
     [not found]                         ` <20090806135800.7ccb7787-qBU/x9rampVanCEyBjwyrvXRex20P6io@public.gmane.org>
2009-08-06 18:18                           ` Eric Paris
2009-08-06 13:50                   ` Kernel Event Notification Subsystem (was: fanotify - overall design before I start sending patches) Al Boldi
2009-08-06 13:50                   ` Al Boldi
2009-08-06 18:18                   ` fanotify - overall design before I start sending patches Eric Paris
2009-08-07 16:36                     ` Miklos Szeredi
     [not found]                       ` <E1MZSQZ-0002as-FA-8f8m9JG5TPIdUIPVzhDTVZP2KDSNp7ea@public.gmane.org>
2009-08-07 17:43                         ` Eric Paris
2009-08-08 10:36                           ` Pavel Machek
2009-08-10 10:03                           ` Miklos Szeredi
     [not found]                     ` <1249582695.20644.35.camel-8EcGF3LoIElviLIMxPk1+R/sF2h8X+2i0E9HWUfgJXw@public.gmane.org>
2009-08-08 10:34                       ` Pavel Machek
     [not found]                 ` <200908061159.45550.tvrtko.ursulin-j34lQMj1tz/QT0dZR+AlfA@public.gmane.org>
2009-08-06 11:24                   ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090724224223.GH4231@webber.adilger.int \
    --to=adilger@sun.com \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=a.p.zijlstra@chello.nl \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=alexl@redhat.com \
    --cc=arjan@infradead.org \
    --cc=aviro@redhat.com \
    --cc=david@lang.hm \
    --cc=douglas.leeder@sophos.com \
    --cc=eparis@redhat.com \
    --cc=greg@kroah.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jcm@redhat.com \
    --cc=jengelh@medozas.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=malware-list@dmesg.printk.net \
    --cc=mmorley@hcl.in \
    --cc=mrkafk@gmail.com \
    --cc=pavel@suse.cz \
    --cc=tvrtko.ursulin@sophos.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).