From: Jan Kara <jack@suse.cz>
To: Andreas Dilger <adilger@dilger.ca>
Cc: "Jan Kara" <jack@suse.cz>,
"Beata Michalska" <b.michalska@samsung.com>,
LKML <linux-kernel@vger.kernel.org>,
"Theodore Ts'o" <tytso@mit.edu>,
"Hugh Dickins" <hughd@google.com>,
"Lukáš Czerner" <lczerner@redhat.com>,
"Christoph Hellwig" <hch@infradead.org>,
Ext4 <linux-ext4@vger.kernel.org>, linux-mm <linux-mm@kvack.org>,
kyungmin.park@samsung.com, kmpark@infradead.org
Subject: Re: [RFC 1/4] fs: Add generic file system event notifications
Date: Mon, 20 Apr 2015 12:32:32 +0200
Message-ID: <20150420103232.GE3117@quack.suse.cz>
In-Reply-To: <067F429D-5480-4449-9141-87B5D9BD1309@dilger.ca>
On Fri 17-04-15 16:44:16, Andreas Dilger wrote:
> On Apr 17, 2015, at 5:31 AM, Jan Kara <jack@suse.cz> wrote:
> > On Wed 15-04-15 09:15:44, Beata Michalska wrote:
> >> Introduce configurable generic interface for file
> >> system-wide event notifications to provide file
> >> systems with a common way of reporting any potential
> >> issues as they emerge.
> >>
> >> The notifications are to be issued through the generic
> >> netlink interface, via a multicast group dedicated to
> >> file system events. File systems may also use this
> >> group to send their own custom messages.
> >>
> >> The events have been split into four base categories:
> >> information, warnings, errors and threshold notifications,
> >> with some very basic event types like running out of space
> >> or file system being remounted as read-only.
> >>
> >> Threshold notifications have been included to allow
> >> triggering an event whenever the amount of free space
> >> drops below a certain level - or levels to be more precise
> >> as two of them are being supported: the lower and the upper
> >> range. The notifications work both ways: once the threshold
> >> level has been reached, an event shall be generated whenever
> >> the number of available blocks goes up again re-activating
> >> the threshold.
> >>
> >> The interface has been exposed through a vfs. Once mounted,
> >> it serves as an entry point for the set-up where one can
> >> register for particular file system events.
> >>
> >> Signed-off-by: Beata Michalska <b.michalska@samsung.com>
> > Thanks for the patches! Some comments are below.
> >
> >> ---
> >> Documentation/filesystems/events.txt | 254 +++++++++++
> >> fs/Makefile | 1 +
> >> fs/events/Makefile | 6 +
> >> fs/events/fs_event.c | 775 ++++++++++++++++++++++++++++++++++
> >> fs/events/fs_event.h | 27 ++
> >> fs/events/fs_event_netlink.c | 94 +++++
> >> fs/namespace.c | 1 +
> >> include/linux/fs.h | 6 +-
> >> include/linux/fs_event.h | 69 +++
> >> include/uapi/linux/fs_event.h | 62 +++
> >> include/uapi/linux/genetlink.h | 1 +
> >> net/netlink/genetlink.c | 7 +-
> >> 12 files changed, 1301 insertions(+), 2 deletions(-)
> >> create mode 100644 Documentation/filesystems/events.txt
> >> create mode 100644 fs/events/Makefile
> >> create mode 100644 fs/events/fs_event.c
> >> create mode 100644 fs/events/fs_event.h
> >> create mode 100644 fs/events/fs_event_netlink.c
> >> create mode 100644 include/linux/fs_event.h
> >> create mode 100644 include/uapi/linux/fs_event.h
> >>
> >> diff --git a/Documentation/filesystems/events.txt b/Documentation/filesystems/events.txt
> >> new file mode 100644
> >> index 0000000..c85dd88
> >> --- /dev/null
> >> +++ b/Documentation/filesystems/events.txt
> >> @@ -0,0 +1,254 @@
> >> +
> >> + Generic file system event notification interface
> >> +
> >> +Document created 09 April 2015 by Beata Michalska <b.michalska@samsung.com>
> >> +
> >> +1. The reason behind:
> >> +=====================
> >> +
> >> +There are many corner cases when things might get messy with the filesystems.
> >> +And it is not always obvious what went wrong and when. Sometimes you might
> >> +get some subtle hints that there is something going on - but by the time
> >> +you realise it, it might be too late as you are already out-of-space
> >> +or the filesystem has been remounted as read-only (for example). The generic
> >> +interface for the filesystem events fills the gap by providing a rather
> >> +easy way of real-time notifications triggered whenever something interesting
> >> +happens, allowing filesystems to report events in a common way, as they occur.
> >> +
> >> +2. How does it work:
> >> +====================
> >> +
> >> +The interface itself has been exposed as fstrace-type Virtual File System,
> >> +primarily to ease the process of setting up the configuration for the file
> >> +system notifications. So for starters it needs to get mounted (obviously):
> >> +
> >> + mount -t fstrace none /sys/fs/events
> >> +
> >> +This will unveil the single fstrace filesystem entry - the 'config' file,
> >> +through which the notifications are set up.
> >> +
> >> +Activating notifications for particular filesystem is as straightforward
> >> +as writing into the 'config' file. Note that by default all events, regardless
> >> +of the actual filesystem type, are disregarded.
> > Is there a reason to have a special filesystem for this? Do you expect
> > extending it by (many) more files? Why not just creating a file in sysfs or
> > something like that?
> >
> >> +Synopsis of config:
> >> +------------------
> >> +
> >> + MOUNT EVENT_TYPE [L1] [L2]
> >> +
> >> + MOUNT : the filesystem's mount point
> > I'm not quite decided but is mountpoint really the right thing to pass
> > via the interface? They aren't unique (filesystem can be mounted in
> > multiple places) and more importantly can change over time. So won't it be
> > better to pass major:minor over the interface? These are stable, unique to
> > the filesystem, and userspace can easily get them by calling stat(2) on the
> > desired path (or directly from /proc/self/mountinfo). That could be also
> > used as an fs identifier instead of assigned ID (and thus we won't need
> > those events about creation of new trace which look somewhat strange to
> > me).
> >
> > OTOH using major:minor may have issues in container world where processes
> > could watch events from filesystems inaccessible to the container if they
> > guess the device number. So maybe we could use 'path' when creating new
> > trace but I'd still like to use the device number internally and for all
> > outgoing communication because of above mentioned problems with
> > mountpoints.
>
> Please don't make major:minor part of the interface. That doesn't make
> sense for network filesystems. Using the mountpoint to set this up is
> fine, and really what is expected by userspace tools to monitor a specific
> mountpoint. We could use sb->s_id to identify the events.
So for setup I agree that mountpoint is probably the easiest. For
reporting back from the kernel, sb->s_id isn't enough because, as Beata
noted, it isn't unique. You are right that for network filesystems (or
in-memory filesystems, for that matter) a device number doesn't make any
particular sense, but each fs (even e.g. procfs) is assigned a "virtual"
device number which uniquely identifies that filesystem. You can see that
device number in /proc/self/mountinfo and you will also see it in st_dev
from stat(2). So using that is IMHO better than devising our own unique
number.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR