* Re: [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs
2004-11-03 21:58 [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs Adam J. Richter
@ 2004-11-03 14:45 ` Matthew Wilcox
2004-11-03 16:38 ` Mike Waychison
1 sibling, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2004-11-03 14:45 UTC (permalink / raw)
To: Adam J. Richter; +Cc: linux-fsdevel, Michael.Waychison
On Wed, Nov 03, 2004 at 01:58:05PM -0800, Adam J. Richter wrote:
> This patch eliminates the user level race condition that
> I mentioned in my original trapfs announcement and which Michael
> Waychison also complained about. Now, if an attempt is made to
> open or stat a file name that is already blocking on a user
> level helper, the new attempt will also invoke a user level
> helper and block.
I don't think this is a great idea. What scenario do you envision where
this is more useful than having the second attempt block until the first
one has finished, then just returning the results of the first attempt
(or retrying if the first attempt was unsuccessful)?
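For concreteness, the alternative described above could be sketched in user space roughly as follows. This is only an illustration using pthreads; it is not kernel code and not part of the patch. The names struct pending_lookup, lookup_once() and demo_helper_ok() are hypothetical, and in the kernel the equivalent state would presumably be associated with the dentry.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-name state shared by all callers looking up one name. */
struct pending_lookup {
	pthread_mutex_t lock;
	pthread_cond_t done;
	bool in_progress;	/* a helper is currently running */
	bool have_result;	/* a helper has already succeeded */
	int helper_runs;	/* how many times the helper was started */
};

#define PENDING_LOOKUP_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, false, 0 }

/* A helper returns 0 on success and nonzero on failure. */
typedef int (*helper_fn)(const char *name);

/* Trivial stand-in helper for demonstration: always succeeds. */
static int demo_helper_ok(const char *name)
{
	(void)name;
	return 0;
}

int lookup_once(struct pending_lookup *p, const char *name, helper_fn helper)
{
	int err;

	pthread_mutex_lock(&p->lock);
	/* Sleep while somebody else is running the helper for this name. */
	while (p->in_progress)
		pthread_cond_wait(&p->done, &p->lock);
	if (p->have_result) {
		/* The first attempt succeeded: reuse its result. */
		pthread_mutex_unlock(&p->lock);
		return 0;
	}
	/* No success yet: this caller runs (or retries) the helper. */
	p->in_progress = true;
	p->helper_runs++;
	pthread_mutex_unlock(&p->lock);

	err = helper(name);

	pthread_mutex_lock(&p->lock);
	p->in_progress = false;
	if (err == 0)
		p->have_result = true;
	pthread_cond_broadcast(&p->done);
	pthread_mutex_unlock(&p->lock);
	return err;
}
```

With this shape, a second caller for the same name sleeps until the first helper exits and then reuses its success; if the helper failed, one of the waiters takes over and runs the helper again.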
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
* Re: [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs
2004-11-03 21:58 [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs Adam J. Richter
2004-11-03 14:45 ` Matthew Wilcox
@ 2004-11-03 16:38 ` Mike Waychison
1 sibling, 0 replies; 8+ messages in thread
From: Mike Waychison @ 2004-11-03 16:38 UTC (permalink / raw)
To: Adam J. Richter; +Cc: linux-fsdevel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Adam J. Richter wrote:
> This patch eliminates the user level race condition that
> I mentioned in my original trapfs announcement and which Michael
> Waychison also complained about. Now, if an attempt is made to
> open or stat a file name that is already blocking on a user
> level helper, the new attempt will also invoke a user level
> helper and block.
>
> A side effect of this is that, if a user level program
> wants to create a plain text file, the helper program has to create
> it with some file name that the program knows to ignore and then
> rename it to the correct file name. This is because there is
> nothing special about the user level helper program's attempts to
> open the nonexistent files, so that open() invokes another
> instance of a user level helper program, which needs to know to ignore
> the file name. This should be pretty easy to do, and I've added an
> untested hypothetical example of it to
> Documentation/filesystems/lookup-trap.txt.
So the ignoring is done by the userspace agent?
>
> If you want to have a directory tree that has no such file
> names, then make it a "mount --bind" copy of a subdirectory of a
> hidden tmpfs file system, something like this:
>
> mount -t tmpfs /hidden
> mkdir /hidden/mirror /hidden/tmp-files
> mount --bind /hidden/mirror /public
>
> ...and then have your user level helper program create
> files in /hidden/tmp-files and move them to /hidden/mirror or /public
> (makes no difference, although using /hidden/mirror would make it clearer
> that the move is within the same file system).
Stuff like moving a directory with a mountpoint on it can't be done
(from userspace). This scheme would require doing an explicit mkdir and
a mount --move.
>
> Serialization is up to the user level helper programs to deal
> with if necessary. This approach not only keeps the kernel code
> small, but also allows the user level helper programs to avoid
> unnecessary serialization.
>
> Anyhow, here is the updated patch. Please let me know what
> you think. I will probably post this and the devfs patch to
> lkml soon for comments from a wider audience.
>
- --
Mike Waychison
Sun Microsystems, Inc.
1 (650) 352-5299 voice
1 (416) 202-8336 voice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: The opinions expressed in this email are held by me,
and may not represent the views of Sun Microsystems, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBiQmQdQs4kOxk3/MRAo6rAJ907iugzAz6h/m9zvzO8dpvG+a3UACghljw
aRWbEwiNizzvjat4r+1dYmg=
=lwzL
-----END PGP SIGNATURE-----
* Re: [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs
@ 2004-11-03 18:51 Adam J. Richter
0 siblings, 0 replies; 8+ messages in thread
From: Adam J. Richter @ 2004-11-03 18:51 UTC (permalink / raw)
To: matthew; +Cc: linux-fsdevel
>On Wed, Nov 03, 2004 at 01:58:05PM -0800, Adam J. Richter wrote:
>> This patch eliminates the user level race condition that
>> I mentioned in my original trapfs announcement and which Michael
>> Waychison also complained about. Now, if an attempt is made to
>> open or stat a file name that is already blocking on a user
>> level helper, the new attempt will also invoke a user level
>> helper and block.
>I don't think this is a great idea. What scenario do you envision where
>this is more useful than having the second attempt block until the first
>one has finished, then just returning the results of the first attempt
>(or retrying if the first attempt was unsuccessful)?
Waiting for the first helper to return was my preference too, but I
thought it would be hard to do. However, now I think that can work,
because ->d_revalidate() is allowed to block. I've coded up an attempt
at doing this, but my attempt at it is broken right now, and it will
probably be another ~12 hours before I can look at it again. So, I'm
sending you this email now to let you know that I'm looking into your
suggestion.
__ ______________
Adam J. Richter \ /
adam@yggdrasil.com | g g d r a s i l
* [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs
@ 2004-11-03 21:58 Adam J. Richter
2004-11-03 14:45 ` Matthew Wilcox
2004-11-03 16:38 ` Mike Waychison
0 siblings, 2 replies; 8+ messages in thread
From: Adam J. Richter @ 2004-11-03 21:58 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Michael.Waychison
This patch eliminates the user level race condition that
I mentioned in my original trapfs announcement and which Michael
Waychison also complained about. Now, if an attempt is made to
open or stat a file name that is already blocking on a user
level helper, the new attempt will also invoke a user level
helper and block.
A side effect of this is that, if a user level program
wants to create a plain text file, the helper program has to create
it with some file name that the program knows to ignore and then
rename it to the correct file name. This is because there is
nothing special about the user level helper program's attempts to
open the nonexistent files, so that open() invokes another
instance of a user level helper program, which needs to know to ignore
the file name. This should be pretty easy to do, and I've added an
untested hypothetical example of it to
Documentation/filesystems/lookup-trap.txt.
If you want to have a directory tree that has no such file
names, then make it a "mount --bind" copy of a subdirectory of a
hidden tmpfs file system, something like this:
mount -t tmpfs /hidden
mkdir /hidden/mirror /hidden/tmp-files
mount --bind /hidden/mirror /public
...and then have your user level helper program create
files in /hidden/tmp-files and move them to /hidden/mirror or /public
(makes no difference, although using /hidden/mirror would make it clearer
that the move is within the same file system).
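As a rough illustration of the create-then-rename step, a helper written in C might do something like the following. This is a minimal sketch, not code from the patch: publish_file() and its parameters are hypothetical names, and it assumes both directories are on the same file system so that rename() can move the file in one step. The helper would presumably still have to ignore the lookup that its own open() of the temporary name triggers.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a temporary file for "name" in staging_dir, fill it with
 * contents, then rename it into public_dir under its final name.
 * Returns 0 on success, -1 on failure.
 */
int publish_file(const char *staging_dir, const char *public_dir,
		 const char *name, const char *contents)
{
	char tmp[4096], dst[4096];
	size_t len = strlen(contents);
	size_t off = 0;
	int fd, n;

	/* The temporary name carries the pid so concurrent helpers do not collide. */
	n = snprintf(tmp, sizeof(tmp), "%s/%s.%ld", staging_dir, name,
		     (long)getpid());
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;
	n = snprintf(dst, sizeof(dst), "%s/%s", public_dir, name);
	if (n < 0 || (size_t)n >= sizeof(dst))
		return -1;

	fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
	if (fd < 0)
		return -1;

	/* write() may be partial, so loop until everything is written. */
	while (off < len) {
		ssize_t w = write(fd, contents + off, len - off);
		if (w < 0) {
			close(fd);
			unlink(tmp);
			return -1;
		}
		off += (size_t)w;
	}

	/* The file only becomes visible under its real name here. */
	if (close(fd) < 0 || rename(tmp, dst) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

With the layout above, this would be called along the lines of publish_file("/hidden/tmp-files", "/hidden/mirror", "foo", "some text\n"), so that /public never shows the temporary name.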
Serialization is up to the user level helper programs to deal
with if necessary. This approach not only keeps the kernel code
small, but also allows the user level helper programs to avoid
unnecessary serialization.
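For helpers that do need serialization, one possible approach is an advisory lock taken with flock(2), sketched below. This is an assumption about how a helper might be written, not something the patch provides: helper_lock() and helper_unlock() are hypothetical names, and the lock file should live outside the trapped tmpfs so that opening it does not invoke yet another helper.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Take an exclusive advisory lock on lock_path, creating the file if
 * necessary. Blocks until the lock is available. Returns a file
 * descriptor to pass to helper_unlock(), or -1 on failure.
 */
int helper_lock(const char *lock_path)
{
	int fd = open(lock_path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Release the lock taken by helper_lock() and close the descriptor. */
void helper_unlock(int fd)
{
	flock(fd, LOCK_UN);
	close(fd);
}
```

A helper that only needs to serialize work on one name could use a lock file per name, while helpers that never conflict can skip locking entirely.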
Anyhow, here is the updated patch. Please let me know what
you think. I will probably post this and the devfs patch to
lkml soon for comments from a wider audience.
__ ______________
Adam J. Richter \ /
adam@yggdrasil.com | g g d r a s i l
--- linux-2.6.10-rc1-bk12/include/linux/shmem_fs.h 2004-10-18 14:54:55.000000000 -0700
+++ linux/include/linux/shmem_fs.h 2004-11-03 11:42:11.000000000 -0800
@@ -3,6 +3,8 @@
#include <linux/swap.h>
#include <linux/mempolicy.h>
+#include <linux/fshelper.h>
+#include <linux/mount.h>
/* inode in-kernel data */
@@ -22,11 +24,13 @@
};
struct shmem_sb_info {
+ int limited; /* 0 = ignore max_blocks and max_inodes */
unsigned long max_blocks; /* How many blocks are allowed */
unsigned long free_blocks; /* How many are left for allocation */
unsigned long max_inodes; /* How many inodes are allowed */
unsigned long free_inodes; /* How many are left for allocation */
spinlock_t stat_lock;
+ struct fs_helper helper;
};
static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
@@ -34,4 +38,6 @@
return container_of(inode, struct shmem_inode_info, vfs_inode);
}
+extern int __init init_tmpfs(void); /* early initialization for devfs */
+
#endif
--- linux-2.6.10-rc1-bk12/include/linux/fshelper.h 1969-12-31 16:00:00.000000000 -0800
+++ linux/include/linux/fshelper.h 2004-11-02 20:21:47.000000000 -0800
@@ -0,0 +1,33 @@
+#ifndef _LINUX_FS_HELPER
+#define _LINUX_FS_HELPER
+
+#include <linux/dcache.h>
+#include <linux/rwsem.h>
+#include <linux/parser.h>
+
+/* The {init,set,call,clear}_fs_helper interface is for a file system
+ to invoke a shell command on an event related to a dentry. */
+
+struct fs_helper {
+ char *shell_command;
+ struct rw_semaphore rw_sem;
+};
+
+/* so we can change this in one place */
+extern void init_fs_helper(struct fs_helper *helper);
+
+extern int set_fs_helper(struct fs_helper *helper, substring_t *cmd);
+
+static inline void free_fs_helper(struct fs_helper *helper)
+{
+ kfree(helper->shell_command);
+}
+
+/*
+ Note: call_fs_helper releases and retakes dentry->d_parent->d_inode->i_sem.
+*/
+extern void call_fs_helper(struct fs_helper *helper,
+ const char *event,
+ struct dentry *dentry);
+
+#endif /* _LINUX_FS_HELPER */
--- linux-2.6.10-rc1-bk12/mm/shmem.c 2004-11-02 22:48:08.000000000 -0800
+++ linux/mm/shmem.c 2004-11-03 13:32:22.000000000 -0800
@@ -14,6 +14,9 @@
* Copyright (c) 2004, Luke Kenneth Casson Leighton <lkcl@lkcl.net>
* Copyright (c) 2004 Red Hat, Inc., James Morris <jmorris@redhat.com>
*
+ * User level helper for directory lookups:
+ * Copyright (C) 2004 Adam J. Richter, Yggdrasil Computing, Inc.
+ *
* This file is released under the GPL.
*/
@@ -46,6 +49,7 @@
#include <linux/mempolicy.h>
#include <linux/namei.h>
#include <linux/xattr.h>
+#include <linux/fshelper.h>
#include <asm/uaccess.h>
#include <asm/div64.h>
#include <asm/pgtable.h>
@@ -135,7 +139,20 @@
static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
{
+#ifdef CONFIG_TMPFS
return sb->s_fs_info;
+#else
+ return NULL; /* compiler optimization */
+#endif
+}
+
+static inline int shmem_have_quotas(struct shmem_sb_info *sb_info)
+{
+#ifdef CONFIG_TMPFS
+ return sb_info->limited;
+#else
+ return 0; /* sb_info will be NULL. */
+#endif
}
/*
@@ -194,7 +211,8 @@
static void shmem_free_blocks(struct inode *inode, long pages)
{
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
sbinfo->free_blocks += pages;
inode->i_blocks -= pages*BLOCKS_PER_PAGE;
@@ -357,7 +375,7 @@
* page (and perhaps indirect index pages) yet to allocate:
* a waste to allocate index if we cannot allocate data.
*/
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (sbinfo->free_blocks <= 1) {
spin_unlock(&sbinfo->stat_lock);
@@ -678,7 +696,7 @@
spin_unlock(&shmem_swaplist_lock);
}
}
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
BUG_ON(inode->i_blocks);
spin_lock(&sbinfo->stat_lock);
sbinfo->free_inodes++;
@@ -1081,7 +1099,7 @@
} else {
shmem_swp_unmap(entry);
sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (sbinfo->free_blocks == 0 ||
shmem_acct_block(info->flags)) {
@@ -1269,7 +1287,7 @@
struct shmem_inode_info *info;
struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (!sbinfo->free_inodes) {
spin_unlock(&sbinfo->stat_lock);
@@ -1598,7 +1616,7 @@
buf->f_type = TMPFS_MAGIC;
buf->f_bsize = PAGE_CACHE_SIZE;
buf->f_namelen = NAME_MAX;
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
buf->f_blocks = sbinfo->max_blocks;
buf->f_bavail = buf->f_bfree = sbinfo->free_blocks;
@@ -1651,6 +1669,102 @@
}
/*
+ * Retaining negative dentries for an in-memory filesystem just wastes
+ * memory and lookup time: arrange for them to be deleted immediately.
+ */
+static int shmem_delete_dentry(struct dentry *dentry)
+{
+ return 1;
+}
+
+/* Force revalidation of all negative dentries. Note that this routine
+ only gets called when some other routine has a reference to the dentry
+ or the dentry has an inode. Otherwise, unused negative dentries
+ are immediately dropped, because of shmem_delete_dentry. Apparently,
+ the only time shmem_dentry_valid ends up being called with an empty
+ inode is when a trapped reference is blocking on the user level helper,
+ which is exactly when we do want to force another lookup so that this
+ new reference will block too. I had previously used dentry->d_fsdata
+ to flag whether a user level helper was blocking on it,
+ but now that seems unnecessary, so we just check dentry->d_inode here.
+*/
+static int shmem_dentry_valid(struct dentry *dentry,
+ struct nameidata *nd)
+{
+ return (dentry->d_inode != NULL);
+}
+
+
+/* shmem_dentry_ops gives the following behavior:
+ - Delete any negative dentry that is not in use.
+ - For empty ("negative") dentries that are in use, force a new lookup.
+*/
+static struct dentry_operations shmem_dentry_ops = {
+ .d_delete = shmem_delete_dentry,
+ .d_revalidate = shmem_dentry_valid,
+};
+
+static struct dentry * shmem_lookup(struct inode *dir,
+ struct dentry *dentry,
+ struct nameidata *nd)
+{
+ /*
+ We must do simple_lookup before trapfs_event to prevent
+ a duplicate dentry from being created if the trapfs_helper
+ program attempts to access the same file name in /dev.
+ If simple_lookup returns non-NULL, then that is due to an error
+ like a malformed file name, so we do not invoke trapfs_event.
+ If the file is not found but there was no other error,
+ simple_lookup returns NULL, and that is the only case
+ in which we want to generate a notification.
+
+ We also filter out the final path element of mknod, mkdir
+ and symlink, because invoking the helper for mknod and mkdir
+ could lead to deadlock when trapfs loads a device driver
+ kernel module. One would think that the way to filter
+ would be to look at nd->flags to check that LOOKUP_CREATE
+ is set and LOOKUP_OPEN is clear, but instead, the vfs
+ layers passed nd==NULL in these cases via a routine
+ called lookup_create (without any leading underscores),
+ so we filter out the case where nd == NULL.
+
+ Filtering out nd==NULL has the unintended side-effect of
+ filtering out the final path component of arguments to
+ rmdir, unlink and rename (both source and destination).
+ For rmdir, unlink, and the source argument to rename,
+ that's fine, since nobody cares about attempts to remove
+ nonexistent files. We're probably also OK skipping the
+ notifications with regard to the destination argument to
+ rename, although that is less clear.
+ */
+
+ struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+
+ struct dentry *result = NULL;
+
+ if (dentry->d_name.len > NAME_MAX)
+ return ERR_PTR(-ENAMETOOLONG);
+
+ dentry->d_op = &shmem_dentry_ops;
+ d_add(dentry, NULL);
+
+ if (nd != NULL) {
+ struct dentry *new;
+
+ call_fs_helper(&sbinfo->helper, "LOOKUP", dentry);
+
+ new = d_lookup(dentry->d_parent, &dentry->d_name);
+
+ if (new != dentry) /* also handles new==NULL */
+ result = new;
+ else
+ dput(new);
+ }
+
+ return result;
+}
+
+/*
* Link a file..
*/
static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
@@ -1663,7 +1777,7 @@
* but each new link needs a new dentry, pinning lowmem, and
* tmpfs dentries cannot be pruned until they are unlinked.
*/
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (!sbinfo->free_inodes) {
spin_unlock(&sbinfo->stat_lock);
@@ -1688,7 +1802,7 @@
if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)) {
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
sbinfo->free_inodes++;
spin_unlock(&sbinfo->stat_lock);
@@ -1840,7 +1954,7 @@
#endif
};
-static int shmem_parse_options(char *options, int *mode, uid_t *uid, gid_t *gid, unsigned long *blocks, unsigned long *inodes)
+static int shmem_parse_options(char *options, int *mode, uid_t *uid, gid_t *gid, unsigned long *blocks, unsigned long *inodes, substring_t *helper)
{
char *this_char, *value, *rest;
@@ -1894,6 +2008,9 @@
*gid = simple_strtoul(value,&rest,0);
if (*rest)
goto bad_val;
+ } else if (!strcmp(this_char,"helper")) {
+ helper->from = value;
+ helper->to = value + strlen(value);
} else {
printk(KERN_ERR "tmpfs: Bad mount option %s\n",
this_char);
@@ -1914,27 +2031,45 @@
struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
unsigned long max_blocks = 0;
unsigned long max_inodes = 0;
+ substring_t helper_str;
+ int err;
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
max_blocks = sbinfo->max_blocks;
max_inodes = sbinfo->max_inodes;
}
- if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes))
+ helper_str.from = NULL;
+ if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes, &helper_str))
return -EINVAL;
+
+ if (helper_str.from) {
+ err = set_fs_helper(&sbinfo->helper, &helper_str);
+ if (err)
+ return err;
+ }
+
/* Keep it simple: disallow limited <-> unlimited remount */
- if ((max_blocks || max_inodes) == !sbinfo)
+ if ((max_blocks || max_inodes) != shmem_have_quotas(sbinfo))
return -EINVAL;
+
/* But allow the pointless unlimited -> unlimited remount */
- if (!sbinfo)
+ if (!max_blocks && !max_inodes)
return 0;
+
return shmem_set_size(sbinfo, max_blocks, max_inodes);
}
#endif
static void shmem_put_super(struct super_block *sb)
{
- kfree(sb->s_fs_info);
- sb->s_fs_info = NULL;
+#ifdef CONFIG_TMPFS
+ struct shmem_sb_info *sb_info = SHMEM_SB(sb);
+
+ free_fs_helper(&sb_info->helper);
+ kfree(sb_info);
+#endif
+
+ sb->s_fs_info = NULL; /* FIXME. Is this line necessary? */
}
#ifdef CONFIG_TMPFS_XATTR
@@ -1952,16 +2087,19 @@
uid_t uid = current->fsuid;
gid_t gid = current->fsgid;
int err = -ENOMEM;
+ substring_t helper_str;
#ifdef CONFIG_TMPFS
unsigned long blocks = 0;
unsigned long inodes = 0;
+ struct shmem_sb_info *sbinfo;
/*
* Per default we only allow half of the physical ram per
* tmpfs instance, limiting inodes to one per page of lowmem;
* but the internal instance is left unlimited.
*/
+ helper_str.from = NULL;
if (!(sb->s_flags & MS_NOUSER)) {
blocks = totalram_pages / 2;
inodes = totalram_pages - totalhigh_pages;
@@ -1969,21 +2107,32 @@
inodes = blocks;
if (shmem_parse_options(data, &mode,
- &uid, &gid, &blocks, &inodes))
+ &uid, &gid, &blocks, &inodes,
+ &helper_str))
return -EINVAL;
}
- if (blocks || inodes) {
- struct shmem_sb_info *sbinfo;
- sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
- if (!sbinfo)
- return -ENOMEM;
- sb->s_fs_info = sbinfo;
- spin_lock_init(&sbinfo->stat_lock);
- sbinfo->max_blocks = blocks;
- sbinfo->free_blocks = blocks;
- sbinfo->max_inodes = inodes;
- sbinfo->free_inodes = inodes;
+ sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
+ if (!sbinfo)
+ return -ENOMEM;
+
+ sbinfo->limited = (blocks || inodes);
+ sb->s_fs_info = sbinfo;
+ spin_lock_init(&sbinfo->stat_lock);
+ sbinfo->max_blocks = blocks;
+ sbinfo->free_blocks = blocks;
+ sbinfo->max_inodes = inodes;
+ sbinfo->free_inodes = inodes;
+ init_fs_helper(&sbinfo->helper);
+
+ if (helper_str.from) {
+ err = set_fs_helper(&sbinfo->helper, &helper_str);
+ if (err) {
+ free_fs_helper(&sbinfo->helper);
+ kfree(sbinfo);
+ return err;
+ }
+ err = -ENOMEM;
}
sb->s_xattr = shmem_xattr_handlers;
#endif
@@ -2088,7 +2237,7 @@
static struct inode_operations shmem_dir_inode_operations = {
#ifdef CONFIG_TMPFS
.create = shmem_create,
- .lookup = simple_lookup,
+ .lookup = shmem_lookup,
.link = shmem_link,
.unlink = shmem_unlink,
.symlink = shmem_symlink,
@@ -2192,9 +2341,15 @@
};
static struct vfsmount *shm_mnt;
-static int __init init_tmpfs(void)
+/* init_tmpfs is exported so that devfs can get an earlier initialization
+ if necessary. */
+int __init init_tmpfs(void)
{
int error;
+ static int initialized; /* = 0 */
+
+ if (initialized)
+ return 0;
error = init_inodecache();
if (error)
@@ -2215,6 +2370,7 @@
printk(KERN_ERR "Could not kern_mount tmpfs\n");
goto out1;
}
+ initialized = 1;
return 0;
out1:
@@ -2310,3 +2466,5 @@
vma->vm_ops = &shmem_vm_ops;
return 0;
}
+EXPORT_SYMBOL(shmem_lock);
+EXPORT_SYMBOL(shmem_nopage);
--- linux-2.6.10-rc1-bk12/fs/Makefile 2004-11-02 22:48:05.000000000 -0800
+++ linux/fs/Makefile 2004-11-02 20:21:47.000000000 -0800
@@ -44,6 +44,7 @@
obj-y += devpts/
obj-$(CONFIG_PROFILING) += dcookies.o
+obj-$(CONFIG_TMPFS) += helper.o
# Do not add any filesystems before this line
obj-$(CONFIG_REISERFS_FS) += reiserfs/
--- linux-2.6.10-rc1-bk12/fs/helper.c 1969-12-31 16:00:00.000000000 -0800
+++ linux/fs/helper.c 2004-11-02 20:21:47.000000000 -0800
@@ -0,0 +1,214 @@
+/*
+ helper.c -- Invoke user level helper command for a struct dentry.
+
+ Written by Adam J. Richter
+ Copyright (C) 2004 Yggdrasil Computing, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/fs.h>
+#include <linux/fshelper.h>
+
+void init_fs_helper(struct fs_helper *helper)
+{
+ helper->shell_command = NULL;
+ init_rwsem(&helper->rw_sem);
+}
+EXPORT_SYMBOL_GPL(init_fs_helper);
+
+int set_fs_helper(struct fs_helper *helper, substring_t *arg)
+{
+ char *dup;
+
+ if (arg->from == arg->to)
+ dup = NULL;
+ else {
+ dup = match_strdup(arg);
+ if (!dup)
+ return -ENOMEM;
+ }
+
+ down_write(&helper->rw_sem);
+ kfree(helper->shell_command);
+ helper->shell_command = dup;
+ up_write(&helper->rw_sem);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(set_fs_helper);
+
+static int path_len (struct dentry *de, struct dentry *root)
+{
+ int len = 0;
+ while (de != root) {
+ len += de->d_name.len + 1; /* count the '/' */
+ de = de->d_parent;
+ }
+ return len; /* -1 because we omit the leading '/',
+ +1 because we include trailing '\0' */
+}
+
+static int write_path_from_mnt (struct dentry *de, char *path, int buflen)
+{
+ struct dentry *mnt_root = de->d_parent->d_inode->i_sb->s_root;
+ int len;
+ char *path_orig = path;
+
+ if (de == NULL || de == mnt_root)
+ return -EINVAL;
+
+ spin_lock(&dcache_lock);
+ len = path_len(de, mnt_root);
+ if (len > buflen) {
+ spin_unlock(&dcache_lock);
+ return -ENAMETOOLONG;
+ }
+
+ path += len - 1;
+ *path = '\0';
+
+ for (;;) {
+ path -= de->d_name.len;
+ memcpy(path, de->d_name.name, de->d_name.len);
+ de = de->d_parent;
+ if (de == mnt_root)
+ break;
+ *(--path) = '/';
+ }
+
+ spin_unlock(&dcache_lock);
+
+ BUG_ON(path != path_orig);
+
+ return 0;
+}
+
+static inline int
+calc_argc(const char *str_in, int *str_len)
+{
+ const char *str = str_in;
+ int argc = 0;
+ while (*str) {
+ while (*str == ' ' || *str == '\t')
+ str++;
+ argc++;
+ while (*str != ' ' && *str != '\t' && *str)
+ str++;
+ }
+ *str_len = str - str_in;
+ return argc;
+}
+
+static char **
+gen_argv(char *str_in, int argc_extra, int *argc_out)
+{
+ int argc;
+ char **argv;
+ char *str_out;
+ int str_len;
+
+ if (!str_in)
+ return NULL;
+
+ while (*str_in == ' ' || *str_in == '\t')
+ str_in++;
+
+ if (*str_in == '\0')
+ return NULL;
+
+ argc = calc_argc(str_in, &str_len);
+
+ argv = kmalloc(((argc + argc_extra) * sizeof(char*)) + str_len + 1,
+ GFP_KERNEL);
+ if (!argv)
+ return NULL;
+
+ str_out = (char*) (argv + argc + argc_extra);
+
+ argc = 0;
+ while (*str_in) {
+ argv[argc++] = str_out;
+
+ while (*str_in != ' ' && *str_in != '\t' && *str_in)
+ *(str_out++) = *(str_in++);
+
+ *(str_out++) = '\0';
+
+ while (*str_in == ' ' || *str_in == '\t')
+ str_in++;
+ }
+ *argc_out = argc;
+ return argv;
+}
+
+
+/*
+ Warning: call_fs_helper releases and retakes
+ dentry->d_parent->d_inode->i_sem. It must be called with this
+ semaphore already held.
+*/
+extern void call_fs_helper(struct fs_helper *helper,
+ const char *event,
+ struct dentry *dentry)
+{
+ char path[64];
+ char **argv;
+ int argc;
+ struct inode *parent_inode = dentry->d_parent->d_inode;
+
+ if (write_path_from_mnt(dentry, path, sizeof(path)) == 0) {
+
+ up(&parent_inode->i_sem);
+
+ /*
+ FIXME. We would not need the extra memory allocation,
+ string copying, error branch and lines of source code
+ due to err_strdup(), and we could put gen_argv
+ into the set_fs_helper, if call_usermodehelper
+ and execve had a callback to inform us when
+ execve was done copying argv and envp. With
+ such a facility, we could just hold helper->rw_sem
+ up to that point, without having to make a copy of the
+ argument (which we currently do) or hold the semaphore
+ until the helper process exits (which would cause a
+ deadlock if a helper process ever tried to change
+ the helper string of a file system, especially since
+ there is not such a thing as rw_down_read_interruptible
+ that would make the deadlock breakable).
+ */
+
+ down_read(&helper->rw_sem);
+ argv = gen_argv(helper->shell_command, 3, &argc);
+ up_read(&helper->rw_sem);
+
+ if (argv != NULL) {
+ static char *envp[] =
+ {"PATH=/bin:/sbin:/usr/bin:/usr/sbin",
+ "HOME=/", NULL };
+
+ argv[argc++] = (char*) event;
+ argv[argc++] = path;
+ argv[argc] = NULL;
+
+ call_usermodehelper(argv[0], argv, envp, 1);
+ kfree(argv);
+ }
+
+ down(&parent_inode->i_sem);
+ }
+}
+EXPORT_SYMBOL_GPL(call_fs_helper);
--- linux-2.6.10-rc1-bk12/Documentation/filesystems/lookup-trap.txt 1969-12-31 16:00:00.000000000 -0800
+++ linux/Documentation/filesystems/lookup-trap.txt 2004-11-03 13:28:31.000000000 -0800
@@ -0,0 +1,375 @@
+User's Guide To Trapping Directory Lookup Operations in Tmpfs
+Version 0.2
+
+
+1. INSTRUCTIONS FOR THE IMPATIENT
+
+ % modprobe tmpfs
+ % mount -t tmpfs -o helper=/path/to/helper/program whatever /mnt
+ % ls /mnt/foo
+ # Notice that "/path/to/helper/program LOOKUP foo" was executed.
+
+
+2. OVERVIEW (from the Kconfig help)
+
+ Tmpfs now allows user level programs to implement file systems
+that are filled in on demand. This feature works by invoking a
+configurable helper program on attempts to open() or stat() a
+nonexistent file. The access waits until the helper finishes, so the
+helper can install the missing file if desired.
+
+ Using this facility, a shell script or small C program can
+implement a file system that automatically mounts remote file systems
+or creates device files on demand, similar to autofs or devfs,
+respectively. Tmpfs is, however, daemonless and, perhaps
+consequently, smaller than either of these, and may avoid some
+recursion problems.
+
+ Tmpfs might also be useful for debugging programs where you
+want to trap the first access to a particular file or perhaps in
+automatic installation of missing commands or libraries by specifying a
+tmpfs file system in certain search paths.
+
+ This access trapping facility is designed to be easily ported
+to other file systems, see include/linux/fshelper.h and fs/helper.c
+for the implementation.
+
+
+
+3. GETTING STARTED
+
+3.1 MOUNTING THE FILE SYSTEM
+
+ First, build and boot a kernel with tmpfs either compiled in
+or built as a module. If you compile tmpfs as a module, you may have
+to load it, although, since the module name ("tmpfs.ko") and the file
+system name ("tmpfs") match, that may be unnecessary if you've
+configured modprobe to load modules automatically.
+
+ % modprobe tmpfs
+
+ Now let's mount a tmpfs file system on /mnt.
+
+ % mount -t tmpfs blah /mnt
+
+ The file system will behave exactly like a ramfs file system.
+In fact, tmpfs is derived from ramfs. You can create files,
+directories, symbolic links and device nodes in it, and they will
+exist only in the computer's main memory. The contents of the file
+system will disappear as soon as you unmount it.
+
+ If you mount multiple instances of tmpfs, you will get
+separate file systems.
+
+
+3.2. THE HELPER PROGRAM
+
+ What distinguishes tmpfs from ramfs is that it can invoke a
+user level helper program when an attempt is made to open or stat a
+nonexistent file for the first time (if the name is not already in the
+dcache). The user level program is set with the "helper" mount
+option. It is possible to set, clear or change the helper command at
+any time, so let's go back to our example and put a helper command on
+the tmpfs file system that we mounted on /mnt:
+
+ % mount -o remount,helper=/tmp/helper /mnt
+
+ If you use the file system in /mnt now, nothing appears to have
+changed. Now let's put a simple shell script in /tmp/helper:
+
+ % cat > /tmp/helper
+ #!/bin/sh
+ echo "$*" > /dev/console
+ ^D
+ % chmod a+x /tmp/helper
+
+ Now you should see console messages like "LOOKUP foo" when you
+try to access the file /mnt/foo for the first time.
+
+ You can also pass arguments to the helper program by using
+spaces in the helper mount option, like so:
+
+ % mount -o remount,helper='/tmp/helper my_argument' /mnt
+
+ If you do this, your console messages will start to look
+something like "my_argument LOOKUP foo". The arguments that
+you specify come before "LOOKUP foo" to facilitate the use of
+command interpreters, like, say, helper='/usr/bin/perl handler.pl'.
+Arguments also make it easy to pass things like the mount point or
+configuration files, which should make it easier to write facilities
+that work on multiple mount points.
+
+ You can also deactivate the helper at any time, like so:
+
+ mount -o remount,helper='' /mnt
+
+4. PRACTICAL EXAMPLES
+
+4.1 AN NFS AUTOMOUNTER
+
+ % cat > /usr/sbin/tmpfs-automount
+ #!/bin/sh
+ topdir=$1
+ host=${3%/*}
+ dir=$topdir/$host
+ mkdir $dir
+ mount -t nfs $host:/ $dir
+ ^D
+ % chmod a+x /usr/sbin/tmpfs-automount
+ % mkdir /auto
+ % mount -t tmpfs -o helper="/usr/sbin/tmpfs-automount /auto" x /auto
+
+ Notice how we pass the additional argument "/auto" to the
+tmpfs-automount command.
+
+ If you want automatic unmount after a timeout, you'll probably
+want to do something a little more elaborate, perhaps with a script that
+runs from cron.
+
+
+4.2 DEMAND LOADING OF DEVICE DRIVERS
+
+ A version of devfs that uses tmpfs is under development and
+running on the system I am using to write this document, but I am
+still cleaning it up. Here is how it should work, although I have
+not yet actually tried devfs_helper on it.
+
+ The devfs_helper program was originally written for a stripped down
+rewrite of devfs, from which tmpfs is derived. It can read your
+/etc/devfs.conf file (the file previously used to configure
+devfsd) and load modules specified by "LOOKUP" commands. Other
+devfs.conf commands are ignored.
+
+ % ftp ftp.yggdrasil.com
+ login: anonymous
+ password: guest
+ ftp> cd /pub/dist/device_control/devfs
+ ftp> get devfs_helper-0.2.tar.gz
+ .....
+ ftp> quit
+ % tar xfpvz devfs_helper-0.2.tar.gz
+ % cd devfs_helper-0.2
+ % make
+ % make install
+ % mkdir /tmp/tmpdev
+ % mount -t devfs /tmp/tmpdev
+ % cp -apRx /dev/* /tmp/tmpdev/
+ % mount -t devfs -o helper=/sbin/devfs_helper blah /dev
+ % mount -t msdos /dev/floppy/0 /mnt
+
+ The above example should load the floppy.ko kernel module
+if you have a line in your /etc/devfs.conf file like this:
+
+ LOOKUP floppy EXECUTE modprobe floppy
+
+
+ You should also be able to use execfs in this fashion to get
+automatic loading of kernel modules on non-devfs systems, although
+you'll need something like udev to create the
+device files once the device drivers are registered.
+
+
+4.3 DEBUGGING A PROGRAM TRYING TO ACCESS A FILE
+
+ % cat > /tmp/call-sleep
+ #!/bin/sh
+ sleep 30
+ ^D
+ % mount -t tmpfs -o helper=/tmp/call-sleep foo /mnt
+ % mv .bashrc .bashrc-
+ % ln -s /mnt/whatever .bashrc
+ % gdb /bin/sh
+ GNU gdb 5.2
+ [blah blah blah]
+ (gdb) run
+ [program eventually hangs. Switch to another terminal session. You
+ cannot control-C out of it, a tmpfs bug from call_usermodehelper.]
+ % ps axf
+ [Find the process under gdb. Let's say it's pid 1152.]
+ % kill -SEGV 1152
+ % ps auxww | grep sleep
+ [Find the sleeping tmpfs helper. Let's say it's pid 1120.]
+ % kill -9 1120
+ [Now back at the first session, running gdb on /bin/sh.]
+ Program received signal SIGSEGV, Segmentation fault.
+ 0xb7f303d4 in __libc_open () at __libc_open:-1
+ -1 __libc_open: No such file or directory.
+ in __libc_open
+ (gdb) where
+ #0 0xb7f303d4 in __libc_open () at __libc_open:-1
+ #1 0xb7f8b4c0 in __DTOR_END__ () from /lib/libc.so.6
+ #2 0x080921ef in _evalfile (filename=0x80dc788 "/tmp/junk/.bashrc", flags=9)
+ at evalfile.c:85
+ #3 0x08092635 in maybe_execute_file (
+ fname=0xfffffffe <Address 0xfffffffe out of bounds>,
+ force_noninteractive=1) at evalfile.c:218
+ #4 0x08059fe8 in run_startup_files () at shell.c:1019
+ #5 0x08059849 in main (argc=1, argv=0xbfffebc4, env=0xbfffebcc) at shell.c:581
+ #6 0xb7e88e02 in __libc_start_main (main=0x8059380 <main>, argc=1,
+ ubp_av=0xbfffebc4, init=0x805897c <_init>,
+ fini=0xb80005ac <_dl_debug_mask>, rtld_fini=0x8000, stack_end=0x0)
+ at ../sysdeps/generic/libc-start.c:129
+
+
+4.4 AUTOMATIC LOADING OF MISSING PROGRAMS
+
+ % cat > /usr/sbin/missing-program
+ #!/bin/sh
+ my-automatic-network-downloader $2
+ ^D
+ % chmod a+x /usr/sbin/missing-program
+ % mount -t tmpfs -o helper=/usr/sbin/missing-program glorp /mnt
+ % PATH=$PATH:/mnt:$PATH
+ # We include $PATH a second time so that the program can be
+ # found after it is installed.
+ % kdevelop # Or some other program you don't have...
+
+ ...or maybe something like this...
+
+ % cat > /usr/sbin/missing-program
+ #!/bin/sh
+ export DISPLAY=:0
+ konqueror http://www.google.com/search?q="download+$2" &
+ ^D
+ % chmod a+x /usr/sbin/missing-program
+ % mount -t tmpfs -o helper=/usr/sbin/missing-program glorp /mnt
+ % xhost localhost
+ % PATH=$PATH:/mnt
+ % kdevelop
+
+4.4.1 AUTOMATIC LOADING OF MISSING LIBRARIES
+
+ Same as above, but with this line:
+ % LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt:$LD_LIBRARY_PATH
+
+4.5 ADVANCED EXAMPLE: Generating plain files
+
+ Doing automatic generation of plain files (as opposed to
+directories, device files and symbolic links) requires more care,
+because the helper program's attempt to open a file for writing
+will itself invoke another instance of the user level handler.
+
+ One approach, perhaps the only one, is to define some temporary
+file name pattern that your user handler knows to ignore, create the
+file with a temporary name that matches that pattern, and then
+rename the temporary file to the real file name, since rename
+operations are not trapped (and if renames ever are trapped in future
+releases, you could filter out renames where the source matched the
+temporary file pattern).
+
+ [FIXME. This example is untested! --Adam Richter, 2004.11.03]
+
+
+ % cat > /usr/sbin/my-decrypter
+ #!/bin/sh
+ encrypted_dir=$1
+ decrypted_dir=$2
+ key=$3
+ # $4 is "LOOKUP"
+ filename=$5
+
+ case $filename in tmp.*) exit ;; esac # ignore our own temporary files
+ tmpfile=$decrypted_dir/tmp.$$
+ decrypt --key "$key" < "$encrypted_dir/$filename" > "$tmpfile"
+ mv "$tmpfile" "$decrypted_dir/$filename"
+ ^D
+ % chmod a+x /usr/sbin/my-decrypter
+ % mount -t tmpfs -o \
+ helper='/usr/sbin/my-decrypter /cryptdir /mnt key' x /mnt
+ % cat /mnt/my-secret-file
+ ....
+
+ If you want to make the temporary directory completely
+inaccessible from the public directory, you can create two mount
+points, where the public directory for the file system is actually
+a subdirectory of a larger hidden directory, like so:
+
+ % mount -t tmpfs /some/place/hidden
+ % chmod go-rwx /some/place/hidden
+ % mkdir /some/place/hidden/mirror
+ % mount --bind /some/place/hidden/mirror /public
+
+ Now you can set up a helper program that operates in some
+other directory of /some/place/hidden, and then renames the
+resultant files into /some/place/hidden/mirror.
+
+4.6. OTHER USES?
+
+ I would be interested in hearing about any other uses that you come up
+with for tmpfs, especially if I can include them in this document.
+
+5. SERIALIZATION
+
+ Note that many instances of the user level helper program
+can potentially be running at the same time. It is up to you,
+the implementor of the helper program, to determine what sort of
+serialization you need and implement it. A simple solution to
+enforce complete serialization would be to have every instance
+of the helper program take an exclusive flock on some common
+file.
+
+6. KERNEL DEVELOPER ANSWERS ABOUT IMPLEMENTATION DECISIONS
+
+6.1 Q: Why doesn't tmpfs provide REGISTER and UNREGISTER events when
+ new nodes are created or deleted from the file system, as the
+ mini-devfs implementation from which it is derived did?
+
+ A: {,UN}REGISTER in Richard Gooch's implementation of devfs enabled
+ things like automatically setting permissions and sound settings
+ on your sound device when the driver was loaded, even if
+ the loading of the driver had not been caused by devfs.
+ For tmpfs-based devfs, I expect to implement that in
+ a more complex way by shadowing the real devfs file system
+ and creating {,UN}REGISTER events as updates are propagated
+ from the real devfs to /dev. The advantages of this would be
+ that module initialization would not be blockable by a user
+ level program, and events like a device quickly appearing and
+ disappearing could be coalesced (i.e., ignored in this case).
+
+ I'm not convinced that {,UN}REGISTER has to go, but I haven't
+ seen any compelling uses for it, and I know it's politically
+ easier to add a feature than to remove one, especially if anyone
+ has developed a dependence on it and does not want to port. So,
+ I'm starting out with tmpfs not providing {,UN}REGISTER events.
+
+6.2 Q: Why isn't this facility implemented as an overlay file system? I'd
+ like to be able to apply it to, say, /dev, without having
+ to start out with /dev being a devfs file system. I could
+ also demand load certain facilities based on accesses to /proc.
+
+ A: There are about two dozen routines in inode_operations and
+ file_operations that would require pass-through versions, and
+ they are not as trivial as you might think because of
+ locking issues involved in going through the vfs layer again.
+ Also, currently, tmpfs, like ramfs and sysfs, uses a struct
+ inode and a struct dentry in kernel low memory for every
+ existing node in the file system, about half a kilobyte
+ per entry. So, tmpfs would need to be converted to allow
+ inode structures to be released if it were to overlay
+ potentially large directory trees. Also there are issues
+ related to the underlying file system changing "out from
+ under" tmpfs. Perhaps in the future this can be implemented.
+
+
+6.3. Q: Why isn't tmpfs a built-in kernel facility that can be
+ applied to any file, like dnotify? That would also have
+ the above advantages and could eliminate the mount complexity
+ of overlays.
+
+ A: I thought about separating inode->directory_operations
+ from inode->inode_operations. Since all of those operations that
+ I want to intercept are called with inode->i_sem held, it
+ follows that it would be SMP-safe to change an inode->dir_ops
+ pointer dynamically, which would allow stacking of inode
+ operations. This approach could be used to remove the special
+ case code that is used to implement dnotify. But what happens
+ when the inode is freed? It would be necessary to intercept
+ the victim superblock's superblock_operations.drop_inode
+ routine, which could get pretty messy, especially, if, for
+ example, more than one trap was being used on the same file
+ system. Perhaps if drop_inode were moved to struct
+ inode_operations this would be easier.
+
+
+Adam J. Richter (adam@yggdrasil.com)
+2004.11.02
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs
@ 2004-11-04 7:34 Adam J. Richter
2004-11-04 13:28 ` Matthew Wilcox
0 siblings, 1 reply; 8+ messages in thread
From: Adam J. Richter @ 2004-11-04 7:34 UTC (permalink / raw)
To: matthew; +Cc: linux-fsdevel, Michael.Waychison
On Wed, 3 Nov 2004 14:45:03 +0000, Matthew Wilcox wrote:
>[...] What scenario do you envision where
>this is more useful than having the second attempt block until the first
>one has finished, then just returning the results of the first attempt
>(or retrying if the first attempt was unsuccessful)?
I had previously thought that it would require a lot of
code to do that correctly with tmpfs user level helpers, but
I was wrong. Here is a new version of the patch, which implements the
change you asked about.
Note that there is a potential problem with using the
GNU version of the "ln -s" command to install symbolic links
with this blocking scheme, because "ln -s" (from GNU fileutils 4.1)
first attempts to do a stat on the target to see if it is a
directory (rather than doing the symlink and then doing the
directory test if symlink() returns EEXIST). Other ways of
making symbolic links are not affected.
The GNU "ln -s" behavior can result in a deadlock, which
can only be broken by interrupting the child (just hitting ^C in
the parent won't work, because there is no
call_usermodehelper_interruptible, something that should probably
be changed in the future). I've updated
Documentation/filesystems/lookup-trap.txt to describe several ways
to work around the GNU "ln -s" behavior.
In case anyone is following it this closely, this patch also
eliminates struct fs_helper and some routines that dealt with it.
Instead, it uses the superblock->s_umount rw_semaphore to protect
reads of the helper command string from occurring while the command
is changing.
Here is the new patch, this time against 2.6.10-rc1-bk13.
As always, more comments are welcome.
__ ______________
Adam J. Richter \ /
adam@yggdrasil.com | g g d r a s i l
--- linux-2.6.10-rc1-bk13/include/linux/shmem_fs.h 2004-10-18 14:54:55.000000000 -0700
+++ linux/include/linux/shmem_fs.h 2004-11-04 16:08:22.000000000 -0800
@@ -3,6 +3,7 @@
#include <linux/swap.h>
#include <linux/mempolicy.h>
+#include <linux/fshelper.h>
/* inode in-kernel data */
@@ -22,11 +23,13 @@
};
struct shmem_sb_info {
+ int limited; /* 0 = ignore max_blocks and max_inodes */
unsigned long max_blocks; /* How many blocks are allowed */
unsigned long free_blocks; /* How many are left for allocation */
unsigned long max_inodes; /* How many inodes are allowed */
unsigned long free_inodes; /* How many are left for allocation */
spinlock_t stat_lock;
+ char *helper_shell_command;
};
static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
@@ -34,4 +37,6 @@
return container_of(inode, struct shmem_inode_info, vfs_inode);
}
+extern int __init init_tmpfs(void); /* early initialization for devfs */
+
#endif
--- linux-2.6.10-rc1-bk13/include/linux/fshelper.h 1969-12-31 16:00:00.000000000 -0800
+++ linux/include/linux/fshelper.h 2004-11-04 16:08:23.000000000 -0800
@@ -0,0 +1,15 @@
+#ifndef _LINUX_FS_HELPER
+#define _LINUX_FS_HELPER
+
+#include <linux/dcache.h>
+#include <linux/rwsem.h>
+
+/*
+ Note: call_fs_helper releases and retakes dentry->d_parent->d_inode->i_sem.
+*/
+extern void call_fs_helper(char **command_str_ptr,
+ struct rw_semaphore *command_rwsem,
+ const char *event,
+ struct dentry *dentry);
+
+#endif /* _LINUX_FS_HELPER */
--- linux-2.6.10-rc1-bk13/mm/shmem.c 2004-11-03 21:51:05.000000000 -0800
+++ linux/mm/shmem.c 2004-11-04 16:08:24.000000000 -0800
@@ -14,6 +14,9 @@
* Copyright (c) 2004, Luke Kenneth Casson Leighton <lkcl@lkcl.net>
* Copyright (c) 2004 Red Hat, Inc., James Morris <jmorris@redhat.com>
*
+ * User level helper for directory lookups:
+ * Copyright (C) 2004 Adam J. Richter, Yggdrasil Computing, Inc.
+ *
* This file is released under the GPL.
*/
@@ -46,6 +49,8 @@
#include <linux/mempolicy.h>
#include <linux/namei.h>
#include <linux/xattr.h>
+#include <linux/fshelper.h>
+#include <linux/parser.h>
#include <asm/uaccess.h>
#include <asm/div64.h>
#include <asm/pgtable.h>
@@ -83,6 +88,11 @@
SGP_WRITE, /* may exceed i_size, may allocate page */
};
+struct shmem_userhelper_wait {
+ struct semaphore calling;
+ struct rw_semaphore freeing;
+};
+
static int shmem_getpage(struct inode *inode, unsigned long idx,
struct page **pagep, enum sgp_type sgp, int *type);
@@ -135,7 +145,20 @@
static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
{
+#ifdef CONFIG_TMPFS
return sb->s_fs_info;
+#else
+ return NULL; /* compiler optimization */
+#endif
+}
+
+static inline int shmem_have_quotas(struct shmem_sb_info *sb_info)
+{
+#ifdef CONFIG_TMPFS
+ return sb_info->limited;
+#else
+ return 0; /* sb_info will be NULL. */
+#endif
}
/*
@@ -194,7 +217,8 @@
static void shmem_free_blocks(struct inode *inode, long pages)
{
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
sbinfo->free_blocks += pages;
inode->i_blocks -= pages*BLOCKS_PER_PAGE;
@@ -357,7 +381,7 @@
* page (and perhaps indirect index pages) yet to allocate:
* a waste to allocate index if we cannot allocate data.
*/
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (sbinfo->free_blocks <= 1) {
spin_unlock(&sbinfo->stat_lock);
@@ -678,7 +702,7 @@
spin_unlock(&shmem_swaplist_lock);
}
}
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
BUG_ON(inode->i_blocks);
spin_lock(&sbinfo->stat_lock);
sbinfo->free_inodes++;
@@ -1081,7 +1105,7 @@
} else {
shmem_swp_unmap(entry);
sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (sbinfo->free_blocks == 0 ||
shmem_acct_block(info->flags)) {
@@ -1269,7 +1293,7 @@
struct shmem_inode_info *info;
struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (!sbinfo->free_inodes) {
spin_unlock(&sbinfo->stat_lock);
@@ -1598,7 +1622,7 @@
buf->f_type = TMPFS_MAGIC;
buf->f_bsize = PAGE_CACHE_SIZE;
buf->f_namelen = NAME_MAX;
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
buf->f_blocks = sbinfo->max_blocks;
buf->f_bavail = buf->f_bfree = sbinfo->free_blocks;
@@ -1650,6 +1674,154 @@
return shmem_mknod(dir, dentry, mode | S_IFREG, 0);
}
+static inline int shmem_want_to_trap(struct nameidata *nd)
+{
+ return (nd != NULL);
+}
+
+
+/*
+ * Retaining negative dentries for an in-memory filesystem just wastes
+ * memory and lookup time: arrange for them to be deleted immediately.
+ */
+static int shmem_delete_dentry(struct dentry *dentry)
+{
+ return 1;
+}
+
+/* Force revalidation of all negative dentries. Note that this routine
+ only gets called when some other routine has a reference to the dentry
+ or the dentry has an inode. Otherwise, unused negative dentries
+ are immediately dropped, because of shmem_delete_dentry. Apparently,
+ the only time shmem_dentry_valid ends up being called with an empty
+ inode is when a trapped reference is blocking on the user level helper,
+ which is exactly when we do want to force another lookup so that this
+ new reference will block too. I had previously used dentry->d_fsdata
+ to flag whether a user level helper was blocking on it,
+ but now that seems unnecessary, so we just check dentry->d_inode here.
+*/
+static int shmem_dentry_valid(struct dentry *dentry,
+ struct nameidata *nd)
+{
+ struct shmem_userhelper_wait *wait;
+ struct inode *parent_inode;
+
+ if (dentry->d_inode != NULL)
+ return 1;
+
+ if (!shmem_want_to_trap(nd))
+ return 0;
+
+ parent_inode = dentry->d_parent->d_inode;
+ if (down_interruptible(&parent_inode->i_sem))
+ return (dentry->d_inode != NULL);
+ wait = dentry->d_fsdata;
+ if (!wait) {
+ up(&parent_inode->i_sem);
+ return (dentry->d_inode != NULL);
+ }
+
+ if (!down_read_trylock(&wait->freeing))
+ BUG();
+
+ up(&parent_inode->i_sem);
+
+ if (down_interruptible(&wait->calling) == 0) {
+ /* call_usermodehelper has returned at this point */
+ up(&wait->calling);
+ }
+
+ up_read(&wait->freeing);
+ /* OK for lookup to free the data structure */
+ return (dentry->d_inode != NULL);
+
+}
+
+
+static struct dentry_operations shmem_dentry_maybe_trapped = {
+ .d_delete = shmem_delete_dentry,
+ .d_revalidate = shmem_dentry_valid,
+};
+
+static struct dentry_operations shmem_dentry_not_trapped = {
+ .d_delete = shmem_delete_dentry,
+};
+
+static struct dentry * shmem_lookup(struct inode *dir,
+ struct dentry *dentry,
+ struct nameidata *nd)
+{
+ /*
+ We must do simple_lookup before trapfs_event to prevent
+ a duplicate dentry from being created if the trapfs_helper
+ program attempts to access the same file name in /dev.
+ If simple_lookup returns non-NULL, then that is due to an error
+ like a malformed file name, so we do not invoke trapfs_event.
+ If the file is not found but there was no other error,
+ simple_lookup returns NULL, and that is the only case
+ in which we want to generate a notification.
+
+ We also filter out the final path element of mknod, mkdir
+ and symlink, because invoking the helper for mknod and mkdir
+ could lead to deadlock when trapfs loads a device driver
+ kernel module than. One would think that the way to filter
+ would be to look at nd->flags to check that LOOKUP_CREATE
+ is set and LOOKUP_OPEN is clear, but instead, the vfs
+ layers passed nd==NULL in these cases via a routine
+ called lookup_create (without any leading underscores),
+ so we filter out the case where nd == NULL.
+
+ Filtering out nd==NULL has the unintended side-effect of
+ filtering out the final path component of arguments to
+ rmdir, unlink and rename (both source and destination).
+ For rmdir, unlink, and the source argument to rename,
+ that's fine, since nobody cares about attempts to remove
+ nonexistent files. We're probably also OK skipping the
+ notifications with regard to the destination argument to
+ rename, although that is less clear.
+ */
+
+ struct super_block *sb = dir->i_sb;
+ struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
+
+ struct dentry *result = NULL;
+ struct shmem_userhelper_wait wait;
+ struct dentry *new;
+
+ if (dentry->d_name.len > NAME_MAX)
+ return ERR_PTR(-ENAMETOOLONG);
+
+ if (!shmem_want_to_trap(nd)) {
+ dentry->d_op = &shmem_dentry_not_trapped;
+ d_add(dentry, NULL);
+ } else {
+ dentry->d_op = &shmem_dentry_maybe_trapped;
+ dentry->d_fsdata = &wait;
+
+ init_MUTEX_LOCKED(&wait.calling);
+ init_rwsem(&wait.freeing);
+
+ d_add(dentry, NULL);
+
+ call_fs_helper(&sbinfo->helper_shell_command, &sb->s_umount,
+ "LOOKUP", dentry);
+
+ new = d_lookup(dentry->d_parent, &dentry->d_name);
+
+ if (new != dentry) /* also handles new==NULL */
+ result = new;
+ else
+ dput(new);
+
+ up(&wait.calling);
+ down_write(&wait.freeing);
+ /* no need ever to call up_write(&wait.freeing); */
+ dentry->d_fsdata = NULL;
+ }
+
+ return result;
+}
+
/*
* Link a file..
*/
@@ -1663,7 +1835,7 @@
* but each new link needs a new dentry, pinning lowmem, and
* tmpfs dentries cannot be pruned until they are unlinked.
*/
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (!sbinfo->free_inodes) {
spin_unlock(&sbinfo->stat_lock);
@@ -1688,7 +1860,7 @@
if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)) {
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
sbinfo->free_inodes++;
spin_unlock(&sbinfo->stat_lock);
@@ -1840,7 +2012,7 @@
#endif
};
-static int shmem_parse_options(char *options, int *mode, uid_t *uid, gid_t *gid, unsigned long *blocks, unsigned long *inodes)
+static int shmem_parse_options(char *options, int *mode, uid_t *uid, gid_t *gid, unsigned long *blocks, unsigned long *inodes, substring_t *helper)
{
char *this_char, *value, *rest;
@@ -1894,6 +2066,9 @@
*gid = simple_strtoul(value,&rest,0);
if (*rest)
goto bad_val;
+ } else if (!strcmp(this_char,"helper")) {
+ helper->from = value;
+ helper->to = value + strlen(value);
} else {
printk(KERN_ERR "tmpfs: Bad mount option %s\n",
this_char);
@@ -1909,32 +2084,67 @@
}
+static int
+maybe_replace_from_substr(char **target, substring_t *substr)
+{
+ char *str;
+
+ if (substr->from == NULL)
+ return 0;
+
+ if (substr->from == substr->to)
+ str = NULL;
+ else {
+ str = match_strdup(substr);
+ if (!str)
+ return -ENOMEM;
+ }
+ kfree(*target);
+ *target = str;
+ return 0;
+}
+
static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
{
struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
unsigned long max_blocks = 0;
unsigned long max_inodes = 0;
+ substring_t helper_str;
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
max_blocks = sbinfo->max_blocks;
max_inodes = sbinfo->max_inodes;
}
- if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes))
+ helper_str.from = NULL;
+ if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes, &helper_str))
return -EINVAL;
+
+ if (maybe_replace_from_substr(&sbinfo->helper_shell_command,
+ &helper_str) != 0)
+ return -ENOMEM;
+
/* Keep it simple: disallow limited <-> unlimited remount */
- if ((max_blocks || max_inodes) == !sbinfo)
+ if ((max_blocks || max_inodes) != shmem_have_quotas(sbinfo))
return -EINVAL;
+
/* But allow the pointless unlimited -> unlimited remount */
- if (!sbinfo)
+ if (!max_blocks && !max_inodes)
return 0;
+
return shmem_set_size(sbinfo, max_blocks, max_inodes);
}
#endif
static void shmem_put_super(struct super_block *sb)
{
- kfree(sb->s_fs_info);
- sb->s_fs_info = NULL;
+#ifdef CONFIG_TMPFS
+ struct shmem_sb_info *sb_info = SHMEM_SB(sb);
+
+ kfree(sb_info->helper_shell_command);
+ kfree(sb_info);
+#endif
+
+ sb->s_fs_info = NULL; /* FIXME. Is this line necessary? */
}
#ifdef CONFIG_TMPFS_XATTR
@@ -1952,16 +2162,19 @@
uid_t uid = current->fsuid;
gid_t gid = current->fsgid;
int err = -ENOMEM;
+ substring_t helper_str;
#ifdef CONFIG_TMPFS
unsigned long blocks = 0;
unsigned long inodes = 0;
+ struct shmem_sb_info *sbinfo;
/*
* Per default we only allow half of the physical ram per
* tmpfs instance, limiting inodes to one per page of lowmem;
* but the internal instance is left unlimited.
*/
+ helper_str.from = NULL;
if (!(sb->s_flags & MS_NOUSER)) {
blocks = totalram_pages / 2;
inodes = totalram_pages - totalhigh_pages;
@@ -1969,22 +2182,28 @@
inodes = blocks;
if (shmem_parse_options(data, &mode,
- &uid, &gid, &blocks, &inodes))
+ &uid, &gid, &blocks, &inodes,
+ &helper_str))
return -EINVAL;
}
- if (blocks || inodes) {
- struct shmem_sb_info *sbinfo;
- sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
- if (!sbinfo)
- return -ENOMEM;
- sb->s_fs_info = sbinfo;
- spin_lock_init(&sbinfo->stat_lock);
- sbinfo->max_blocks = blocks;
- sbinfo->free_blocks = blocks;
- sbinfo->max_inodes = inodes;
- sbinfo->free_inodes = inodes;
- }
+ sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
+ if (!sbinfo)
+ return -ENOMEM;
+
+ sbinfo->limited = (blocks || inodes);
+ sb->s_fs_info = sbinfo;
+ spin_lock_init(&sbinfo->stat_lock);
+ sbinfo->max_blocks = blocks;
+ sbinfo->free_blocks = blocks;
+ sbinfo->max_inodes = inodes;
+ sbinfo->free_inodes = inodes;
+
+ sbinfo->helper_shell_command = NULL;
+ if (maybe_replace_from_substr(&sbinfo->helper_shell_command,
+ &helper_str) != 0)
+ return -ENOMEM;
+
sb->s_xattr = shmem_xattr_handlers;
#endif
@@ -2088,7 +2307,7 @@
static struct inode_operations shmem_dir_inode_operations = {
#ifdef CONFIG_TMPFS
.create = shmem_create,
- .lookup = simple_lookup,
+ .lookup = shmem_lookup,
.link = shmem_link,
.unlink = shmem_unlink,
.symlink = shmem_symlink,
@@ -2192,9 +2411,15 @@
};
static struct vfsmount *shm_mnt;
-static int __init init_tmpfs(void)
+/* init_tmpfs is exported so that devfs can get an earlier initialization
+ if necessary. */
+int __init init_tmpfs(void)
{
int error;
+ static int initialized; /* = 0 */
+
+ if (initialized)
+ return 0;
error = init_inodecache();
if (error)
@@ -2215,6 +2440,7 @@
printk(KERN_ERR "Could not kern_mount tmpfs\n");
goto out1;
}
+ initialized = 1;
return 0;
out1:
@@ -2310,3 +2536,5 @@
vma->vm_ops = &shmem_vm_ops;
return 0;
}
+EXPORT_SYMBOL(shmem_lock);
+EXPORT_SYMBOL(shmem_nopage);
--- linux-2.6.10-rc1-bk13/fs/Makefile 2004-11-03 21:50:59.000000000 -0800
+++ linux/fs/Makefile 2004-11-04 16:08:26.000000000 -0800
@@ -44,6 +44,7 @@
obj-y += devpts/
obj-$(CONFIG_PROFILING) += dcookies.o
+obj-$(CONFIG_TMPFS) += helper.o
# Do not add any filesystems before this line
obj-$(CONFIG_REISERFS_FS) += reiserfs/
--- linux-2.6.10-rc1-bk13/fs/helper.c 1969-12-31 16:00:00.000000000 -0800
+++ linux/fs/helper.c 2004-11-04 16:11:10.000000000 -0800
@@ -0,0 +1,190 @@
+/*
+ helper.c -- Invoke user level helper command for a struct dentry.
+
+ Written by Adam J. Richter
+ Copyright (C) 2004 Yggdrasil Computing, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/ctype.h>
+#include <linux/fshelper.h>
+
+static int path_len (struct dentry *de, struct dentry *root)
+{
+ int len = 0;
+ while (de != root) {
+ len += de->d_name.len + 1; /* count the '/' */
+ de = de->d_parent;
+ }
+ return len; /* -1 because we omit the leading '/',
+ +1 because we include trailing '\0' */
+}
+
+static int write_path_from_mnt (struct dentry *de, char *path, int buflen)
+{
+ struct dentry *mnt_root = de->d_parent->d_inode->i_sb->s_root;
+ int len;
+ char *path_orig = path;
+
+ if (de == NULL || de == mnt_root)
+ return -EINVAL;
+
+ spin_lock(&dcache_lock);
+ len = path_len(de, mnt_root);
+ if (len > buflen) {
+ spin_unlock(&dcache_lock);
+ return -ENAMETOOLONG;
+ }
+
+ path += len - 1;
+ *path = '\0';
+
+ for (;;) {
+ path -= de->d_name.len;
+ memcpy(path, de->d_name.name, de->d_name.len);
+ de = de->d_parent;
+ if (de == mnt_root)
+ break;
+ *(--path) = '/';
+ }
+
+ spin_unlock(&dcache_lock);
+
+ BUG_ON(path != path_orig);
+
+ return 0;
+}
+
+static inline int
+calc_argc(const char *str_in, int *str_len)
+{
+ const char *str = str_in;
+ int argc = 0;
+ while (*str) {
+ while (*str == ' ' || *str == '\t')
+ str++;
+ argc++;
+ while (*str != ' ' && *str != '\t' && *str)
+ str++;
+ }
+ *str_len = str - str_in;
+ return argc;
+}
+
+static char **
+gen_argv(char *str_in, int argc_extra, int *argc_out)
+{
+ int argc;
+ char **argv;
+ char *str_out;
+ int str_len;
+
+ if (!str_in)
+ return NULL;
+
+ while (*str_in == ' ' || *str_in == '\t')
+ str_in++;
+
+ if (*str_in == '\0')
+ return NULL;
+
+ argc = calc_argc(str_in, &str_len);
+
+ argv = kmalloc(((argc + argc_extra) * sizeof(char*)) + str_len + 1,
+ GFP_KERNEL);
+ if (!argv)
+ return NULL;
+
+ str_out = (char*) (argv + argc + argc_extra);
+
+ argc = 0;
+ while (*str_in) {
+ argv[argc++] = str_out;
+
+ while (*str_in != ' ' && *str_in != '\t' && *str_in)
+ *(str_out++) = *(str_in++);
+
+ *(str_out++) = '\0';
+
+ while (*str_in == ' ' || *str_in == '\t')
+ str_in++;
+ }
+ *argc_out = argc;
+ return argv;
+}
+
+/*
+ Warning: call_fs_helper releases and retakes
+ dentry->d_parent->d_inode->i_sem. It must be called with this
+ semaphore already held.
+
+ *command_ptr is a single string. It is *not* in argv format.
+ Instead, elements are separated by spaces.
+*/
+void call_fs_helper(char **command_ptr,
+ struct rw_semaphore *command_rwsem,
+ const char *event,
+ struct dentry *dentry)
+{
+ char path[64];
+ int argc;
+ char **argv;
+ struct inode *parent_inode = dentry->d_parent->d_inode;
+
+ if (write_path_from_mnt(dentry, path, sizeof(path)) == 0) {
+
+ up(&parent_inode->i_sem);
+
+ /*
+ FIXME. We would not need the extra memory allocation,
+ string copying, error branch and lines of source code
+ due to err_strdup(), and we could put gen_argv
+ into the set_fs_helper, if call_usermodehelper
+ and execve had a callback to inform us when
+ execve was done copying argv and envp. With
+ such a facility, we could just hold helper->rw_sem
+ up to that point, without having to make a copy of the
+ argument (which we currently do) or hold the semaphore
+ until the helper process exits (which would cause a
+ deadlock if a helper process ever tried to change
+ the helper string of a file system, especially since
+ there is not such a thing as rw_down_read_interruptible
+ that would make the deadlock breakable).
+ */
+
+ down_read(command_rwsem);
+ argv = gen_argv(*command_ptr, 3, &argc);
+ up_read(command_rwsem);
+
+ if (argv != NULL) {
+ static char *envp[] =
+ {"PATH=/bin:/sbin:/usr/bin:/usr/sbin",
+ "HOME=/", NULL };
+
+ argv[argc++] = (char*) event;
+ argv[argc++] = path;
+ argv[argc] = NULL;
+
+ call_usermodehelper(argv[0], argv, envp, 1);
+ kfree(argv);
+ }
+
+ down(&parent_inode->i_sem);
+ }
+}
+EXPORT_SYMBOL_GPL(call_fs_helper);
--- linux-2.6.10-rc1-bk13/Documentation/filesystems/lookup-trap.txt 1969-12-31 16:00:00.000000000 -0800
+++ linux/Documentation/filesystems/lookup-trap.txt 2004-11-04 16:09:25.000000000 -0800
@@ -0,0 +1,431 @@
+User's Guide To Trapping Directory Lookup Operations in Tmpfs
+Version 0.2
+
+
+1. INSTRUCTIONS FOR THE IMPATIENT
+
+ % modprobe tmpfs
+ % mount -t tmpfs -o helper=/path/to/helper/program whatever /mnt
+ % ls /mnt/foo
+ # Notice that "/path/to/helper/program LOOKUP foo" was executed.
+
+
+2. OVERVIEW (from the Kconfig help)
+
+ Tmpfs now allows user level programs to implement file systems
+that are filled in on demand. This feature works by invoking a
+configurable helper program on attempts to open() or stat() a
+nonexistent file. The access waits until the helper finishes, so the
+helper can install the missing file if desired.
+
+ Using this facility, a shell script or small C program can
+implement a file system that automatically mounts remote file systems
+or creates device files on demand, similar to autofs or devfs,
+respectively. Tmpfs is, however, daemonless and, perhaps
+consequently, smaller than either of these, and may avoid some
+recursion problems.
+
+ Tmpfs might also be useful for debugging programs where you
+want to trap the first access to a particular file or perhaps in
+automatic installation of missing commands or libraries by specifying a
+tmpfs file system in certain search paths.
+
+ This access trapping facility is designed to be easily ported
+to other file systems; see include/linux/fshelper.h and fs/helper.c
+for the implementation.
+
+
+
+3. GETTING STARTED
+
+3.1 MOUNTING THE FILE SYSTEM
+
+ First, build and boot a kernel with tmpfs either compiled in
+or built as a module. If you compile tmpfs as a module, you may have
+to load it, although, since the module name ("tmpfs.ko") and the file
+system name ("tmpfs") match, that may be unnecessary if your system is
+configured to load modules automatically.
+
+ % modprobe tmpfs
+
+ Now let's mount a tmpfs file system on /mnt.
+
+ % mount -t tmpfs blah /mnt
+
+ The file system will behave exactly like a ramfs file system.
+In fact, tmpfs is derived from ramfs. You can create files,
+directories, symbolic links and device nodes in it, and they will
+exist only in the computer's main memory. The contents of the file
+system will disappear as soon as you unmount it.
+
+ If you mount multiple instances of tmpfs, you will get
+separate file systems.
+
+
+3.2. THE HELPER PROGRAM
+
+ What distinguishes tmpfs from ramfs is that it can invoke a
+user level helper program when an attempt is made to open or stat a
+nonexistent file for the first time (if the name is not already in the
+dcache). The user level program is set with the "helper" mount
+option. It is possible to set, clear or change the helper command at
+any time, so let's go back to our example and put a helper command on
+the tmpfs file system that we mounted on /mnt:
+
+ % mount -o remount,helper=/tmp/helper /mnt
+
+ If you use the file system in /mnt now, nothing appears to have
+changed. Now let's put a simple shell script in /tmp/helper:
+
+ % cat > /tmp/helper
+ #!/bin/sh
+ echo "$*" > /dev/console
+ ^D
+ % chmod a+x /tmp/helper
+
+ Now you should see console messages like "LOOKUP foo" when you
+try to access the file /mnt/foo for the first time.
+
+ You can also pass arguments to the helper program by using
+spaces in the helper mount option, like so:
+
+ % mount -o remount,helper='/tmp/helper my_argument' /mnt
+
+ If you do this, your console messages will start to look
+something like "my_argument LOOKUP foo". The arguments that
+you specify come before "LOOKUP foo" to facilitate the use of
+command interpreters, like, say, helper='/usr/bin/perl handler.pl'.
+Arguments also make it easy to pass things like the mount point or
+configuration files, which should make it easier to write facilities
+that work on multiple mount points.
+
+ You can also deactivate the helper at any time, like so:
+
+ % mount -o remount,helper='' /mnt
+
+4. PRACTICAL EXAMPLES
+
+4.1 AN NFS AUTOMOUNTER
+
+ % cat > /usr/sbin/tmpfs-automount
+ #!/bin/sh
+ topdir=$1
+ host=${3%/*}
+ dir=$topdir/$host
+ mkdir $dir
+ mount -t nfs $host:/ $dir
+ ^D
+ % chmod a+x /usr/sbin/tmpfs-automount
+ % mkdir /auto
+ % mount -t tmpfs -o helper="/usr/sbin/tmpfs-automount /auto" x /auto
+
+ Notice how we pass the additional argument "/auto" to the
+tmpfs-automount command.
+
+ If you want automatic unmount after a timeout, you'll probably
+want to do something a little more elaborate, perhaps with a script that
+runs from cron.
+
+
+4.2 DEMAND LOADING OF DEVICE DRIVERS
+
+ A version of devfs that uses tmpfs is under development and
+running on the system I am using to write this document, but I am
+still cleaning it up. Here is how it should work, although I have
+not yet actually tried devfs_helper on it.
+
+ The devfs_helper program was originally written for a stripped down
+rewrite of devfs, from which tmpfs is derived. It can read your
+/etc/devfs.conf file (the file previously used to configure
+devfsd) and load modules specified by "LOOKUP" commands. Other
+devfs.conf commands are ignored.
+
+ % ftp ftp.yggdrasil.com
+ login: anonymous
+ password: guest
+ ftp> cd /pub/dist/device_control/devfs
+ ftp> get devfs_helper-0.2.tar.gz
+ .....
+ ftp> quit
+ % tar xfpvz devfs_helper-0.2.tar.gz
+ % cd devfs_helper-0.2
+ % make
+ % make install
+ % mkdir /tmp/tmpdev
+ % mount -t devfs /tmp/tmpdev
+ % cp -apRx /dev/* /tmp/tmpdev/
+ % mount -t devfs -o helper=/sbin/devfs_helper blah /dev
+ % mount -t msdos /dev/floppy/0 /mnt
+
+ The above example should load the floppy.ko kernel module
+if you have a line in your /etc/devfs.conf file like this:
+
+ LOOKUP floppy EXECUTE modprobe floppy
+
+
+ You should also be able to use tmpfs in this fashion to get
+automatic loading of kernel modules on non-devfs systems, although
+you'll need something like udev to create the
+device files once the device drivers are registered.
+
+
+4.3 DEBUGGING A PROGRAM TRYING TO ACCESS A FILE
+
+ % cat > /tmp/call-sleep
+ #!/bin/sh
+ sleep 30
+ ^D
+ % mount -t tmpfs -o helper=/tmp/call-sleep foo /mnt
+ % mv .bashrc .bashrc-
+ % ln -s /mnt/whatever .bashrc
+ % gdb /bin/sh
+ GNU gdb 5.2
+ [blah blah blah]
+ (gdb) run
+ [program eventually hangs. Switch to another terminal session. You
+ cannot control-C out of it, a tmpfs bug from call_usermodehelper.]
+ % ps axf
+ [Find the process under gdb. Let's say it's pid 1152.]
+ % kill -SEGV 1152
+ % ps auxww | grep sleep
+ [Find the sleeping tmpfs helper. Let's say it's pid 1120.]
+ % kill -9 1120
+ [Now back at the first session, running gdb on /bin/sh.]
+ Program received signal SIGSEGV, Segmentation fault.
+ 0xb7f303d4 in __libc_open () at __libc_open:-1
+ -1 __libc_open: No such file or directory.
+ in __libc_open
+ (gdb) where
+ #0 0xb7f303d4 in __libc_open () at __libc_open:-1
+ #1 0xb7f8b4c0 in __DTOR_END__ () from /lib/libc.so.6
+ #2 0x080921ef in _evalfile (filename=0x80dc788 "/tmp/junk/.bashrc", flags=9)
+ at evalfile.c:85
+ #3 0x08092635 in maybe_execute_file (
+ fname=0xfffffffe <Address 0xfffffffe out of bounds>,
+ force_noninteractive=1) at evalfile.c:218
+ #4 0x08059fe8 in run_startup_files () at shell.c:1019
+ #5 0x08059849 in main (argc=1, argv=0xbfffebc4, env=0xbfffebcc) at shell.c:581
+ #6 0xb7e88e02 in __libc_start_main (main=0x8059380 <main>, argc=1,
+ ubp_av=0xbfffebc4, init=0x805897c <_init>,
+ fini=0xb80005ac <_dl_debug_mask>, rtld_fini=0x8000, stack_end=0x0)
+ at ../sysdeps/generic/libc-start.c:129
+
+
+4.4 AUTOMATIC LOADING OF MISSING PROGRAMS
+
+ % cat > /usr/sbin/missing-program
+ #!/bin/sh
+ my-automatic-network-downloader $2
+ ^D
+ % chmod a+x /usr/sbin/missing-program
+ % mount -t tmpfs -o helper=/usr/sbin/missing-program x /mnt
+ % PATH=$PATH:/mnt:$PATH
+ # We include $PATH a second time so that the program can be
+ # found after it is installed.
+ % kdevelop # Or some other program you don't have...
+
+ ...or maybe something like this...
+
+ % cat > /usr/sbin/missing-program
+ #!/bin/sh
+ export DISPLAY=:0
+ konqueror http://www.google.com/search?q="download+$2" &
+ ^D
+ % chmod a+x /usr/sbin/missing-program
+ % mount -t tmpfs -o helper=/usr/sbin/missing-program glorp /mnt
+ % xhost localhost
+ % PATH=$PATH:/mnt
+ % kdevelop
+
+4.4.1 AUTOMATIC LOADING OF MISSING LIBRARIES
+
+ Same as above, but with this line:
+ % LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt:$LD_LIBRARY_PATH
+
+4.5 ADVANCED EXAMPLE: Generating plain files
+
+ Doing automatic generation of plain files (as opposed to
+directories, device files and symbolic links) requires more care,
+because the helper program's attempt to open a file for writing
+will itself invoke another instance of the user level handler.
+
+ One approach, perhaps the only one, is to define some temporary
+file name pattern that your user handler knows to ignore, create the
+file with a temporary name that matches that pattern, and then
+rename the temporary file to the real file name, since rename
+operations are not trapped (and if renames are ever trapped in future
+releases, you could filter out renames where the source matches the
+temporary file pattern).
+
+ [FIXME. This example is untested! --Adam Richter, 2004.11.03]
+
+
+ % cat > /usr/sbin/my-decrypter
+ #!/bin/sh
+ # Invoked as: my-decrypter ENCRYPTED_DIR DECRYPTED_DIR KEY LOOKUP NAME
+ encrypted_dir=$1
+ decrypted_dir=$2
+ key=$3
+ # $4 is "LOOKUP"
+ filename=$5
+
+ # Ignore lookups of the temporary directory and of files inside it.
+ case $filename in ( tmp | tmp/* ) exit ;; esac
+ tmpfile=$decrypted_dir/tmp/$$
+ decrypt --key "$key" < "$encrypted_dir/$filename" > "$tmpfile"
+ mv "$tmpfile" "$decrypted_dir/$filename"
+ ^D
+ % chmod a+x /usr/sbin/my-decrypter
+ % mount -t tmpfs -o \
+ helper='/usr/sbin/my-decrypter /cryptdir /mnt key' x /mnt
+ % mkdir /mnt/tmp
+ % cat /mnt/my-secret-file
+ ....
+
+ If you want to make the temporary directory completely
+inaccessible from the public directory, you can create two mount
+points, where the public directory for the file system is actually
+a subdirectory of a larger hidden directory, like so:
+
+ % mount -t tmpfs /some/place/hidden
+ % chmod go-rwx /some/place/hidden
+ % mkdir /some/place/hidden/mirror
+ % mount --bind /some/place/hidden/mirror /public
+
+ Now you can set up a helper program that operates in some
+other directory of /some/place/hidden, and then renames the
+resultant files into /some/place/hidden/mirror.
+
+
+4.5.1 A WARNING ABOUT SYMBOLIC LINKS AND THE GNU "ln -s" COMMAND
+
+ If your helper program invokes the shell command "ln" to create
+symbolic links, you may need to use the "filter and rename" technique that
+was described above for plain files, or one of several other
+workarounds listed below. If your helper program just uses the
+symlink() system call directly, you don't have to worry about this.
+
+ What is the problem, exactly? The problem is that although
+symlink system calls are not trapped, the "ln" command from version
+4.1 of the GNU fileutils package (latest version as of this writing)
+does a stat system call on the target before doing the symlink,
+because it is specified to behave differently if the destination path
+exists and is a directory. stat, unlike symlink, is trapped, in order
+to support automatic creation of files in case a program stats a file
+before deciding whether to open it, and for things like "ls
+/auto/fileserver1.mycompany.com/". Consequently, a user level helper
+shell script that simply does "ln -s whatever $target_file_name" will
+deadlock (which can be broken by interrupting the child helper
+program).
+
+ There are several possible solutions to this problem.
+
+ 1. You can use a simpler symlink program, such as ssln,
+ instead of "ln -s".
+
+ 2. If the final path element of the symlink's contents is
+ the same as the final path element of the symlink's name,
+ then you can just specify the directory name as the second
+ argument to "ln -s". In other words, change
+
+ ln -s foo/bar /the/target/bar
+ ...to...
+ ln -s foo/bar /the/target
+
+ 3. You can use the same filtering techniques for symlink
+ as discussed for plain files in the previous section
+ (create them with special temporary names that your helper
+ program knows to ignore and then rename them into place).
+
+ 4. Invoke perl to do the symlink, if you know you have it available:
+
+ perl -e 'symlink("contents", "target");'
+
+ 5. You can port your shell script to C or perl or some other
+ language that gives you direct access to the symlink()
+ system call.
+
+
+ Perhaps, in the future, the GNU ln command could be changed so
+that, when it is called with exactly two file names, it would try to
+do the symlink() and then check if the target is a directory only if
+the symlink attempt failed.
+
+
+4.6. OTHER USES?
+
+ I would be interested in hearing about any other uses that you come up
+with for tmpfs, especially if I can include them in this document.
+
+5. SERIALIZATION
+
+ Note that many instances of the user level helper program
+can potentially be running at the same time. It is up to you,
+the implementor of the helper program, to determine what sort of
+serialization you need and to implement it. A simple solution to
+enforce complete serialization would be to have every instance
+of the helper program take an exclusive flock on some common
+file.
+
+6. KERNEL DEVELOPER ANSWERS ABOUT IMPLEMENTATION DECISIONS
+
+6.1 Q: Why doesn't tmpfs provide REGISTER and UNREGISTER events when
+ new nodes are created or deleted from the file system, as the
+ mini-devfs implementation from which it is derived did?
+
+ A: {,UN}REGISTER in Richard Gooch's implementation of devfs enabled
+ things like automatically setting permissions and sound settings
+ on your sound device when the driver was loaded, even if
+ the loading of the driver had not been caused by devfs.
+ For tmpfs-based devfs, I expect to implement that in
+ a more complex way by shadowing the real devfs file system
+ and creating {,UN}REGISTER events as updates are propagated
+ from the real devfs to /dev. The advantages of this would be
+ that module initialization would not be blockable by a user
+ level program, and events like a device quickly appearing and
+ disappearing could be coalesced (i.e., ignored in this case).
+
+ I'm not convinced that {,UN}REGISTER has to go, but I haven't
+ seen any compelling uses for it, and I know it's politically
+ easier to add a feature than to remove one, especially if anyone
+ has developed a dependence on it and does not want to port. So,
+ I'm starting out with tmpfs not providing {,UN}REGISTER events.
+
+6.2 Q: Why isn't this facility implemented as an overlay file system? I'd
+ like to be able to apply it to, say, /dev, without having
+ to start out with /dev being a devfs file system. I could
+ also demand load certain facilities based on accesses to /proc.
+
+ A: There are about two dozen routines in inode_operations and
+ file_operations that would require pass-through versions, and
+ they are not as trivial as you might think because of
+ locking issues involved in going through the vfs layer again.
+ Also, currently, tmpfs, like ramfs and sysfs, uses a struct
+ inode and a struct dentry in kernel low memory for every
+ existing node in the file system, about half a kilobyte
+ per entry. So, tmpfs would need to be converted to allow
+ inode structures to be released if it were to overlay
+ potentially large directory trees. Also there are issues
+ related to the underlying file system changing "out from
+ under" tmpfs. Perhaps in the future this can be implemented.
+
+
+6.3. Q: Why isn't tmpfs a built-in kernel facility that can be
+ applied to any file, like dnotify? That would also have
+ the above advantages and could eliminate the mount complexity
+ of overlays.
+
+ A: I thought about separating inode->directory_operations
+ from inode->inode_operations. Since all of those operations that
+ I want to intercept are called with inode->i_sem held, it
+ follows that it would be SMP-safe to change an inode->dir_ops
+ pointer dynamically, which would allow stacking of inode
+ operations. This approach could be used to remove the special
+ case code that is used to implement dnotify. But what happens
+ when the inode is freed? It would be necessary to intercept
+ the victim superblock's superblock_operations.drop_inode
+ routine, which could get pretty messy, especially, if, for
+ example, more than one trap was being used on the same file
+ system. Perhaps if drop_inode were moved to struct
+ inode_operations this would be easier.
+
+
+Adam J. Richter (adam@yggdrasil.com)
+2004.11.02
* Re: [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs
2004-11-04 7:34 Adam J. Richter
@ 2004-11-04 13:28 ` Matthew Wilcox
0 siblings, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2004-11-04 13:28 UTC (permalink / raw)
To: Adam J. Richter; +Cc: matthew, linux-fsdevel, Michael.Waychison
On Wed, Nov 03, 2004 at 11:34:25PM -0800, Adam J. Richter wrote:
> Note that there is a potential problem with using the
> GNU version of the "ln -s" command to install symbolic links
> with this blocking scheme, because "ln -s" (from GNU fileutils 4.1)
> first attempts to do a stat on the target to see if it is a
> directory (rather than doing the symlink and then doing the
> directory test if symlink() return EEXIST). Other ways of
> making symbolic links are not effected.
FWIW, ln still has this behaviour in coreutils 5.2.1.
* Re: [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs
@ 2004-11-05 8:27 Adam J. Richter
0 siblings, 0 replies; 8+ messages in thread
From: Adam J. Richter @ 2004-11-05 8:27 UTC (permalink / raw)
To: Michael.Waychison; +Cc: linux-fsdevel
Mike Waychison wrote:
>Adam J. Richter wrote:
>> This patch eliminates the user level race condition that
>> I mentioned in my original trapfs announcement and which Michael
>> Waychison also complained out. Now, if an attempt is made to
>> open or stat a file name that is already blocking on a user
>> level helper, the new attempt will also invoke a user level
>> helper and block.
>>
>> A side effect of this is that, if a user level program
>> wants to create a plain text file, the helper program has to create
>> it with some file name that the program knows to ignore and then
>> rename them to the correct file names. This is because there is
>> nothing special about the user level helper program's attempts to
>> open the nonexistant files, so that open() invokes another instance
>> instance of a user level helper program, which needs to know to ignore
>> the file name. This should be pretty easy to do, and I've added an
>> untested hypothetical example of it to
>> Documentation/filesystems/lookup-trap.txt.
>So the ignoring is done by the userspace agent?
Yes.
>> If you want to have a directory tree that has no such file
>> names, then make it a "mount --bind" copy of a subdirectory of a
>> hidden tmpfs file system, something like this:
>>
>> mount -t tmpfs /hidden
>> mkdir /hidden/mirror /hidden/tmp-files
>> mount --bind /hidden/mirror /public
>>
>> ...and then have your user level helper program create
>> files in /hidden/tmp-files and move them to /hidden/mirror or /public
>> (makes no difference, although using /hidden/mirror would make it clearer
>> that the move is within the same file system).
>Stuff like moving a directory with a mountpoint on it can't be done
>(from userspace). This scheme would require doing an explicit mkdir and
>a mount --move.
I am sorry if I did not describe this clearly enough. I am
not talking about having the user level helper move either of the
two mount points. Let's say I want to make an automatically
decompressing file system (probably not something you want to store
files in RAM for, but it's an example). Let's set up an arrangement
similar to the one described above:
% mount -t tmpfs -o helper=/usr/libexec/my-decompressor /hidden
% mkdir /hidden/mirror /hidden/tmp-files
% mount --bind /hidden/mirror /public
If a user does "cat /public/foo", the kernel will invoke the
following helper command: "/usr/libexec/my-decompressor LOOKUP mirror/foo".
The user level helper program would look something like this:
#!/bin/sh
target=$2
case $target in ( tmp-files/* ) exit ;; esac # Ignore non-public lookups
target=${target#mirror/}
gunzip -c /my-compressed-directory/${target}.gz > /hidden/tmp-files/$target
mv /hidden/tmp-files/$target /hidden/mirror/$target
# Or, alternatively: mv /hidden/tmp-files/$target /public/$target
My reference to moving things was about the "mv" command,
not about moving mount points. I hope that clarifies it.
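For what it's worth, a somewhat more defensive version of the helper above might look like the following sketch. It is untested against the patch and rests on several assumptions: that the helper is called with "LOOKUP" and a path relative to the root of the hidden file system, that HIDDEN and SRC (names chosen here only for illustration) point at the hidden mount and at the compressed tree, and that the function name handle_lookup and the temporary file name are arbitrary.

```shell
#!/bin/sh
# Sketch only. HIDDEN, SRC and handle_lookup are illustrative names,
# not part of the patch.
HIDDEN=${HIDDEN:-/hidden}
SRC=${SRC:-/my-compressed-directory}

handle_lookup() {
	event=$1
	target=$2

	# Only react to lookup events.
	[ "$event" = LOOKUP ] || return 0

	# Only names under mirror/ are public. Everything else, including
	# our own temporary files in tmp-files/, is ignored.
	case $target in
	( mirror/* ) name=${target#mirror/} ;;
	( * ) return 0 ;;
	esac

	# Nothing to decompress: leave the name nonexistent.
	[ -f "$SRC/$name.gz" ] || return 0

	tmp=$HIDDEN/tmp-files/lookup.$$
	if gunzip -c "$SRC/$name.gz" > "$tmp"; then
		# Rename within the same file system into the public tree.
		mv "$tmp" "$HIDDEN/mirror/$name"
	else
		# Do not leave a partial file behind on failure.
		rm -f "$tmp"
		return 1
	fi
}

# Run the handler only when called with arguments, as the kernel would.
if [ "$#" -gt 0 ]; then
	handle_lookup "$@"
fi
```

Quoting the variables keeps names containing spaces intact, and the early return for a missing source means that a lookup of a name with no compressed counterpart should simply fail as it did before.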
__ ______________
Adam J. Richter \ /
adam@yggdrasil.com | g g d r a s i l
* Re: [PATCH 2.6.10-rd1-bk12] user-level lookup handler for tmpfs
@ 2004-11-05 9:46 Adam J. Richter
0 siblings, 0 replies; 8+ messages in thread
From: Adam J. Richter @ 2004-11-05 9:46 UTC (permalink / raw)
To: linux-fsdevel
Here is yet another new version of the tmpfs lookup
trapping patch (previously the trapfs file system, but changed
to a tmpfs feature per Greg Kroah-Hartman's suggestion).
This version moves almost all of the lookup trapping code out
of tmpfs proper into fs/lookuptrap.c, which does not "#include"
anything about tmpfs, although it is currently only compiled into the
kernel if tmpfs is. fs/lookuptrap.c exports a single symbol,
trapping_simple_lookup(), which should make it easy for some other
similar file systems to adopt this feature if desired.
Like the patch I posted several hours ago, this version
includes the changes for the behavior when an access occurs to a
name that is already blocking on a user level handler, to address
concerns raised by Mike Waychison and Matthew Wilcox. Now, such
an access will block until the first user level handler returns,
without invoking a new user level handler.
Right now, the only other change that I'm contemplating before
posting this to lkml (hopefully tomorrow) is to rename fs/helper.c and
include/linux/fshelper.h to something clearer, but I don't know what
(fsusermodehelper.h, fscallback.h, fsnotify.h?). However, if anyone
has any other input, I'd be happy to hear it and try to address it.
__ ______________
Adam J. Richter \ /
adam@yggdrasil.com | g g d r a s i l
--- linux-2.6.10-rc1-bk14/include/linux/shmem_fs.h 2004-10-18 14:54:55.000000000 -0700
+++ linux/include/linux/shmem_fs.h 2004-11-04 16:11:42.000000000 -0800
@@ -3,6 +3,7 @@
#include <linux/swap.h>
#include <linux/mempolicy.h>
+#include <linux/fshelper.h>
/* inode in-kernel data */
@@ -22,11 +23,13 @@
};
struct shmem_sb_info {
+ int limited; /* 0 = ignore max_blocks and max_inodes */
unsigned long max_blocks; /* How many blocks are allowed */
unsigned long free_blocks; /* How many are left for allocation */
unsigned long max_inodes; /* How many inodes are allowed */
unsigned long free_inodes; /* How many are left for allocation */
spinlock_t stat_lock;
+ char *helper_shell_command;
};
static inline struct shmem_inode_info *SHMEM_I(struct inode *inode)
@@ -34,4 +37,6 @@
return container_of(inode, struct shmem_inode_info, vfs_inode);
}
+extern int __init init_tmpfs(void); /* early initialization for devfs */
+
#endif
--- linux-2.6.10-rc1-bk14/include/linux/fshelper.h 1969-12-31 16:00:00.000000000 -0800
+++ linux/include/linux/fshelper.h 2004-11-04 16:11:42.000000000 -0800
@@ -0,0 +1,15 @@
+#ifndef _LINUX_FS_HELPER
+#define _LINUX_FS_HELPER
+
+#include <linux/dcache.h>
+#include <linux/rwsem.h>
+
+/*
+ Note: call_fs_helper releases and retakes dentry->d_parent->d_inode->i_sem.
+*/
+extern void call_fs_helper(char **command_str_ptr,
+ struct rw_semaphore *command_rwsem,
+ const char *event,
+ struct dentry *dentry);
+
+#endif /* _LINUX_FS_HELPER */
--- linux-2.6.10-rc1-bk14/include/linux/lookuptrap.h 1969-12-31 16:00:00.000000000 -0800
+++ linux/include/linux/lookuptrap.h 2004-11-04 23:28:11.000000000 -0800
@@ -0,0 +1,12 @@
+#ifndef _LINUX_LOOKUPTRAP_H
+#define _LINUX_LOOKUPTRAP_H
+
+#include <linux/fs.h>
+#include <linux/dcache.h>
+
+extern struct dentry *trapping_simple_lookup(struct inode *dir,
+ struct dentry *dentry,
+ struct nameidata *nd,
+ char **shell_command_ptr);
+
+#endif /* _LINUX_LOOKUPTRAP_H */
--- linux-2.6.10-rc1-bk14/mm/shmem.c 2004-11-04 23:32:44.000000000 -0800
+++ linux/mm/shmem.c 2004-11-04 23:28:11.000000000 -0800
@@ -14,6 +14,9 @@
* Copyright (c) 2004, Luke Kenneth Casson Leighton <lkcl@lkcl.net>
* Copyright (c) 2004 Red Hat, Inc., James Morris <jmorris@redhat.com>
*
+ * User level helper for directory lookups:
+ * Copyright (C) 2004 Adam J. Richter, Yggdrasil Computing, Inc.
+ *
* This file is released under the GPL.
*/
@@ -46,6 +49,8 @@
#include <linux/mempolicy.h>
#include <linux/namei.h>
#include <linux/xattr.h>
+#include <linux/lookuptrap.h>
+#include <linux/parser.h>
#include <asm/uaccess.h>
#include <asm/div64.h>
#include <asm/pgtable.h>
@@ -135,7 +140,20 @@
static inline struct shmem_sb_info *SHMEM_SB(struct super_block *sb)
{
+#ifdef CONFIG_TMPFS
return sb->s_fs_info;
+#else
+ return NULL; /* compiler optimization */
+#endif
+}
+
+static inline int shmem_have_quotas(struct shmem_sb_info *sb_info)
+{
+#ifdef CONFIG_TMPFS
+ return sb_info->limited;
+#else
+ return 0; /* sb_info will be NULL. */
+#endif
}
/*
@@ -194,7 +212,8 @@
static void shmem_free_blocks(struct inode *inode, long pages)
{
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
sbinfo->free_blocks += pages;
inode->i_blocks -= pages*BLOCKS_PER_PAGE;
@@ -357,7 +376,7 @@
* page (and perhaps indirect index pages) yet to allocate:
* a waste to allocate index if we cannot allocate data.
*/
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (sbinfo->free_blocks <= 1) {
spin_unlock(&sbinfo->stat_lock);
@@ -678,7 +697,7 @@
spin_unlock(&shmem_swaplist_lock);
}
}
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
BUG_ON(inode->i_blocks);
spin_lock(&sbinfo->stat_lock);
sbinfo->free_inodes++;
@@ -1081,7 +1100,7 @@
} else {
shmem_swp_unmap(entry);
sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (sbinfo->free_blocks == 0 ||
shmem_acct_block(info->flags)) {
@@ -1269,7 +1288,7 @@
struct shmem_inode_info *info;
struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (!sbinfo->free_inodes) {
spin_unlock(&sbinfo->stat_lock);
@@ -1598,7 +1617,7 @@
buf->f_type = TMPFS_MAGIC;
buf->f_bsize = PAGE_CACHE_SIZE;
buf->f_namelen = NAME_MAX;
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
buf->f_blocks = sbinfo->max_blocks;
buf->f_bavail = buf->f_bfree = sbinfo->free_blocks;
@@ -1650,6 +1669,16 @@
return shmem_mknod(dir, dentry, mode | S_IFREG, 0);
}
+static struct dentry * shmem_lookup(struct inode *dir,
+ struct dentry *dentry,
+ struct nameidata *nd)
+{
+ struct shmem_sb_info *sbinfo = SHMEM_SB(dir->i_sb);
+
+ return trapping_simple_lookup(dir, dentry, nd,
+ &sbinfo->helper_shell_command);
+}
+
/*
* Link a file..
*/
@@ -1663,7 +1692,7 @@
* but each new link needs a new dentry, pinning lowmem, and
* tmpfs dentries cannot be pruned until they are unlinked.
*/
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
if (!sbinfo->free_inodes) {
spin_unlock(&sbinfo->stat_lock);
@@ -1688,7 +1717,7 @@
if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode)) {
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
spin_lock(&sbinfo->stat_lock);
sbinfo->free_inodes++;
spin_unlock(&sbinfo->stat_lock);
@@ -1840,7 +1869,7 @@
#endif
};
-static int shmem_parse_options(char *options, int *mode, uid_t *uid, gid_t *gid, unsigned long *blocks, unsigned long *inodes)
+static int shmem_parse_options(char *options, int *mode, uid_t *uid, gid_t *gid, unsigned long *blocks, unsigned long *inodes, substring_t *helper)
{
char *this_char, *value, *rest;
@@ -1894,6 +1923,9 @@
*gid = simple_strtoul(value,&rest,0);
if (*rest)
goto bad_val;
+ } else if (!strcmp(this_char,"helper")) {
+ helper->from = value;
+ helper->to = value + strlen(value);
} else {
printk(KERN_ERR "tmpfs: Bad mount option %s\n",
this_char);
@@ -1909,32 +1941,67 @@
}
+static int
+maybe_replace_from_substr(char **target, substring_t *substr)
+{
+ char *str;
+
+ if (substr->from == NULL)
+ return 0;
+
+ if (substr->from == substr->to)
+ str = NULL;
+ else {
+ str = match_strdup(substr);
+ if (!str)
+ return -ENOMEM;
+ }
+ kfree(*target);
+ *target = str;
+ return 0;
+}
+
static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
{
struct shmem_sb_info *sbinfo = SHMEM_SB(sb);
unsigned long max_blocks = 0;
unsigned long max_inodes = 0;
+ substring_t helper_str;
- if (sbinfo) {
+ if (shmem_have_quotas(sbinfo)) {
max_blocks = sbinfo->max_blocks;
max_inodes = sbinfo->max_inodes;
}
- if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes))
+ helper_str.from = NULL;
+ if (shmem_parse_options(data, NULL, NULL, NULL, &max_blocks, &max_inodes, &helper_str))
return -EINVAL;
+
+ if (maybe_replace_from_substr(&sbinfo->helper_shell_command,
+ &helper_str) != 0)
+ return -ENOMEM;
+
/* Keep it simple: disallow limited <-> unlimited remount */
- if ((max_blocks || max_inodes) == !sbinfo)
+ if ((max_blocks || max_inodes) != shmem_have_quotas(sbinfo))
return -EINVAL;
+
/* But allow the pointless unlimited -> unlimited remount */
- if (!sbinfo)
+ if (!max_blocks && !max_inodes)
return 0;
+
return shmem_set_size(sbinfo, max_blocks, max_inodes);
}
#endif
static void shmem_put_super(struct super_block *sb)
{
- kfree(sb->s_fs_info);
- sb->s_fs_info = NULL;
+#ifdef CONFIG_TMPFS
+ struct shmem_sb_info *sb_info = SHMEM_SB(sb);
+
+ kfree(sb_info->helper_shell_command);
+ kfree(sb_info);
+#endif
+
+ sb->s_fs_info = NULL; /* FIXME. Is this line necessary? */
}
#ifdef CONFIG_TMPFS_XATTR
@@ -1952,16 +2019,19 @@
uid_t uid = current->fsuid;
gid_t gid = current->fsgid;
int err = -ENOMEM;
+ substring_t helper_str;
#ifdef CONFIG_TMPFS
unsigned long blocks = 0;
unsigned long inodes = 0;
+ struct shmem_sb_info *sbinfo;
/*
* Per default we only allow half of the physical ram per
* tmpfs instance, limiting inodes to one per page of lowmem;
* but the internal instance is left unlimited.
*/
+ helper_str.from = NULL;
if (!(sb->s_flags & MS_NOUSER)) {
blocks = totalram_pages / 2;
inodes = totalram_pages - totalhigh_pages;
@@ -1969,22 +2039,28 @@
inodes = blocks;
if (shmem_parse_options(data, &mode,
- &uid, &gid, &blocks, &inodes))
+ &uid, &gid, &blocks, &inodes,
+ &helper_str))
return -EINVAL;
}
- if (blocks || inodes) {
- struct shmem_sb_info *sbinfo;
- sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
- if (!sbinfo)
- return -ENOMEM;
- sb->s_fs_info = sbinfo;
- spin_lock_init(&sbinfo->stat_lock);
- sbinfo->max_blocks = blocks;
- sbinfo->free_blocks = blocks;
- sbinfo->max_inodes = inodes;
- sbinfo->free_inodes = inodes;
- }
+ sbinfo = kmalloc(sizeof(struct shmem_sb_info), GFP_KERNEL);
+ if (!sbinfo)
+ return -ENOMEM;
+
+ sbinfo->limited = (blocks || inodes);
+ sb->s_fs_info = sbinfo;
+ spin_lock_init(&sbinfo->stat_lock);
+ sbinfo->max_blocks = blocks;
+ sbinfo->free_blocks = blocks;
+ sbinfo->max_inodes = inodes;
+ sbinfo->free_inodes = inodes;
+
+ sbinfo->helper_shell_command = NULL;
+ if (maybe_replace_from_substr(&sbinfo->helper_shell_command,
+ &helper_str) != 0)
+ return -ENOMEM;
+
sb->s_xattr = shmem_xattr_handlers;
#endif
@@ -2088,7 +2164,7 @@
static struct inode_operations shmem_dir_inode_operations = {
#ifdef CONFIG_TMPFS
.create = shmem_create,
- .lookup = simple_lookup,
+ .lookup = shmem_lookup,
.link = shmem_link,
.unlink = shmem_unlink,
.symlink = shmem_symlink,
@@ -2192,9 +2268,15 @@
};
static struct vfsmount *shm_mnt;
-static int __init init_tmpfs(void)
+/* init_tmpfs is exported so that devfs can get an earlier initialization
+ if necessary. */
+int __init init_tmpfs(void)
{
int error;
+ static int initialized; /* = 0 */
+
+ if (initialized)
+ return 0;
error = init_inodecache();
if (error)
@@ -2215,6 +2297,7 @@
printk(KERN_ERR "Could not kern_mount tmpfs\n");
goto out1;
}
+ initialized = 1;
return 0;
out1:
@@ -2310,3 +2393,5 @@
vma->vm_ops = &shmem_vm_ops;
return 0;
}
+EXPORT_SYMBOL(shmem_lock);
+EXPORT_SYMBOL(shmem_nopage);
--- linux-2.6.10-rc1-bk14/fs/Makefile 2004-11-04 23:32:39.000000000 -0800
+++ linux/fs/Makefile 2004-11-04 23:08:57.000000000 -0800
@@ -44,6 +44,8 @@
obj-y += devpts/
obj-$(CONFIG_PROFILING) += dcookies.o
+obj-$(CONFIG_TMPFS) += helper.o
+obj-$(CONFIG_TMPFS) += lookuptrap.o
# Do not add any filesystems before this line
obj-$(CONFIG_REISERFS_FS) += reiserfs/
--- linux-2.6.10-rc1-bk14/fs/helper.c 1969-12-31 16:00:00.000000000 -0800
+++ linux/fs/helper.c 2004-11-04 23:29:17.000000000 -0800
@@ -0,0 +1,190 @@
+/*
+ helper.c -- Invoke user level helper command for a struct dentry.
+
+ Written by Adam J. Richter
+ Copyright (C) 2004 Yggdrasil Computing, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/ctype.h>
+#include <linux/fshelper.h>
+
+static int path_len (struct dentry *de, struct dentry *root)
+{
+ int len = 0;
+ while (de != root) {
+ len += de->d_name.len + 1; /* count the '/' */
+ de = de->d_parent;
+ }
+ return len; /* -1 because we omit the leading '/',
+ +1 because we include trailing '\0' */
+}
+
+static int write_path_from_mnt (struct dentry *de, char *path, int buflen)
+{
+ struct dentry *mnt_root = de->d_parent->d_inode->i_sb->s_root;
+ int len;
+ char *path_orig = path;
+
+ if (de == NULL || de == mnt_root)
+ return -EINVAL;
+
+ spin_lock(&dcache_lock);
+ len = path_len(de, mnt_root);
+ if (len > buflen) {
+ spin_unlock(&dcache_lock);
+ return -ENAMETOOLONG;
+ }
+
+ path += len - 1;
+ *path = '\0';
+
+ for (;;) {
+ path -= de->d_name.len;
+ memcpy(path, de->d_name.name, de->d_name.len);
+ de = de->d_parent;
+ if (de == mnt_root)
+ break;
+ *(--path) = '/';
+ }
+
+ spin_unlock(&dcache_lock);
+
+ BUG_ON(path != path_orig);
+
+ return 0;
+}
+
+static inline int
+calc_argc(const char *str_in, int *str_len)
+{
+ const char *str = str_in;
+ int argc = 0;
+ while (*str) {
+ while (*str == ' ' || *str == '\t')
+ str++;
+ argc++;
+ while (*str != ' ' && *str != '\t' && *str)
+ str++;
+ }
+ *str_len = str - str_in;
+ return argc;
+}
+
+static char **
+gen_argv(char *str_in, int argc_extra, int *argc_out)
+{
+ int argc;
+ char **argv;
+ char *str_out;
+ int str_len;
+
+ if (!str_in)
+ return NULL;
+
+ while (*str_in == ' ' || *str_in == '\t')
+ str_in++;
+
+ if (*str_in == '\0')
+ return NULL;
+
+ argc = calc_argc(str_in, &str_len);
+
+ argv = kmalloc(((argc + argc_extra) * sizeof(char*)) + str_len + 1,
+ GFP_KERNEL);
+ if (!argv)
+ return NULL;
+
+ str_out = (char*) (argv + argc + argc_extra);
+
+ argc = 0;
+ while (*str_in) {
+ argv[argc++] = str_out;
+
+ while (*str_in != ' ' && *str_in != '\t' && *str_in)
+ *(str_out++) = *(str_in++);
+
+ *(str_out++) = '\0';
+
+ while (*str_in == ' ' || *str_in == '\t')
+ str_in++;
+ }
+ *argc_out = argc;
+ return argv;
+}
+
+/*
+ Warning: call_fs_helper releases and retakes
+ dentry->d_parent->d_inode->i_sem. It must be called with this
+ semaphore already held.
+
+ *command_ptr is a single string. It is *not* in argv format.
+ Instead, elements are separated by spaces or tabs.
+*/
+void call_fs_helper(char **command_ptr,
+ struct rw_semaphore *command_rwsem,
+ const char *event,
+ struct dentry *dentry)
+{
+ char path[64];
+ int argc;
+ char **argv;
+ struct inode *parent_inode = dentry->d_parent->d_inode;
+
+ if (write_path_from_mnt(dentry, path, sizeof(path)) == 0) {
+
+ up(&parent_inode->i_sem);
+
+ /*
+ FIXME. We would not need the extra memory allocation,
+ string copying, error branch and lines of source code
+ due to err_strdup(), and we could put gen_argv
+ into the set_fs_helper, if call_usermodehelper
+ and execve had a callback to inform us when
+ execve was done copying argv and envp. With
+ such a facility, we could just hold helper->rw_sem
+ up to that point, without having to make a copy of the
+ argument (which we currently do) or hold the semaphore
+ until the helper process exits (which would cause a
+ deadlock if a helper process ever tried to change
+ the helper string of a file system, especially since
+ there is not such a thing as rw_down_read_interruptible
+ that would make the deadlock breakable).
+ */
+
+ down_read(command_rwsem);
+ argv = gen_argv(*command_ptr, 3, &argc);
+ up_read(command_rwsem);
+
+ if (argv != NULL) {
+ static char *envp[] =
+ {"PATH=/bin:/sbin:/usr/bin:/usr/sbin",
+ "HOME=/", NULL };
+
+ argv[argc++] = (char*) event;
+ argv[argc++] = path;
+ argv[argc] = NULL;
+
+ call_usermodehelper(argv[0], argv, envp, 1);
+ kfree(argv);
+ }
+
+ down(&parent_inode->i_sem);
+ }
+}
+EXPORT_SYMBOL_GPL(call_fs_helper);
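For readers who want to experiment with the tokenizing step outside the kernel, the following is a minimal userspace sketch of what gen_argv() above does: it splits a command string on spaces and tabs and keeps the pointer array and the copied words in a single allocation. The names split_command and is_blank and the use of malloc() are assumptions made for illustration and are not part of the patch. Unlike gen_argv(), this version counts the terminating NULL slot separately from the extra slots.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of gen_argv(); not part of the patch. */
static int is_blank(char c)
{
	return c == ' ' || c == '\t';
}

/*
 * Split cmd on spaces and tabs.  The pointer array (with room for
 * `extra` additional slots plus a terminating NULL) and the copied
 * words live in one malloc() block, so a single free() releases both.
 * Returns NULL if cmd is NULL or blank, or if allocation fails.
 */
char **split_command(const char *cmd, int extra, int *argc_out)
{
	const char *p;
	char **argv;
	char *out;
	int argc = 0;

	if (cmd == NULL)
		return NULL;
	while (is_blank(*cmd))
		cmd++;
	if (*cmd == '\0')
		return NULL;

	/* First pass: count words so the pointer array can be sized. */
	for (p = cmd; *p != '\0'; ) {
		argc++;
		while (*p != '\0' && !is_blank(*p))
			p++;
		while (is_blank(*p))
			p++;
	}

	/* The copied words never need more than strlen(cmd) + 1 bytes. */
	argv = malloc((argc + extra + 1) * sizeof(char *) + strlen(cmd) + 1);
	if (argv == NULL)
		return NULL;
	out = (char *)(argv + argc + extra + 1);

	/* Second pass: copy each word and record where it starts. */
	argc = 0;
	for (p = cmd; *p != '\0'; ) {
		argv[argc++] = out;
		while (*p != '\0' && !is_blank(*p))
			*out++ = *p++;
		*out++ = '\0';
		while (is_blank(*p))
			p++;
	}
	argv[argc] = NULL;
	*argc_out = argc;
	return argv;
}
```

A caller would then store the event name and the path in the reserved slots, much as call_fs_helper() does, and release everything with one free().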
--- linux-2.6.10-rc1-bk14/fs/lookuptrap.c 1969-12-31 16:00:00.000000000 -0800
+++ linux/fs/lookuptrap.c 2004-11-04 23:08:57.000000000 -0800
@@ -0,0 +1,208 @@
+/*
+ struct dentry *trapping_simple_lookup(struct inode *dir,
+ struct dentry *dentry,
+ struct nameidata *nd,
+ char **shell_command_ptr)
+
+ trapping_simple_lookup() is an alternative to simple_lookup() from
+ libfs.c, which adds the ability to invoke a user level helper program
+ when an attempt is made to access a nonexistent file. It invokes
+ the command via call_fs_helper() from fs/helper.c.
+
+ trapping_simple_lookup takes one parameter in addition to the parameters
+ that an inode_operations->lookup method requires. This parameter,
+ shell_command_ptr, is the address of a pointer to a string containing
+ the command to be executed. trapping_simple_lookup will always take a
+ read lock on superblock->s_umount, so it is multiprocessor-safe for your
+ file system's superblock mount and remount routines to modify that
+ pointer, since mount and remount will take a write lock on
+ superblock->s_umount.
+
+ See tmpfs in mm/shmem.c for an example of use of trapping_simple_lookup.
+
+ trapping_simple_lookup is the only symbol exported from
+ this file.
+
+
+ Written by Adam J. Richter
+ Copyright (C) 2004 Yggdrasil Computing, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/dcache.h>
+#include <linux/rwsem.h>
+#include <linux/fshelper.h>
+
+struct userhelper_wait {
+ struct semaphore calling;
+ struct rw_semaphore freeing;
+};
+
+static inline int want_to_trap(struct nameidata *nd)
+{
+ return (nd != NULL);
+}
+
+
+/*
+ * Retaining negative dentries for an in-memory filesystem just wastes
+ * memory and lookup time: arrange for them to be deleted immediately.
+ */
+static int always_delete_dentry(struct dentry *dentry)
+{
+ return 1;
+}
+
+/* Force revalidation of all negative dentries. Note that this routine
+ only gets called when some other routine has a reference to the dentry
+ or the dentry has an inode. Otherwise, unused negative dentries
+ are immediately dropped, because of always_delete_dentry. Apparently,
+ the only time blocking_dentry_valid ends up being called with an empty
+ inode is when a trapped reference is blocking on the user level helper,
+ which is exactly when we do want to force another lookup so that this
+ new reference will block too. I had previously used dentry->d_fsdata
+ to flag whether a user level helper was blocking on it,
+ but now that seems unnecessary, so we just check dentry->d_inode here.
+*/
+static int blocking_dentry_valid(struct dentry *dentry,
+ struct nameidata *nd)
+{
+ struct userhelper_wait *wait;
+ struct inode *parent_inode;
+
+ if (dentry->d_inode != NULL)
+ return 1;
+
+ if (!want_to_trap(nd))
+ return 0;
+
+ parent_inode = dentry->d_parent->d_inode;
+ if (down_interruptible(&parent_inode->i_sem))
+ return (dentry->d_inode != NULL);
+ wait = dentry->d_fsdata;
+ if (!wait) {
+ up(&parent_inode->i_sem);
+ return (dentry->d_inode != NULL);
+ }
+
+ if (!down_read_trylock(&wait->freeing))
+ BUG();
+
+ up(&parent_inode->i_sem);
+
+ if (down_interruptible(&wait->calling) == 0) {
+ /* call_usermodehelper has returned at this point */
+ up(&wait->calling);
+ }
+
+ up_read(&wait->freeing);
+ /* OK for lookup to free the data structure */
+ return (dentry->d_inode != NULL);
+
+}
+
+
+static struct dentry_operations trapping_dentry_ops = {
+ .d_delete = always_delete_dentry,
+ .d_revalidate = blocking_dentry_valid,
+};
+
+struct dentry *
+trapping_simple_lookup(struct inode *dir,
+ struct dentry *dentry,
+ struct nameidata *nd,
+ char **shell_command_ptr)
+{
+ /*
+ We must do simple_lookup before trapfs_event to prevent
+ a duplicate dentry from being created if the trapfs_helper
+ program attempts to access the same file name in /dev.
+ If simple_lookup returns non-NULL, then that is due to an error
+ like a malformed file name, so we do not invoke trapfs_event.
+ If the file is not found but there was no other error,
+ simple_lookup returns NULL, and that is the only case
+ in which we want to generate a notification.
+
+ We also filter out the final path element of mknod, mkdir
+ and symlink, because invoking the helper for mknod and mkdir
+ could lead to deadlock when trapfs loads a device driver
+ kernel module. One would think that the way to filter
+ would be to look at nd->flags to check that LOOKUP_CREATE
+ is set and LOOKUP_OPEN is clear, but instead, the vfs
+ layer passes nd==NULL in these cases via a routine
+ called lookup_create (without any leading underscores),
+ so we filter out the case where nd == NULL.
+
+ Filtering out nd==NULL has the unintended side-effect of
+ filtering out the final path component of arguments to
+ rmdir, unlink and rename (both source and destination).
+ For rmdir, unlink, and the source argument to rename,
+ that's fine, since nobody cares about attempts to remove
+ nonexistent files. We're probably also OK skipping the
+ notifications with regard to the destination argument to
+ rename, although that is less clear.
+ */
+
+ struct super_block *sb;
+ struct dentry *result;
+ struct userhelper_wait wait;
+ struct dentry *new;
+
+ if (!want_to_trap(nd))
+ return simple_lookup(dir, dentry, nd);
+
+ sb = dir->i_sb;
+
+ if (dentry->d_name.len > NAME_MAX)
+ return ERR_PTR(-ENAMETOOLONG);
+
+ dentry->d_op = &trapping_dentry_ops;
+ dentry->d_fsdata = &wait;
+
+ init_MUTEX_LOCKED(&wait.calling);
+ init_rwsem(&wait.freeing);
+
+ d_add(dentry, NULL);
+
+ call_fs_helper(shell_command_ptr, &sb->s_umount, "LOOKUP", dentry);
+
+ new = d_lookup(dentry->d_parent, &dentry->d_name);
+
+ if (new != dentry) /* also handles new==NULL */
+ result = new;
+ else {
+ dput(new);
+ result = NULL;
+ }
+
+ up(&wait.calling);
+
+ /* The following down_write call makes us wait for all the
+ blocking_dentry_valid() calls to finish. No new calls
+ to blocking_dentry_valid on this dentry will happen while
+ we wait, because we are holding parent_inode->i_sem. */
+
+ down_write(&wait.freeing);
+ /* no need ever to call up_write(&wait.freeing); */
+
+ dentry->d_fsdata = NULL;
+
+ return result;
+}
+
+EXPORT_SYMBOL_GPL(trapping_simple_lookup);
--- linux-2.6.10-rc1-bk14/Documentation/filesystems/lookup-trap.txt 1969-12-31 16:00:00.000000000 -0800
+++ linux/Documentation/filesystems/lookup-trap.txt 2004-11-04 16:11:42.000000000 -0800
@@ -0,0 +1,431 @@
+User's Guide To Trapping Directory Lookup Operations in Tmpfs
+Version 0.2
+
+
+1. INSTRUCTIONS FOR THE IMPATIENT
+
+ % modprobe tmpfs
+ % mount -t tmpfs -o helper=/path/to/helper/program whatever /mnt
+ % ls /mnt/foo
+ # Notice that "/path/to/helper/program LOOKUP foo" was executed.
+
+
+2. OVERVIEW (from the Kconfig help)
+
+ Tmpfs now allows user level programs to implement file systems
+that are filled in on demand. This feature works by invoking a
+configurable helper program on attempts to open() or stat() a
+nonexistent file. The access waits until the helper finishes, so the
+helper can install the missing file if desired.
+
+ Using this facility, a shell script or small C program can
+implement a file system that automatically mounts remote file systems
+or creates device files on demand, similar to autofs or devfs,
+respectively. Tmpfs is, however, daemonless and, perhaps
+consequently, smaller than either of these, and may avoid some
+recursion problems.
+
+ Tmpfs might also be useful for debugging programs where you
+want to trap the first access to a particular file or perhaps in
+automatic installation of missing commands or libraries by specifying a
+tmpfs file system in certain search paths.
+
+ This access trapping facility is designed to be easily ported
+to other file systems; see include/linux/fshelper.h and fs/helper.c
+for the implementation.
+
+
+
+3. GETTING STARTED
+
+3.1 MOUNTING THE FILE SYSTEM
+
+ First, build and boot a kernel with tmpfs either compiled in
+or built as a module. If you compile tmpfs as a module, you may have
+to load it, although, since the module name ("tmpfs.ko") and the file
+system name ("tmpfs") match, that may be unnecessary if your system
+is configured to invoke modprobe automatically.
+
+ % modprobe tmpfs
+
+ Now let's mount a tmpfs file system on /mnt.
+
+ % mount -t tmpfs blah /mnt
+
+ The file system will behave exactly like a ramfs file system.
+In fact, tmpfs is derived from ramfs. You can create files,
+directories, symbolic links and device nodes in it, and they will
+exist only in the computer's main memory. The contents of the file
+system will disappear as soon as you unmount it.
+
+ If you mount multiple instances of tmpfs, you will get
+separate file systems.
+
+
+3.2 THE HELPER PROGRAM
+
+ What distinguishes tmpfs from ramfs is that it can invoke a
+user level helper program when an attempt is made to open or stat a
+nonexistent file for the first time (if the name is not already in the
+dcache). The user level program is set with the "helper" mount
+option. It is possible to set, clear or change the helper command at
+any time, so let's go back to our example and put a helper command on
+the tmpfs file system that we mounted on /mnt:
+
+ % mount -o remount,helper=/tmp/helper /mnt
+
+ If you use the file system in /mnt now, nothing appears to have
+changed. Now let's put a simple shell script in /tmp/helper:
+
+ % cat > /tmp/helper
+ #!/bin/sh
+ echo "$*" > /dev/console
+ ^D
+ % chmod a+x /tmp/helper
+
+ Now you should see console messages like "LOOKUP foo" when you
+try to access the file /mnt/foo for the first time.
+
+ You can also pass arguments to the helper program by using
+spaces in the helper mount option, like so:
+
+ % mount -o remount,helper='/tmp/helper my_argument' /mnt
+
+ If you do this, your console messages will start to look
+something like "my_argument LOOKUP foo". The arguments that
+you specify come before "LOOKUP foo" to facilitate the use of
+command interpreters, like, say, helper='/usr/bin/perl handler.pl'.
+Arguments also make it easy to pass things like the mount point or
+configuration files, which should make it easier to write facilities
+that work on multiple mount points.
+
+ You can also deactivate the helper at any time, like so:
+
+ % mount -o remount,helper='' /mnt
+
+4. PRACTICAL EXAMPLES
+
+4.1 AN NFS AUTOMOUNTER
+
+ % cat > /usr/sbin/tmpfs-automount
+ #!/bin/sh
+ topdir=$1
+ host=${3%%/*}
+ dir=$topdir/$host
+ mkdir $dir
+ mount -t nfs $host:/ $dir
+ ^D
+ % chmod a+x /usr/sbin/tmpfs-automount
+ % mkdir /auto
+ % mount -t tmpfs -o helper="/usr/sbin/tmpfs-automount /auto" x /auto
+
+ Notice how we pass the additional argument "/auto" to the
+tmpfs-automount command.
+
+ If you want automatic unmount after a timeout, you'll probably
+want to do something a little more elaborate, perhaps with a script that
+runs from cron.
+
+
+4.2 DEMAND LOADING OF DEVICE DRIVERS
+
+ A version of devfs that uses tmpfs is under development and
+running on the system I am using to write this document, but I am
+still cleaning it up. Here is how it should work, although I have
+not yet actually tried devfs_helper on it.
+
+ The devfs_helper program was originally written for a stripped down
+rewrite of devfs, from which tmpfs is derived. It can read your
+/etc/devfs.conf file (the file previously used to configure
+devfsd) and load modules specified by "LOOKUP" commands. Other
+devfs.conf commands are ignored.
+
+ % ftp ftp.yggdrasil.com
+ login: anonymous
+ password: guest
+ ftp> cd /pub/dist/device_control/devfs
+ ftp> get devfs_helper-0.2.tar.gz
+ .....
+ ftp> quit
+ % tar xfpvz devfs_helper-0.2.tar.gz
+ % cd devfs_helper-0.2
+ % make
+ % make install
+ % mkdir /tmp/tmpdev
+ % mount -t devfs blah /tmp/tmpdev
+ % cp -apRx /dev/* /tmp/tmpdev/
+ % mount -t devfs -o helper=/sbin/devfs_helper blah /dev
+ % mount -t msdos /dev/floppy/0 /mnt
+
+ The above example should load the floppy.ko kernel module
+if you have a line in your /etc/devfs.conf file like this:
+
+ LOOKUP floppy EXECUTE modprobe floppy
+
+
+ You should also be able to use execfs in this fashion to get
+automatic loading of kernel modules on non-devfs systems, although
+you'll need something like udev to create the
+device files once the device drivers are registered.
+
+
+4.3 DEBUGGING A PROGRAM TRYING TO ACCESS A FILE
+
+ % cat > /tmp/call-sleep
+ #!/bin/sh
+ sleep 30
+ ^D
+ % mount -t tmpfs -o helper=/tmp/call-sleep foo /mnt
+ % mv .bashrc .bashrc-
+ % ln -s /mnt/whatever .bashrc
+ % gdb /bin/sh
+ GNU gdb 5.2
+ [blah blah blah]
+ (gdb) run
+ [program eventually hangs. Switch to another terminal session. You
+ cannot control-C out of it, a tmpfs bug from call_usermodehelper.]
+ % ps axf
+ [Find the process under gdb. Let's say it's pid 1152.]
+ % kill -SEGV 1152
+ % ps auxww | grep sleep
+ [Find the sleeping tmpfs helper. Let's say it's pid 1120.]
+ % kill -9 1120
+ [Now back at the first session, running gdb on /bin/sh.]
+ Program received signal SIGSEGV, Segmentation fault.
+ 0xb7f303d4 in __libc_open () at __libc_open:-1
+ -1 __libc_open: No such file or directory.
+ in __libc_open
+ (gdb) where
+ #0 0xb7f303d4 in __libc_open () at __libc_open:-1
+ #1 0xb7f8b4c0 in __DTOR_END__ () from /lib/libc.so.6
+ #2 0x080921ef in _evalfile (filename=0x80dc788 "/tmp/junk/.bashrc", flags=9)
+ at evalfile.c:85
+ #3 0x08092635 in maybe_execute_file (
+ fname=0xfffffffe <Address 0xfffffffe out of bounds>,
+ force_noninteractive=1) at evalfile.c:218
+ #4 0x08059fe8 in run_startup_files () at shell.c:1019
+ #5 0x08059849 in main (argc=1, argv=0xbfffebc4, env=0xbfffebcc) at shell.c:581
+ #6 0xb7e88e02 in __libc_start_main (main=0x8059380 <main>, argc=1,
+ ubp_av=0xbfffebc4, init=0x805897c <_init>,
+ fini=0xb80005ac <_dl_debug_mask>, rtld_fini=0x8000, stack_end=0x0)
+ at ../sysdeps/generic/libc-start.c:129
+
+
+4.4 AUTOMATIC LOADING OF MISSING PROGRAMS
+
+ % cat > /usr/sbin/missing-program
+ #!/bin/sh
+ my-automatic-network-downloader $2
+ ^D
+ % chmod a+x /usr/sbin/missing-program
+ % mount -t tmpfs -o helper=/usr/sbin/missing-program blah /mnt
+ % PATH=$PATH:/mnt:$PATH
+ # We include $PATH a second time so that the program can be
+ # found after it is installed.
+ % kdevelop # Or some other program you don't have...
+
+ ...or maybe something like this...
+
+ % cat > /usr/sbin/missing-program
+ #!/bin/sh
+ export DISPLAY=:0
+ konqueror http://www.google.com/search?q="download+$2" &
+ ^D
+ % chmod a+x /usr/sbin/missing-program
+ % mount -t tmpfs -o helper=/usr/sbin/missing-program glorp /mnt
+ % xhost localhost
+ % PATH=$PATH:/mnt
+ % kdevelop
+
+4.4.1 AUTOMATIC LOADING OF MISSING LIBRARIES
+
+ Same as above, but with this line:
+ % LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt:$LD_LIBRARY_PATH
+
+4.5 ADVANCED EXAMPLE: Generating plain files
+
+ Doing automatic generation of plain files (as opposed to
+directories, device files and symbolic links) requires more care,
+because the helper program's attempt to open a file for writing
+will itself invoke another instance of the user level handler.
+
+ One approach, perhaps the only one, is to define some temporary
+file name pattern that your user handler knows to ignore, create the
+file with a temporary name that matches that pattern, and then
+rename the temporary file to the real file name, since rename
+operations are not trapped (and if renames ever are trapped in future
+releases, you could filter out renames where the source matched the
+temporary file pattern).
+
+ [FIXME. This example is untested! --Adam Richter, 2004.11.03]
+
+
+ % cat > /usr/sbin/my-decrypter
+ #!/bin/sh
+ encrypted_dir=$1
+ decrypted_dir=$2
+ key=$3
+ # $4 is "LOOKUP"
+ filename=$5
+ # Ignore lookups of our own temporary files under tmp/.
+ case $filename in tmp/*) exit ;; esac
+ tmpfile=$decrypted_dir/tmp/$$
+ decrypt --key $key < $encrypted_dir/$filename > $tmpfile
+ mv $tmpfile $decrypted_dir/$filename
+ ^D
+ % chmod a+x /usr/sbin/my-decrypter
+ % mount -t tmpfs -o helper='/usr/sbin/my-decrypter /cryptdir /mnt key' x /mnt
+ % mkdir /mnt/tmp
+ % cat /mnt/my-secret-file
+ ....
+
+ If you want to make the temporary directory completely
+inaccessible from the public directory, you can create two mount
+points, where the public directory for the file system is actually
+a subdirectory of a larger hidden directory, like so:
+
+ % mount -t tmpfs x /some/place/hidden
+ % chmod go-rwx /some/place/hidden
+ % mkdir /some/place/hidden/mirror
+ % mount --bind /some/place/hidden/mirror /public
+
+ Now you can set up a helper program that operates in some
+other directory of /some/place/hidden, and then renames the
+resultant files into /some/place/hidden/mirror.
+
+
+4.5.1 A WARNING ABOUT SYMBOLIC LINKS AND THE GNU "ln -s" COMMAND
+
+ If your helper program invokes the shell command "ln" to create
+symbolic links, you may need to use the "filter and rename" technique that
+was described above for plain files, or one of several other
+workarounds listed below. If your helper program just uses the
+symlink() system call directly, you don't have to worry about this.
+
+ What is the problem, exactly? The problem is that although
+symlink system calls are not trapped, the "ln" command from version
+4.1 of the GNU fileutils package (latest version as of this writing)
+does a stat system call on the target before doing the symlink,
+because it is specified to behave differently if the destination path
+exists and is a directory. stat, unlike symlink, is trapped, in order
+to support automatic creation of files in case a program stats a file
+before deciding whether to open it, and for things like "ls
+/auto/fileserver1.mycompany.com/". Consequently, a user level helper
+shell script that simply does "ln -s whatever $target_file_name" will
+deadlock (which can be broken by interrupting the child helper
+program).
+
+ There are several possible solutions to this problem.
+
+ 1. You can use a simpler symlink program, such as ssln,
+ instead of "ln -s".
+
+ 2. If the final path element of the symlink's contents is
+ the same as the final path element of the symlink's name,
+ then you can just specify the directory name as the second
+ argument to "ln -s". In other words, change
+
+ ln -s foo/bar /the/target/bar
+ ...to...
+ ln -s foo/bar /the/target
+
+ 3. You can use the same filtering techniques for symlink
+ as discussed for plain files in the previous section
+ (create them with special temporary names that your helper
+ program knows to ignore and then rename them into place).
+
+ 4. Invoke perl to do the symlink, if you know you have it available:
+
+ perl -e 'symlink("contents", "target");'
+
+ 5. You can port your shell script to C or perl or some other
+ language that gives you direct access to the symlink()
+ system call.
+
+
+ Perhaps, in the future, the GNU ln command could be changed so
+that, when it is called with exactly two file names, it would try to
+do the symlink() and then check if the target is a directory only if
+the symlink attempt failed.
+
+
+4.6 OTHER USES?
+
+ I would be interested in hearing about any other uses that you come up
+with for tmpfs, especially if I can include them in this document.
+
+5. SERIALIZATION
+
+ Note that many instances of the user level helper program
+can potentially be running at the same time. It is up to "you",
+the implementor of the helper program to determine what sort of
+serialization you need and implement it. A simple solution to
+enforce complete serialization would be to have every instance
+of the helper program take an exclusive flock on some common
+file.
+
+6. KERNEL DEVELOPER ANSWERS ABOUT IMPLEMENTATION DECISIONS
+
+6.1 Q: Why doesn't tmpfs provide REGISTER and UNREGISTER events when
+ new nodes are created or deleted from the file system, as the
+ mini-devfs implementation from which it is derived did?
+
+ A: {,UN}REGISTER in Richard Gooch's implementation of devfs enabled
+ things like automatically setting permissions and sound settings
+ on your sound device when the driver was loaded, even if
+ the loading of the driver had not been caused by devfs.
+ For tmpfs-based devfs, I expect to implement that in
+ a more complex way by shadowing the real devfs file system
+ and creating {,UN}REGISTER events as updates are propagated
+ from the real devfs to /dev. The advantages of this would be
+ that module initialization would not be blockable by a user
+ level program, and events like a device quickly appearing and
+ disappearing could be coalesced (i.e., ignored in this case).
+
+ I'm not convinced that {,UN}REGISTER has to go, but I haven't
+ seen any compelling uses for it, and I know it's politically
+ easier to add a feature than to remove one, especially if anyone
+ has developed a dependence on it and does not want to port. So,
+ I'm starting out with tmpfs not providing {,UN}REGISTER events.
+
+6.2 Q: Why isn't this facility implemented as an overlay file system? I'd
+ like to be able to apply it to, say, /dev, without having
+ to start out with /dev being a devfs file system. I could
+ also demand load certain facilities based on accesses to /proc.
+
+ A: There are about two dozen routines in inode_operations and
+ file_operations that would require pass-through versions, and
+ they are not as trivial as you might think because of
+ locking issues involved in going through the vfs layer again.
+ Also, currently, tmpfs, like ramfs and sysfs, uses a struct
+ inode and a struct dentry in kernel low memory for every
+ existing node in the file system, about half a kilobyte
+ per entry. So, tmpfs would need to be converted to allow
+ inode structures to be released if it were to overlay
+ potentially large directory trees. Also there are issues
+ related to the underlying file system changing "out from
+ under" tmpfs. Perhaps in the future this can be implemented.
+
+
+6.3 Q: Why isn't tmpfs a built-in kernel facility that can be
+ applied to any file, like dnotify? That would also have
+ the above advantages and could eliminate the mount complexity
+ of overlays.
+
+ A: I thought about separating inode->directory_operations
+ from inode->inode_operations. Since all of those operations that
+ I want to intercept are called with inode->i_sem held, it
+ follows that it would be SMP-safe to change an inode->dir_ops
+ pointer dynamically, which would allow stacking of inode
+ operations. This approach could be used to remove the special
+ case code that is used to implement dnotify. But what happens
+ when the inode is freed? It would be necessary to intercept
+ the victim superblock's superblock_operations.drop_inode
+ routine, which could get pretty messy, especially, if, for
+ example, more than one trap was being used on the same file
+ system. Perhaps if drop_inode were moved to struct
+ inode_operations this would be easier.
+
+
+Adam J. Richter (adam@yggdrasil.com)
+2004.11.02
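
Sections 4.5.1 and 5 of lookup-trap.txt suggest calling symlink() directly and taking an exclusive flock on a common file. A rough sketch of a C helper combining both might look like the following. The names run_serialized, make_link and struct link_request are hypothetical and do not come from the patch, and the lock file is assumed to live outside the trapped file system so that opening it does not itself invoke the helper.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Run action(arg) while holding an exclusive flock() on lock_path, so
 * that concurrent helper instances execute it one at a time.
 * Returns action's result, or -1 if the lock could not be taken.
 * (Hypothetical helper, for illustration only.)
 */
int run_serialized(const char *lock_path, int (*action)(void *), void *arg)
{
	int fd, ret;

	fd = open(lock_path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_EX) < 0) {
		close(fd);
		return -1;
	}
	ret = action(arg);
	close(fd);	/* closing the descriptor drops the lock */
	return ret;
}

struct link_request {
	const char *contents;	/* what the symlink points to */
	const char *name;	/* where the symlink is created */
};

/*
 * Create the link with symlink(2) directly.  Unlike "ln -s", this does
 * not stat() the name first, so it does not trigger a trapped lookup.
 */
int make_link(void *arg)
{
	const struct link_request *req = arg;

	return symlink(req->contents, req->name);
}
```

A helper's main() would fill in a struct link_request from its arguments (the path arrives last, after "LOOKUP") and call run_serialized() with make_link.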