* [PATCH] ramfs: pretend dirent sizes
@ 2005-07-15 3:01 Jan Blunck
2005-07-15 3:11 ` Andrew Morton
2005-07-15 18:52 ` Linus Torvalds
0 siblings, 2 replies; 24+ messages in thread
From: Jan Blunck @ 2005-07-15 3:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, Linux-Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 258 bytes --]
This patch adds bogo dirent sizes for ramfs like already available for
tmpfs.
Although i_size of directories isn't covered by the POSIX standard it is
a bad idea to always set it to zero. Therefore pretend a bogo dirent
size for directory i_sizes.
Jan
[-- Attachment #2: ramfs_bogo_dirent_size.diff --]
[-- Type: text/x-patch, Size: 3065 bytes --]
Signed-off-by: Jan Blunck <j.blunck@tu-harburg.de>
fs/ramfs/inode.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 47 insertions(+), 4 deletions(-)
Index: linux-2.6/fs/ramfs/inode.c
===================================================================
--- linux-2.6.orig/fs/ramfs/inode.c
+++ linux-2.6/fs/ramfs/inode.c
@@ -38,6 +38,9 @@
/* some random number */
#define RAMFS_MAGIC 0x858458f6
+/* Pretend that each entry is of this size in directory's i_size */
+#define BOGO_DIRENT_SIZE 20
+
static struct super_operations ramfs_ops;
static struct address_space_operations ramfs_aops;
static struct inode_operations ramfs_file_inode_operations;
@@ -77,6 +80,7 @@ struct inode *ramfs_get_inode(struct sup
/* directory inodes start off with i_nlink == 2 (for "." entry) */
inode->i_nlink++;
+ inode->i_size = 2 * BOGO_DIRENT_SIZE;
break;
case S_IFLNK:
inode->i_op = &page_symlink_inode_operations;
@@ -97,6 +101,7 @@ ramfs_mknod(struct inode *dir, struct de
int error = -ENOSPC;
if (inode) {
+ dir->i_size += BOGO_DIRENT_SIZE;
if (dir->i_mode & S_ISGID) {
inode->i_gid = dir->i_gid;
if (S_ISDIR(mode))
@@ -132,6 +137,7 @@ static int ramfs_symlink(struct inode *
int l = strlen(symname)+1;
error = page_symlink(inode, symname, l);
if (!error) {
+ dir->i_size += BOGO_DIRENT_SIZE;
if (dir->i_mode & S_ISGID)
inode->i_gid = dir->i_gid;
d_instantiate(dentry, inode);
@@ -142,6 +148,43 @@ static int ramfs_symlink(struct inode *
return error;
}
+static int ramfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry)
+{
+ dir->i_size += BOGO_DIRENT_SIZE;
+ return simple_link(old_dentry, dir, dentry);
+}
+
+static int ramfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+ dir->i_size -= BOGO_DIRENT_SIZE;
+ return simple_unlink(dir, dentry);
+}
+
+static int ramfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ int ret;
+
+ ret = simple_rmdir(dir, dentry);
+ if (ret != -ENOTEMPTY)
+ dir->i_size -= BOGO_DIRENT_SIZE;
+
+ return ret;
+}
+
+static int ramfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
+{
+ int ret;
+
+ ret = simple_rename(old_dir, old_dentry, new_dir, new_dentry);
+ if (ret != -ENOTEMPTY) {
+ old_dir->i_size -= BOGO_DIRENT_SIZE;
+ new_dir->i_size += BOGO_DIRENT_SIZE;
+ }
+
+ return ret;
+}
+
static struct address_space_operations ramfs_aops = {
.readpage = simple_readpage,
.prepare_write = simple_prepare_write,
@@ -164,13 +207,13 @@ static struct inode_operations ramfs_fil
static struct inode_operations ramfs_dir_inode_operations = {
.create = ramfs_create,
.lookup = simple_lookup,
- .link = simple_link,
- .unlink = simple_unlink,
+ .link = ramfs_link,
+ .unlink = ramfs_unlink,
.symlink = ramfs_symlink,
.mkdir = ramfs_mkdir,
- .rmdir = simple_rmdir,
+ .rmdir = ramfs_rmdir,
.mknod = ramfs_mknod,
- .rename = simple_rename,
+ .rename = ramfs_rename,
};
static struct super_operations ramfs_ops = {
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-15 3:01 [PATCH] ramfs: pretend dirent sizes Jan Blunck @ 2005-07-15 3:11 ` Andrew Morton 2005-07-15 10:16 ` Jan Blunck 2005-07-15 18:52 ` Linus Torvalds 1 sibling, 1 reply; 24+ messages in thread From: Andrew Morton @ 2005-07-15 3:11 UTC (permalink / raw) To: Jan Blunck; +Cc: torvalds, linux-kernel Jan Blunck <j.blunck@tu-harburg.de> wrote: > > This patch adds bogo dirent sizes for ramfs like already available for > tmpfs. > > Although i_size of directories isn't covered by the POSIX standard it is > a bad idea to always set it to zero. Therefore pretend a bogo dirent > size for directory i_sizes. > Does it really matter? +static int ramfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry) > +{ > + dir->i_size += BOGO_DIRENT_SIZE; > + return simple_link(old_dentry, dir, dentry); > +} > + > +static int ramfs_unlink(struct inode *dir, struct dentry *dentry) > +{ > + dir->i_size -= BOGO_DIRENT_SIZE; > + return simple_unlink(dir, dentry); > +} > + > +static int ramfs_rmdir(struct inode *dir, struct dentry *dentry) > +{ > + int ret; > + > + ret = simple_rmdir(dir, dentry); > + if (ret != -ENOTEMPTY) > + dir->i_size -= BOGO_DIRENT_SIZE; > + > + return ret; > +} > + > +static int ramfs_rename(struct inode *old_dir, struct dentry *old_dentry, > + struct inode *new_dir, struct dentry *new_dentry) > +{ > + int ret; > + > + ret = simple_rename(old_dir, old_dentry, new_dir, new_dentry); > + if (ret != -ENOTEMPTY) { > + old_dir->i_size -= BOGO_DIRENT_SIZE; > + new_dir->i_size += BOGO_DIRENT_SIZE; > + } > + > + return ret; > +} > + I wonder if these should be in libfs - sysfs has the same problem, for example and someone might want to come along and fix that up too. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-15 3:11 ` Andrew Morton @ 2005-07-15 10:16 ` Jan Blunck 0 siblings, 0 replies; 24+ messages in thread From: Jan Blunck @ 2005-07-15 10:16 UTC (permalink / raw) To: Andrew Morton; +Cc: torvalds, linux-kernel Andrew Morton wrote: > > Does it really matter? > Yes! At least for me and my union mounts implementation. > > I wonder if these should be in libfs - sysfs has the same problem, for > example and someone might want to come along and fix that up too. Ok, I will check that next week. But AFAIK, sysfs does not use the libfs calls to modify directory contents. Therefore fixing the i_size problem must be done inside the sysfs code. For hugetblfs etc. it would be better to fix it in libfs. Jan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-15 3:01 [PATCH] ramfs: pretend dirent sizes Jan Blunck 2005-07-15 3:11 ` Andrew Morton @ 2005-07-15 18:52 ` Linus Torvalds 2005-07-16 0:39 ` Chris Wedgwood 2005-07-19 9:46 ` Jan Blunck 1 sibling, 2 replies; 24+ messages in thread From: Linus Torvalds @ 2005-07-15 18:52 UTC (permalink / raw) To: Jan Blunck; +Cc: Andrew Morton, Linux-Kernel Mailing List On Fri, 15 Jul 2005, Jan Blunck wrote: > > This patch adds bogo dirent sizes for ramfs like already available for > tmpfs. I really think you should update the "simple_xxx()" functions instead, and thus make this happen for _any_ filesystem that uses the simple fs helper functions. Linus ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-15 18:52 ` Linus Torvalds @ 2005-07-16 0:39 ` Chris Wedgwood 2005-07-19 9:28 ` Jan Blunck 2005-07-19 9:46 ` Jan Blunck 1 sibling, 1 reply; 24+ messages in thread From: Chris Wedgwood @ 2005-07-16 0:39 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jan Blunck, Andrew Morton, Linux-Kernel Mailing List On Fri, Jul 15, 2005 at 11:52:42AM -0700, Linus Torvalds wrote: > I really think you should update the "simple_xxx()" functions > instead, and thus make this happen for _any_ filesystem that uses > the simple fs helper functions. Why bother at all? I don't see why zero sizes are a problem. We've had them for year without complaints. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-16 0:39 ` Chris Wedgwood @ 2005-07-19 9:28 ` Jan Blunck 2005-07-19 16:16 ` Chris Wedgwood 0 siblings, 1 reply; 24+ messages in thread From: Jan Blunck @ 2005-07-19 9:28 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List Chris Wedgwood wrote: > > Why bother at all? > > I don't see why zero sizes are a problem. We've had them for year > without complaints. I'm using the i_size of directories in my patches. When reading from a union directory, I'm using the i_size to seek to the right offset in the union stack. Therefore I need values of dirent->d_off which are smaller than the i_size of the directory. Altogether, it doesn't make sense to me to seek to an offset which is greater than the i_size and let the dirent read succeed. Jan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-19 9:28 ` Jan Blunck @ 2005-07-19 16:16 ` Chris Wedgwood 2005-07-19 18:22 ` Jan Blunck 0 siblings, 1 reply; 24+ messages in thread From: Chris Wedgwood @ 2005-07-19 16:16 UTC (permalink / raw) To: Jan Blunck; +Cc: Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Tue, Jul 19, 2005 at 11:28:10AM +0200, Jan Blunck wrote: > I'm using the i_size of directories in my patches. When reading > from a union directory, I'm using the i_size to seek to the right > offset in the union stack. Ick. That'a a bit of a hack. > Therefore I need values of dirent->d_off which are smaller than the > i_size of the directory. Hence the value of 20 I guess --- assuming nothing will stack this high? > Altogether, it doesn't make sense to me to seek to an offset which > is greater than the i_size and let the dirent read succeed. I personally would prefer that to be honest or some other way that doesn't change i_size. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-19 16:16 ` Chris Wedgwood @ 2005-07-19 18:22 ` Jan Blunck 2005-07-19 18:32 ` Chris Wedgwood 0 siblings, 1 reply; 24+ messages in thread From: Jan Blunck @ 2005-07-19 18:22 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List Chris Wedgwood wrote: > >>I'm using the i_size of directories in my patches. When reading >>from a union directory, I'm using the i_size to seek to the right >>offset in the union stack. > > > Ick. That'a a bit of a hack. > Don't think so: 1st dir: [XXXXXXXXXXXXXXXXXXXXX] f_pos=0 f_pos=i_size(1st) 2nd dir: [XXXXXXXXXXX|---------] f_pos=i_size(1st) f_pos=i_size(1st+2nd) ^ | f_pos=i_size(1st)+offset Since these "arranged" values are also used as the offsets in the return dirent IMO it is quite clean. > > Hence the value of 20 I guess --- assuming nothing will stack this > high? > Nope. This value is kind of traditional: tmpfs is using it (http://marc.theaimsgroup.com/?l=linux-kernel&m=103208296515378&w=2). I think a better value would be 1 (one) since this is also used as the dirent offset by dcache_readdir(). > > I personally would prefer that to be honest or some other way that > doesn't change i_size. The i_size of a directory isn't covered by the POSIX standard. IMO, it should be possible to seek in the range of i_size and a following readdir() on the directory should succeed. But this isn't possible even not with real file systems like ext2. But keeping the i_size bound to zero even if the directory contains entries does not make sense at all. Jan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-19 18:22 ` Jan Blunck @ 2005-07-19 18:32 ` Chris Wedgwood 2005-07-19 19:14 ` Jan Blunck 0 siblings, 1 reply; 24+ messages in thread From: Chris Wedgwood @ 2005-07-19 18:32 UTC (permalink / raw) To: Jan Blunck; +Cc: Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Tue, Jul 19, 2005 at 08:22:26PM +0200, Jan Blunck wrote: > Since these "arranged" values are also used as the offsets in the > return dirent IMO it is quite clean. So the size you want to reflect is n*<stack-depth> i take it? Where in this case n is 20? So you can seek to m*<stack-depth>+<offset> to access an offset into something at depth m? > Nope. This value is kind of traditional: tmpfs is using it > (http://marc.theaimsgroup.com/?l=linux-kernel&m=103208296515378&w=2). I > think a better value would be 1 (one) since this is also used as the > dirent offset by dcache_readdir(). I really don't know why tmpfs is doing this. > The i_size of a directory isn't covered by the POSIX standard. IMO, > it should be possible to seek in the range of i_size and a following > readdir() on the directory should succeed. With what defined semantics? What if an entry is added in between seek and readdir? > But this isn't possible even not with real file systems like ext2. I'm not sure how expecting a meaningful offset into a directory can have consistent bahvior. > But keeping the i_size bound to zero even if the directory contains > entries does not make sense at all. Why? It seems perfectly reasonable that we can return 0 in such cases. Zero seems to make more sense as 'magical/unknown' than say any other arbitrary value. It's also possible a regular filesystem could return an arbitrary value such as 20 (not that this directly matters except it becomes confusing potentially): cw@taniwha:~$ mkdir foobar cw@taniwha:~$ ls -ld foobar drwxr-xr-x 2 cw cw 6 Jul 19 11:29 foobar cw@taniwha:~$ mkdir foobar/1234567 cw@taniwha:~$ ls -ld foobar drwxr-xr-x 3 cw cw 20 Jul 19 11:30 foobar ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-19 18:32 ` Chris Wedgwood @ 2005-07-19 19:14 ` Jan Blunck 2005-07-19 19:16 ` Chris Wedgwood 0 siblings, 1 reply; 24+ messages in thread From: Jan Blunck @ 2005-07-19 19:14 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List Chris Wedgwood wrote: > > So the size you want to reflect is n*<stack-depth> i take it? Where > in this case n is 20? > > So you can seek to m*<stack-depth>+<offset> to access an offset into > something at depth m? > Yes. >>The i_size of a directory isn't covered by the POSIX standard. IMO, >>it should be possible to seek in the range of i_size and a following >>readdir() on the directory should succeed. > > With what defined semantics? What if an entry is added in between > seek and readdir? > You have the same problem with regular files. This is a user and not a kernel problem. > > Why? It seems perfectly reasonable that we can return 0 in such > cases. Zero seems to make more sense as 'magical/unknown' than say > any other arbitrary value. > I disagree. Where is the information value of i_size if we always could return 0? IMO it should be at least an upper bound for the "number" of informations that could actually be read (in terms of a seek offset) like it is in the case of regular files. Better, if it is a strict upper bound so that you can seek to every value smaller than i_size. For this purpose the i_size of directories doesn't need to reflect any unit. Jan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-19 19:14 ` Jan Blunck @ 2005-07-19 19:16 ` Chris Wedgwood 2005-07-20 11:21 ` Jörn Engel 2005-07-20 18:24 ` Jan Blunck 0 siblings, 2 replies; 24+ messages in thread From: Chris Wedgwood @ 2005-07-19 19:16 UTC (permalink / raw) To: Jan Blunck; +Cc: Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Tue, Jul 19, 2005 at 09:14:04PM +0200, Jan Blunck wrote: > >So you can seek to m*<stack-depth>+<offset> to access an offset into > >something at depth m? > > > > Yes. Hos does that work if offset >= m? > I disagree. Where is the information value of i_size if we always > could return 0? Directories clearly can't have zero size, so 0 means 'special'. Anything other than zero *might* be a real value. > IMO it should be at least an upper bound for the "number" of > informations that could actually be read (in terms of a seek offset) > like it is in the case of regular files. Why? And what should that upper bound be? > Better, if it is a strict upper bound so that you can seek to every > value smaller than i_size. For this purpose the i_size of > directories doesn't need to reflect any unit. lseek talks about bytes --- yes, it means for files specifically but I still don't see why we need to define more counter-intuitive semantics for directories when we don't need them. Also, how is lseek + readdir supposed to work in general? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-19 19:16 ` Chris Wedgwood @ 2005-07-20 11:21 ` Jörn Engel 2005-07-20 18:11 ` Chris Wedgwood 2005-07-20 18:24 ` Jan Blunck 1 sibling, 1 reply; 24+ messages in thread From: Jörn Engel @ 2005-07-20 11:21 UTC (permalink / raw) To: Chris Wedgwood Cc: Jan Blunck, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Tue, 19 July 2005 12:16:48 -0700, Chris Wedgwood wrote: > > Also, how is lseek + readdir supposed to work in general? To my understanding, you can lseek to any "proper" offset inside a directory. Proper means that the offset marks the beginning of a new dirent (or end of file) in the interpretation of the filesystem. A filesystem could use offset n for directory entry n, or it could interpret n as the offset in bytes and require that a dirent start at this offset (else the offset wouldn't be proper) or it could be anything else as well. Userspace doesn't have any means to figure out, which addresses are proper and which aren't. Except that getdents(2) moves the fd offset to a proper address, which likely will remain proper until the fd is closed. Reopening the same directory may result in a formerly proper offset isn't anymore. Does the above make sense? Jörn -- And spam is a useful source of entropy for /dev/random too! -- Jasmine Strong ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-20 11:21 ` Jörn Engel @ 2005-07-20 18:11 ` Chris Wedgwood 2005-07-20 18:48 ` Jan Blunck 2005-07-21 7:20 ` Jörn Engel 0 siblings, 2 replies; 24+ messages in thread From: Chris Wedgwood @ 2005-07-20 18:11 UTC (permalink / raw) To: J?rn Engel Cc: Jan Blunck, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Wed, Jul 20, 2005 at 01:21:27PM +0200, J?rn Engel wrote: > To my understanding, you can lseek to any "proper" offset inside a > directory. Proper means that the offset marks the beginning of a > new dirent (or end of file) in the interpretation of the filesystem. But you can never tell where these are in general. > Userspace doesn't have any means to figure out, which addresses are > proper and which aren't. Except that getdents(2) moves the fd > offset to a proper address, which likely will remain proper until > the fd is closed. I don't see why or how this can be true in general (it might be, but I don't see how myself). If we are half way through scanning a directory and people start messing with it we could end up somewhere bogus (in which case f_op->readdir I guess is expected to try and do something sane here?) > Reopening the same directory may result in a formerly proper offset > isn't anymore. For that to be the case where is the state kept to ensure your current offset is valid? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-20 18:11 ` Chris Wedgwood @ 2005-07-20 18:48 ` Jan Blunck 2005-07-20 18:52 ` Peter Staubach 2005-07-21 7:26 ` Jörn Engel 2005-07-21 7:20 ` Jörn Engel 1 sibling, 2 replies; 24+ messages in thread From: Jan Blunck @ 2005-07-20 18:48 UTC (permalink / raw) To: Chris Wedgwood Cc: J?rn Engel, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List Chris Wedgwood wrote: > >>To my understanding, you can lseek to any "proper" offset inside a >>directory. Proper means that the offset marks the beginning of a >>new dirent (or end of file) in the interpretation of the filesystem. > > > But you can never tell where these are in general. > > I don't want to tell where these are in general, I need an easy way to seek to the m'th directory + offset position without reading every single dirent. With i_sizes != 0 it is straight forward to use "the sum of the m directory's i_sizes + offset" as the f_pos to seek to. For this purpose it is not necessary to have a "honest" i_size as long as the i_size is bigger than the offset of the last dirent in the directory. > > I don't see why or how this can be true in general (it might be, but I > don't see how myself). If we are half way through scanning a > directory and people start messing with it we could end up somewhere > bogus (in which case f_op->readdir I guess is expected to try and do > something sane here?) > E.g.: ext2_validate_entry() is called if filp->f_version != inode->i_version. > >>Reopening the same directory may result in a formerly proper offset >>isn't anymore. > Which is a user problem again. Might be that you are opening a different directory with the same name ... or even a regular file! Jan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-20 18:48 ` Jan Blunck @ 2005-07-20 18:52 ` Peter Staubach 2005-07-21 7:26 ` Jörn Engel 1 sibling, 0 replies; 24+ messages in thread From: Peter Staubach @ 2005-07-20 18:52 UTC (permalink / raw) To: Jan Blunck Cc: Chris Wedgwood, J?rn Engel, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List Jan Blunck wrote: > > I don't want to tell where these are in general, I need an easy way to > seek to the m'th directory + offset position without reading every > single dirent. With i_sizes != 0 it is straight forward to use "the > sum of the m directory's i_sizes + offset" as the f_pos to seek to. > For this purpose it is not necessary to have a "honest" i_size as long > as the i_size is bigger than the offset of the last dirent in the > directory. > You are not going to get this functionality. It is simply not how directories work nowadays. Very few file systems are going to be usable if you insist upon this semantic. It is simply not possible to guess, with variable sized entries and with distributed directory structures. ps ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-20 18:48 ` Jan Blunck 2005-07-20 18:52 ` Peter Staubach @ 2005-07-21 7:26 ` Jörn Engel 1 sibling, 0 replies; 24+ messages in thread From: Jörn Engel @ 2005-07-21 7:26 UTC (permalink / raw) To: Jan Blunck Cc: Chris Wedgwood, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Wed, 20 July 2005 20:48:17 +0200, Jan Blunck wrote: > > I don't want to tell where these are in general, I need an easy way to > seek to the m'th directory + offset position without reading every > single dirent. With i_sizes != 0 it is straight forward to use "the sum > of the m directory's i_sizes + offset" as the f_pos to seek to. For this > purpose it is not necessary to have a "honest" i_size as long as the > i_size is bigger than the offset of the last dirent in the directory. So you use file->offset of the pseudo-directory representing the whole union stack for two values: 1. level within the union stack we're currently at and 2. offset within the real directory at said level. And you need a sane value for i_size at every level (but perhaps the last) to break the converged file->offset apart into (level, offset) again. Right? > >>Reopening the same directory may result in a formerly proper offset > >>isn't anymore. > > Which is a user problem again. Might be that you are opening a different > directory with the same name ... or even a regular file! Yep. Two consecutive open() on the "same file" are racy, so anything could have happened in between. That's a well-known proplem in userspace. Jörn -- Linux is more the core point of a concept that surrounds "open source" which, in turn, is based on a false concept. This concept is that people actually want to look at source code. -- Rob Enderle ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-20 18:11 ` Chris Wedgwood 2005-07-20 18:48 ` Jan Blunck @ 2005-07-21 7:20 ` Jörn Engel 2005-07-21 7:25 ` Chris Wedgwood 1 sibling, 1 reply; 24+ messages in thread From: Jörn Engel @ 2005-07-21 7:20 UTC (permalink / raw) To: Chris Wedgwood Cc: Jan Blunck, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Wed, 20 July 2005 11:11:01 -0700, Chris Wedgwood wrote: > On Wed, Jul 20, 2005 at 01:21:27PM +0200, J?rn Engel wrote: > > > To my understanding, you can lseek to any "proper" offset inside a > > directory. Proper means that the offset marks the beginning of a > > new dirent (or end of file) in the interpretation of the filesystem. > > But you can never tell where these are in general. Correct. Any filesystem can have any definition of proper. > > Userspace doesn't have any means to figure out, which addresses are > > proper and which aren't. Except that getdents(2) moves the fd > > offset to a proper address, which likely will remain proper until > > the fd is closed. > > I don't see why or how this can be true in general (it might be, but I > don't see how myself). If we are half way through scanning a > directory and people start messing with it we could end up somewhere > bogus (in which case f_op->readdir I guess is expected to try and do > something sane here?) If you read a large directory, it will take you quite a while to go through the thousands of getdents() invocations. Someone messing with the directory meanwhile should be expected and shouldn't cause userspace to get complete garbage. If userspace doesn't learn about all new dirents, that's fine. If it receives old ones, that were since deleted, that's fine. But under no circumstances should you hand out garbage. Ext2, iirc, solves this problem by never shrinking directories. Old dirents can be invalid and won't get passed to userspace anymore. But they don't move around. Hence, a linear read of the directory gives you exactly above properties. > > Reopening the same directory may result in a formerly proper offset > > isn't anymore. > > For that to be the case where is the state kept to ensure your current > offset is valid? Depends on the fs. Ext2, as seen above, only needs file->offset as state. But image a questionable optimization for an uncommon case. Whenever the last reader or an ext2 directory closes the fd, it is safe to shuffle things around. At this time, a "directory defragmenter" takes the directory and squeezes all holes out of it. The new, repacked directory consumes less space, which is nice. Alternatively, the directory defragmenter could create a new directory, copy all valid contents of the old one over, and new open() on the directory will get the new copy, while old file descriptors still point to the old one. In both cases, what used to be a proper offset in one fd can be complete bogus for another one. And no, I don't want to add directory defragmenters to ext2. It was just a stupid example of what is possible. Jörn -- Linux [...] existed just for discussion between people who wanted to show off how geeky they were. -- Rob Enderle ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-21 7:20 ` Jörn Engel @ 2005-07-21 7:25 ` Chris Wedgwood 2005-07-21 7:27 ` Jörn Engel 0 siblings, 1 reply; 24+ messages in thread From: Chris Wedgwood @ 2005-07-21 7:25 UTC (permalink / raw) To: J?rn Engel Cc: Jan Blunck, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Thu, Jul 21, 2005 at 09:20:03AM +0200, J?rn Engel wrote: > In both cases, what used to be a proper offset in one fd can be > complete bogus for another one. Exactly. Knowing the position within a directory is of questionable value and trying to implement any reliable semantics for lseek is futile. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-21 7:25 ` Chris Wedgwood @ 2005-07-21 7:27 ` Jörn Engel 0 siblings, 0 replies; 24+ messages in thread From: Jörn Engel @ 2005-07-21 7:27 UTC (permalink / raw) To: Chris Wedgwood Cc: Jan Blunck, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List On Thu, 21 July 2005 00:25:17 -0700, Chris Wedgwood wrote: > On Thu, Jul 21, 2005 at 09:20:03AM +0200, J?rn Engel wrote: > > > In both cases, what used to be a proper offset in one fd can be > > complete bogus for another one. > > Exactly. > > Knowing the position within a directory is of questionable value and > trying to implement any reliable semantics for lseek is futile. We surely agree on that one. But I guess Jan has something else in mind with this patch. See my other reply. Jörn -- Simplicity is prerequisite for reliability. -- Edsger W. Dijkstra ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-19 19:16 ` Chris Wedgwood 2005-07-20 11:21 ` Jörn Engel @ 2005-07-20 18:24 ` Jan Blunck 2005-07-20 18:50 ` Peter Staubach 1 sibling, 1 reply; 24+ messages in thread From: Jan Blunck @ 2005-07-20 18:24 UTC (permalink / raw) To: Chris Wedgwood; +Cc: Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List Chris Wedgwood wrote: > > Hos does that work if offset >= m? > Eerrh, did you actually read my patch? The i_size of a directory is increased by SIMPLE_BOGO_DIRENT_SIZE for every entry in the directory. >>So you can seek to m*<stack-depth>+<offset> to access an offset into >>> >something at depth m? >>> > >> >> Yes. I got that one wrong! You can seek to the "sum of i_sizes (of the m directories) + offset" to access offset in the m'th directory/stack-depth. > > lseek talks about bytes --- yes, it means for files specifically but I > still don't see why we need to define more counter-intuitive semantics > for directories when we don't need them. It is counter-intuitive to have a value which is information less, like in the zero case. > > Also, how is lseek + readdir supposed to work in general? This is how libc is reading directories (at least on arch s390x): getdents() != 0 lseek() to d_off of last dirent getdents() != 0 lseek() to d_off of last dirent getdents() == 0 return Therefore I really need values that make sense for d_off. Therefore I really need values that make (some kind of) sense for i_size. Jan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-20 18:24 ` Jan Blunck @ 2005-07-20 18:50 ` Peter Staubach 0 siblings, 0 replies; 24+ messages in thread From: Peter Staubach @ 2005-07-20 18:50 UTC (permalink / raw) To: Jan Blunck Cc: Chris Wedgwood, Linus Torvalds, Andrew Morton, Linux-Kernel Mailing List Jan Blunck wrote: >> >> Also, how is lseek + readdir supposed to work in general? > > > This is how libc is reading directories (at least on arch s390x): > > getdents() != 0 > lseek() to d_off of last dirent > getdents() != 0 > lseek() to d_off of last dirent > getdents() == 0 > return > > Therefore I really need values that make sense for d_off. Therefore I > really need values that make (some kind of) sense for i_size. Please be aware that not all filesystems treat directories as sequential files, filled with with odd sized records. Some directories are distributed across the entire file system and the d_off fields really contain the address on disk of the next entry in the directory. The d_off field is really just a cookie and is not meant to be interpreted as an offset into the directory. The lseek() system call is used, but only because that was a handy way of positioning the pointer into the directory. It is unfortunate that the getdents(2) system call was not defined with a cookie as one of its arguments to indicate where to start fetching entries from. This would have reduced the confusion about treating directories as files or not. Thanx... ps ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-15 18:52 ` Linus Torvalds 2005-07-16 0:39 ` Chris Wedgwood @ 2005-07-19 9:46 ` Jan Blunck 1 sibling, 0 replies; 24+ messages in thread From: Jan Blunck @ 2005-07-19 9:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Linux-Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 638 bytes --] Linus Torvalds wrote: > > I really think you should update the "simple_xxx()" functions instead, and > thus make this happen for _any_ filesystem that uses the simple fs helper > functions. > Ok, I revised the patch. This is just the first step with changes to libfs and ramfs. To move the directory i_size handling completely to libfs it is necessary to allocate the inodes in libfs. Therefore, simple_get_inode() and simple_mknod_init() (honestly, I don't like that name) are introduced. simple_mknod_init() uses a callback mechanism to initialize the file system specific stuff of the inodes. What do you think of it? Jan [-- Attachment #2: libfs_bogo_dirent_size.diff --] [-- Type: text/x-patch, Size: 4223 bytes --] fs/libfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++- include/linux/fs.h | 7 +++++ 2 files changed, 72 insertions(+), 1 deletion(-) Index: linux-2.6/fs/libfs.c =================================================================== --- linux-2.6.orig/fs/libfs.c +++ linux-2.6/fs/libfs.c @@ -238,13 +238,72 @@ Enomem: return ERR_PTR(-ENOMEM); } +struct inode *simple_get_inode(struct super_block *sb, int mode, + int (*init)(struct inode *, void *), void *arg) +{ + struct inode * inode = new_inode(sb); + + if (!inode) + return ERR_PTR(-ENOSPC); + + inode->i_mode = mode; + inode->i_uid = current->fsuid; + inode->i_gid = current->fsgid; + inode->i_blksize = sb->s_blocksize; + inode->i_blocks = 0; + inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; + + if (S_ISDIR(mode)) { + /* directory inodes start with i_nlink == 2 (for "." entry) */ + inode->i_nlink++; + /* i_size for "." and ".." */ + inode->i_size = 2 * SIMPLE_BOGO_DIRENT_SIZE; + } + + if (init) { + int error = init(inode, arg); + if (error) { + iput(inode); + inode = ERR_PTR(error); + } + } + + return inode; +} + +int simple_mknod_init(struct inode *dir, struct dentry *dentry, int mode, + int (*init)(struct inode *, void *), void *arg) +{ + struct inode * inode; + + inode = simple_get_inode(dir->i_sb, mode, init, arg); + if (IS_ERR(inode)) + return PTR_ERR(inode); + + if (dir->i_mode & S_ISGID) { + inode->i_gid = dir->i_gid; + if (S_ISDIR(mode)) + inode->i_mode |= S_ISGID; + } + + if (S_ISDIR(mode)) + dir->i_nlink++; + + dir->i_mtime = dir->i_ctime = CURRENT_TIME; + dir->i_size += SIMPLE_BOGO_DIRENT_SIZE; + d_instantiate(dentry, inode); + dget(dentry); /* pin the dentry in core */ + return 0; +} + int simple_link(struct dentry *old_dentry, struct inode *dir, struct dentry *dentry) { struct inode *inode = old_dentry->d_inode; - inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; + inode->i_ctime = dir->i_mtime = dir->i_ctime = CURRENT_TIME; inode->i_nlink++; atomic_inc(&inode->i_count); + dir->i_size += SIMPLE_BOGO_DIRENT_SIZE; dget(dentry); d_instantiate(dentry, inode); return 0; @@ -276,6 +335,7 @@ int simple_unlink(struct inode *dir, str inode->i_ctime = dir->i_ctime = dir->i_mtime = CURRENT_TIME; inode->i_nlink--; + dir->i_size -= SIMPLE_BOGO_DIRENT_SIZE; dput(dentry); return 0; } @@ -311,6 +371,8 @@ int simple_rename(struct inode *old_dir, old_dir->i_ctime = old_dir->i_mtime = new_dir->i_ctime = new_dir->i_mtime = inode->i_ctime = CURRENT_TIME; + old_dir->i_size -= SIMPLE_BOGO_DIRENT_SIZE; + new_dir->i_size += SIMPLE_BOGO_DIRENT_SIZE; return 0; } @@ -629,6 +691,8 @@ EXPORT_SYMBOL(simple_empty); EXPORT_SYMBOL(d_alloc_name); EXPORT_SYMBOL(simple_fill_super); EXPORT_SYMBOL(simple_getattr); +EXPORT_SYMBOL(simple_get_inode); +EXPORT_SYMBOL(simple_mknod_init); EXPORT_SYMBOL(simple_link); EXPORT_SYMBOL(simple_lookup); EXPORT_SYMBOL(simple_pin_fs); Index: linux-2.6/include/linux/fs.h =================================================================== --- linux-2.6.orig/include/linux/fs.h +++ linux-2.6/include/linux/fs.h @@ -1624,6 +1624,11 @@ extern loff_t dcache_dir_lseek(struct fi extern int dcache_readdir(struct file *, void *, filldir_t); extern int simple_getattr(struct vfsmount *, struct dentry *, struct kstat *); extern int simple_statfs(struct super_block *, struct kstatfs *); +extern struct inode *simple_get_inode(struct super_block *, int , + int (*init)(struct inode *, void *), + void *); +extern int simple_mknod_init(struct inode *, struct dentry *, int , + int (*init)(struct inode *, void *), void *); extern int simple_link(struct dentry *, struct inode *, struct dentry *); extern int simple_unlink(struct inode *, struct dentry *); extern int simple_rmdir(struct inode *, struct dentry *); @@ -1744,6 +1749,8 @@ ssize_t simple_attr_read(struct file *fi ssize_t simple_attr_write(struct file *file, const char __user *buf, size_t len, loff_t *ppos); +/* Pretend that each dirent is of this size in simple directories */ +#define SIMPLE_BOGO_DIRENT_SIZE 20 #ifdef CONFIG_SECURITY static inline char *alloc_secdata(void) [-- Attachment #3: ramfs_bogo_dirent_size.diff --] [-- Type: text/x-patch, Size: 4622 bytes --] fs/ramfs/inode.c | 118 ++++++++++++++++++++++--------------------------------- 1 files changed, 48 insertions(+), 70 deletions(-) Index: linux-2.6/fs/ramfs/inode.c =================================================================== --- linux-2.6.orig/fs/ramfs/inode.c +++ linux-2.6/fs/ramfs/inode.c @@ -50,40 +50,40 @@ static struct backing_dev_info ramfs_bac BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP, }; -struct inode *ramfs_get_inode(struct super_block *sb, int mode, dev_t dev) -{ - struct inode * inode = new_inode(sb); - - if (inode) { - inode->i_mode = mode; - inode->i_uid = current->fsuid; - inode->i_gid = current->fsgid; - inode->i_blksize = PAGE_CACHE_SIZE; - inode->i_blocks = 0; - inode->i_mapping->a_ops = &ramfs_aops; - inode->i_mapping->backing_dev_info = &ramfs_backing_dev_info; - inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; - switch (mode & S_IFMT) { - default: - init_special_inode(inode, mode, dev); - break; - case S_IFREG: - inode->i_op = &ramfs_file_inode_operations; - inode->i_fop = &ramfs_file_operations; - break; - case S_IFDIR: - inode->i_op = &ramfs_dir_inode_operations; - inode->i_fop = &simple_dir_operations; - - /* directory inodes start off with i_nlink == 2 (for "." entry) */ - inode->i_nlink++; - break; - case S_IFLNK: - inode->i_op = &page_symlink_inode_operations; - break; - } +struct ramfs_init_args { + dev_t dev; + const char *name; +}; + +static int ramfs_inode_init(struct inode *inode, void *__arg) +{ + struct ramfs_init_args *arg = (struct ramfs_init_args *)__arg; + int len; + int err = 0; + + inode->i_mapping->a_ops = &ramfs_aops; + inode->i_mapping->backing_dev_info = &ramfs_backing_dev_info; + + switch (inode->i_mode & S_IFMT) { + case S_IFREG: + inode->i_op = &ramfs_file_inode_operations; + inode->i_fop = &ramfs_file_operations; + break; + case S_IFDIR: + inode->i_op = &ramfs_dir_inode_operations; + inode->i_fop = &simple_dir_operations; + break; + case S_IFLNK: + inode->i_op = &page_symlink_inode_operations; + len = strlen(arg->symname)+1; + err = page_symlink(inode, arg->symname, len); + break; + default: + init_special_inode(inode, inode->i_mode, arg->dev); + break; } - return inode; + + return err; } /* @@ -93,53 +93,31 @@ struct inode *ramfs_get_inode(struct sup static int ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) { - struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev); - int error = -ENOSPC; + struct ramfs_init_args args = { .dev = dev }; - if (inode) { - if (dir->i_mode & S_ISGID) { - inode->i_gid = dir->i_gid; - if (S_ISDIR(mode)) - inode->i_mode |= S_ISGID; - } - d_instantiate(dentry, inode); - dget(dentry); /* Extra count - pin the dentry in core */ - error = 0; - } - return error; + return simple_mknod_init(dir, dentry, mode, + ramfs_inode_init, (void *) &args); } static int ramfs_mkdir(struct inode * dir, struct dentry * dentry, int mode) { - int retval = ramfs_mknod(dir, dentry, mode | S_IFDIR, 0); - if (!retval) - dir->i_nlink++; - return retval; + return simple_mknod_init(dir, dentry, mode|S_IFDIR, + ramfs_inode_init, NULL); } static int ramfs_create(struct inode *dir, struct dentry *dentry, int mode, struct nameidata *nd) { - return ramfs_mknod(dir, dentry, mode | S_IFREG, 0); + return simple_mknod_init(dir, dentry, mode|S_IFREG, + ramfs_inode_init, NULL); } -static int ramfs_symlink(struct inode * dir, struct dentry *dentry, const char * symname) +static int ramfs_symlink(struct inode *dir, struct dentry *dentry, + const char *name) { - struct inode *inode; - int error = -ENOSPC; + struct ramfs_init_args args = { .name = name }; - inode = ramfs_get_inode(dir->i_sb, S_IFLNK|S_IRWXUGO, 0); - if (inode) { - int l = strlen(symname)+1; - error = page_symlink(inode, symname, l); - if (!error) { - if (dir->i_mode & S_ISGID) - inode->i_gid = dir->i_gid; - d_instantiate(dentry, inode); - dget(dentry); - } else - iput(inode); - } - return error; + return simple_mknod_init(dir, dentry, S_IFLNK|S_IRWXUGO, + ramfs_inode_init, (void *) &args); } static struct address_space_operations ramfs_aops = { @@ -189,9 +167,9 @@ static int ramfs_fill_super(struct super sb->s_magic = RAMFS_MAGIC; sb->s_op = &ramfs_ops; sb->s_time_gran = 1; - inode = ramfs_get_inode(sb, S_IFDIR | 0755, 0); - if (!inode) - return -ENOMEM; + inode = simple_get_inode(sb, S_IFDIR|0755, ramfs_inode_init, NULL); + if (IS_ERR(inode)) + return PTR_ERR(inode); // -ENOMEM root = d_alloc_root(inode); if (!root) { ^ permalink raw reply [flat|nested] 24+ messages in thread
[parent not found: <4qoKs-6yv-13@gated-at.bofh.it>]
[parent not found: <4qoU5-6CQ-3@gated-at.bofh.it>]
[parent not found: <4qvsI-32Y-17@gated-at.bofh.it>]
* Re: [PATCH] ramfs: pretend dirent sizes [not found] ` <4qvsI-32Y-17@gated-at.bofh.it> @ 2005-07-15 13:12 ` Bodo Eggert 2005-07-19 9:13 ` Jan Blunck 0 siblings, 1 reply; 24+ messages in thread From: Bodo Eggert @ 2005-07-15 13:12 UTC (permalink / raw) To: Jan Blunck, Andrew Morton, torvalds, linux-kernel Jan Blunck <j.blunck@tu-harburg.de> wrote: > Andrew Morton wrote: > > Does it really matter? > > > > Yes! At least for me and my union mounts implementation. Is there a reason for not using size == link-count (or even static)? -- Ich danke GMX dafür, die Verwendung meiner Adressen mittels per SPF verbreiteten Lügen zu sabotieren. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH] ramfs: pretend dirent sizes 2005-07-15 13:12 ` Bodo Eggert @ 2005-07-19 9:13 ` Jan Blunck 0 siblings, 0 replies; 24+ messages in thread From: Jan Blunck @ 2005-07-19 9:13 UTC (permalink / raw) To: 7eggert; +Cc: Andrew Morton, torvalds, linux-kernel Bodo Eggert wrote: > Jan Blunck <j.blunck@tu-harburg.de> wrote: > >>Andrew Morton wrote: > > >> > Does it really matter? >> > >> >>Yes! At least for me and my union mounts implementation. > > > Is there a reason for not using size == link-count (or even static)? Yes, the link count is only increased for subdirectories (because of the ".." link) and not for regular files. Jan ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2005-07-21 7:28 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-07-15 3:01 [PATCH] ramfs: pretend dirent sizes Jan Blunck
2005-07-15 3:11 ` Andrew Morton
2005-07-15 10:16 ` Jan Blunck
2005-07-15 18:52 ` Linus Torvalds
2005-07-16 0:39 ` Chris Wedgwood
2005-07-19 9:28 ` Jan Blunck
2005-07-19 16:16 ` Chris Wedgwood
2005-07-19 18:22 ` Jan Blunck
2005-07-19 18:32 ` Chris Wedgwood
2005-07-19 19:14 ` Jan Blunck
2005-07-19 19:16 ` Chris Wedgwood
2005-07-20 11:21 ` Jörn Engel
2005-07-20 18:11 ` Chris Wedgwood
2005-07-20 18:48 ` Jan Blunck
2005-07-20 18:52 ` Peter Staubach
2005-07-21 7:26 ` Jörn Engel
2005-07-21 7:20 ` Jörn Engel
2005-07-21 7:25 ` Chris Wedgwood
2005-07-21 7:27 ` Jörn Engel
2005-07-20 18:24 ` Jan Blunck
2005-07-20 18:50 ` Peter Staubach
2005-07-19 9:46 ` Jan Blunck
[not found] <4qoKs-6yv-13@gated-at.bofh.it>
[not found] ` <4qoU5-6CQ-3@gated-at.bofh.it>
[not found] ` <4qvsI-32Y-17@gated-at.bofh.it>
2005-07-15 13:12 ` Bodo Eggert
2005-07-19 9:13 ` Jan Blunck
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox