public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH][CFT] (updated) ext2 directories in pagecache
@ 2001-04-29 20:35 Daniel Phillips
  2001-05-02  3:03 ` Albert Cranford
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Phillips @ 2001-04-29 20:35 UTC (permalink / raw)
  To: viro; +Cc: linux-kernel

> Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz

Here is my ext2 directory index as a patch against your patch:

    http://kernelnewbies.org/~phillips/htree/dx.pcache-2.4.4
    
Changes:

    - COMBSORT macro replaced by custom sort code
    - Most #ifdef CONFIG_EXT2_INDEX's changed to if (<constant>)

To do:

  - Split up the split code
  - Finalize hash function
  - Test/debug big endian
  - Fall back to linear search if bad index detected
  - Fail gracefully on random data
  - Remove the tracing and test options

To apply:

    cd source/tree
    zcat ext2-dir-patch-S4.gz | patch -p1
    cat dx.pcache-2.4.4 | patch -p0

To create an indexed directory:

    mount /dev/hdxxx /test -o index
    mkdir /test/foo

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH][CFT] (updated) ext2 directories in pagecache
@ 2001-05-11  7:10 Andreas Dilger
  2001-05-11  7:19 ` Alexander Viro
  2001-05-11 13:02 ` Daniel Phillips
  0 siblings, 2 replies; 30+ messages in thread
From: Andreas Dilger @ 2001-05-11  7:10 UTC (permalink / raw)
  To: adilger; +Cc: phillips, Linux kernel development list

[-- Attachment #1: Type: text/plain, Size: 2351 bytes --]

I previously wrote:
> OK, here are the patches described above.
> 
> Unfortunately, they haven't been tested.  I've given them several
> eyeballings and they appear OK, but when I try to run the ext2 index code
> (even without "-o index" mount option) my system deadlocks somwhere
> inside my ext2i module (tight loop, but SysRQ still works).  I doubt
> it is due to these patches, but possibly because I am also working on
> ext3 code in the same kernel.  I'm just in the process of getting kdb
> installed into that kernel so I can find out just where it is looping.

I've tested again, now with kdb, and the system loops in ext2_find_entry()
or ext2_add_link(), because there is a directory with a zero rec_len.
While the actual cause of this problem is elsewhere, the fact that
ext2_next_entry() will loop forever with a bad rec_len is a bug not in
the old ext2 code.

I have changed ext2_next_entry() to verify rec_len > EXT2_DIR_REC_LEN(0),
and added a helper function ext2_next_empty() which is nearly the same,
but it follows the de links until it finds one with enough space for a
new entry (used in ext2_add_link()).  The former is useful for both Al
and Daniel's code, while the latter may only be useful for Daniel's.

However, the way that I currently fixed ext2_next_entry() is probably
not very safe.  If it detects a bad rec_len, we should probably not
touch that block anymore, but it currently just makes it look like the
block is empty.  That may lead to deleting the rest of the directory
entries in the block, although this is what e2fsck does anyways...  I
at least need to set the error flag in the superblock...  Next patch.

I spotted another bug, namely that when allocating a new directory page
it sets rec_len to blocksize, but does not zero the inode field...  This
would imply that we would skip a bogus directory entry at the start of
a new page.

As yet I haven't found the cause of the real bug, but at least now my
kernel doesn't lock up on (relatively common) bogus data.

Cheers, Andreas

PS - patch has recieved some editing to remove the other INDEX flag stuff,
     so hopefully it is OK
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

[-- Attachment #2: Type: text/plain, Size: 10724 bytes --]

diff -ru linux/fs/ext2i.orig/dir.c linux/fs/ext2i/dir.c
--- linux/fs/ext2i.orig/dir.c	Thu May 10 12:10:22 2001
+++ linux/fs/ext2i/dir.c	Fri May 11 00:01:46 2001
@@ -573,20 +599,48 @@
 }
 
 /*
- * p is at least 6 bytes before the end of page
+ * Check record length to ensure we don't loop on a bad directory entry.
+ * de is at least 6 bytes before the end of page.
  */
-static inline ext2_dirent *ext2_next_entry(ext2_dirent *p)
+static inline ext2_dirent *ext2_next_entry(ext2_dirent *de, ext2_dirent *top)
 {
-	return (ext2_dirent *)((char*)p + le16_to_cpu(p->rec_len));
+	int len = le16_to_cpu(de->rec_len);
+	if (len < EXT2_DIR_REC_LEN(0)) {
+		printk(KERN_ERR "EXT2-fs: bad directory record length %d!\n",
+		       len);
+		return top;
+	}
+	return (ext2_dirent *)((char *)de + len);
+}
+
+static inline ext2_dirent *ext2_next_empty(ext2_dirent *de, ext2_dirent *top,
+					   unsigned reclen)
+{
+	while (de <= top) {
+		unsigned nlen = EXT2_DIR_REC_LEN(de->name_len);
+		unsigned rlen = le16_to_cpu(de->rec_len);
+		if (rlen < EXT2_DIR_REC_LEN(0)) {
+			printk(KERN_ERR
+			       "EXT2-fs: bad directory record length %d!\n",
+			       rlen);
+			break;
+		}
+		if ((de->inode? rlen - nlen: rlen) >= reclen)
+			return de;
+		de = (ext2_dirent *) ((char *) de + rlen);
+	}
+	return NULL;
 }
 
 static inline unsigned 
 ext2_validate_entry(char *base, unsigned offset, unsigned mask)
 {
 	ext2_dirent *de = (ext2_dirent*)(base + offset);
 	ext2_dirent *p = (ext2_dirent*)(base + (offset&mask));
-	while ((char*)p < (char*)de)
-		p = ext2_next_entry(p);
+	while (p < de)
+		p = ext2_next_entry(p, de);
 	return (char *)p - base;
 }
 
@@ -640,8 +706,8 @@
 		types = ext2_filetype_table;
 
 	for ( ; n < npages; n++, offset = 0) {
-		char *kaddr, *limit;
-		ext2_dirent *de;
+		char *kaddr;
+		ext2_dirent *de, *top;
 		struct page *page = ext2_get_page(inode, n);
 
 		if (IS_ERR(page))
@@ -652,8 +718,9 @@
 			need_revalidate = 0;
 		}
 		de = (ext2_dirent *)(kaddr+offset);
-		limit = kaddr + PAGE_CACHE_SIZE - EXT2_DIR_REC_LEN(1);
-		for ( ;(char*)de <= limit; de = ext2_next_entry(de))
+		top = (ext2_dirent *)(kaddr + PAGE_CACHE_SIZE -
+				      EXT2_DIR_REC_LEN(0));
+		for ( ; de < top; de = ext2_next_entry(de, top))
 			if (de->inode) {
 				int over;
 				unsigned char d_type = DT_UNKNOWN;
@@ -691,8 +753,11 @@
 	int namelen = dentry->d_name.len;
 	struct buffer_head *bh;
 	struct super_block *sb = dir->i_sb;
-	unsigned blockshift = sb->s_blocksize_bits, blocksize = sb->s_blocksize, block;
+	unsigned blockshift = sb->s_blocksize_bits;
+	unsigned blocksize = sb->s_blocksize;
+	unsigned block;
 	ext2_dirent *de;
-	char *data, *top;
+	char *data;
+
 	*result = NULL;
 	if (namelen > EXT2_NAME_LEN) return NULL;
@@ -705,13 +770,20 @@
 
 		while (1)
 		{
-			if (IS_ERR(bh = ext2_bread (dir, dx_get_block(frame->at), 0)))
-				goto dxfail;
+			ext2_dirent *top;
+
+			bh = ext2_bread (dir, dx_get_block(frame->at), 0);
+			if (IS_ERR(bh))
+				goto dxfail;
 			data = bh->b_data;
-			top = data + blocksize - EXT2_DIR_REC_LEN(namelen);
-			for (de = (ext2_dirent *) data; (char *) de <= top; de = ext2_next_entry(de))
-				if (ext2_match (namelen, name, de))
+			top = (ext2_dirent *)(data + blocksize -
+					      EXT2_DIR_REC_LEN(0));
+			for (de = (ext2_dirent *) data; de < top;
+			     de = ext2_next_entry(de, top))
+				if (ext2_match (namelen, name, de)) {
+					*result = bh;
 					goto dxfound;
+				}
 			brelse (bh);
 			/* Same hash continues in next block?  Search on. */
 			if (++(frame->at) - frame->entries == dx_get_count(frame->entries))
@@ -735,12 +811,23 @@
 	}
 	for (block = 0; block < dir->i_size >> blockshift; block++)
 	{
-		if (IS_ERR(bh = ext2_bread (dir, block, 0))) break;
+		ext2_dirent *top;
+
+		if (IS_ERR(bh = ext2_bread (dir, block, 0))) {
+			ext2_error(sb, __FUNCTION__,
+				   "reading directory %ld block %d, skipped\n",
+				   dir->i_ino, block);
+			continue;
+		}
 		data = bh->b_data;
-		top = data + blocksize - EXT2_DIR_REC_LEN(namelen);
-		for (de = (ext2_dirent *) data; (char *) de <= top; de = ext2_next_entry(de))
-			if (ext2_match (namelen, name, de))
+		top = (ext2_dirent *)(data + blocksize - EXT2_DIR_REC_LEN(0));
+		for (de = (ext2_dirent *) data; de < top;
+		     de = ext2_next_entry(de, top)) {
+			if (ext2_match (namelen, name, de)) {
+				*result = bh;
 				goto found;
+			}
+		}
 		brelse(bh);
 	}
 	return NULL;
@@ -758,7 +842,8 @@ 
 	struct buffer_head *bh = ext2_bread(dir, 0, 0);
 
 	if (!IS_ERR(bh)) {
-		de = ext2_next_entry((ext2_dirent *) bh->b_data);
+		de = ext2_next_entry((ext2_dirent *) bh->b_data,
+				     (ext2_dirent *) bh->b_data);
 		*p = bh;
 	}
 	return de;
@@ -806,10 +895,10 @@
 	int namelen = dentry->d_name.len;
 	struct buffer_head *bh;
 	ext2_dirent *de, *de2;
-	unsigned blockshift = dir->i_sb->s_blocksize_bits, blocksize = 1 << blockshift;
+	unsigned blockshift = dir->i_sb->s_blocksize_bits;
+	unsigned blocksize = 1 << blockshift;
 	unsigned short reclen = EXT2_DIR_REC_LEN(namelen);
-	unsigned nlen, rlen, i, blocks;
-	char *top;
+	unsigned i, blocks;
 	int err;
 	/* Data for deferred index creation path... */
 	struct dx_entry *entries, *at;
@@ -823,4 +912,5 @@
 		unsigned count, split, hash2, block2;
 		struct buffer_head *bh2;
+		ext2_dirent *top;
 		char *data2;
 		unsigned continued;
@@ -833,17 +923,14 @@
 			goto dx_err;
 		data1 = bh->b_data;
 		de = (ext2_dirent *) data1;
-		top = data1 + (0? 200: blocksize);
-		while ((char *) de < top)
-		{
-			nlen = EXT2_DIR_REC_LEN(de->name_len);
-			rlen = le16_to_cpu(de->rec_len);
-			if ((de->inode? rlen - nlen: rlen) >= reclen) goto dx_add;
-			de = (ext2_dirent *) ((char *) de + rlen);
-		}
+		top = (ext2_dirent *)(data1 + blocksize -
+				      EXT2_DIR_REC_LEN(reclen));
+		if ((de = ext2_next_empty(de, top, reclen)))
+			goto dx_add;
 
 		/* Block full, should compress but for now just split */
-		dxtrace(printk("using %u of %u node entries\n", dx_get_count(entries), dx_get_limit(entries)));
+		dxtrace(printk("using %u of %u node entries\n",
+			       dx_get_count(entries), dx_get_limit(entries)));
 		/* Need to split index? */
 		if (dx_get_count(entries) == dx_get_limit(entries))
 		{
@@ -953,8 +1041,6 @@
 		dx_insert_block (frame, hash2 + continued, block2);
 		mark_buffer_dirty (frame->bh);
 		dxtrace(dx_show_index (frame == frames? "root": "node", frame->entries));
-		nlen = EXT2_DIR_REC_LEN(de->name_len);
-		rlen = le16_to_cpu(de->rec_len);
 dx_add:
 		dx_release (frames);
 		goto add;
@@ -971,28 +1057,25 @@
 		goto fail;
 	}
 	blocks = dir->i_size >> blockshift;
-	for (i = 0; i <= blocks; i++)
-	{
+	for (i = 0; i <= blocks; i++) {
+		ext2_dirent *top;
+
 		if (IS_ERR(bh = ext2_bread (dir, i, i == blocks)))
 			goto fail1;
 		de = (ext2_dirent *) bh->b_data;
-		if (i == blocks)
-		{
+		if (i == blocks) {
 			dir->i_size += blocksize;
+			de->inode = 0;
 			de->rec_len = cpu_to_le16(blocksize);
 		}
-		top = bh->b_data + blocksize - reclen;
-		while ((char *) de <= top)
-		{
-			nlen = EXT2_DIR_REC_LEN(de->name_len);
-			rlen = le16_to_cpu(de->rec_len);
-			if ((de->inode? rlen - nlen: rlen) >= reclen) goto add;
-			de = (ext2_dirent *) ((char *) de + rlen);
-		}
-		if (ext2_dx && blocks == 1 && dir->u.ext2_i.i_flags & EXT2_INDEX_FL)
-		{
+		top = (ext2_dirent *)(bh->b_data + blocksize -
+				      EXT2_DIR_REC_LEN(reclen));
+		if ((de = ext2_next_empty(de, top, reclen)))
+			goto add;
+		if (ext2_dx && blocks == 1 && test_opt(dir->i_sb, INDEX)) {
 			struct dx_root *root;
 			struct buffer_head *newbh, *rootbh = bh;
+			ext2_dirent *top;
 			unsigned len;
 
 			dxtrace_on(printk("Creating index\n"));
@@ -1007,24 +1091,27 @@
 			data1 = newbh->b_data;
 			root = (struct dx_root *) rootbh->b_data;
 
-			/* The 0th block becomes the root, move the dirents out */
+			/* 0th block becomes dx_root, move the dirents out */
 			de = (ext2_dirent *) &root->info;
-			assert(de == ext2_next_entry (ext2_next_entry ((ext2_dirent *) root)));
 			len = ((char *) root) + blocksize - (char *) de;
 			memcpy (data1, de, len);
 			de = (ext2_dirent *) data1;
-			top = data1 + len;
-			while (((char *) de2 = ext2_next_entry (de)) < top)
+			top = (ext2_dirent *)(data1 + len);
+			while ((de2 = ext2_next_entry(de, top)) < top)
 				de = de2;
 			de->rec_len = cpu_to_le16(data1 + blocksize - (char *) de);
 			dxtrace(dx_show_leaf ((ext2_dirent *) data1, len, 1));
 
-			/* Initialize the root; the fake dots already exist */
-			de = ext2_next_entry ((ext2_dirent *) &root->fake1);
-			assert(!strncmp(de->name, "..", 2)); // should fall back, set ext2_error
-			de->rec_len = cpu_to_le16(blocksize - EXT2_DIR_REC_LEN(1));
-			memset(&root->info, 0, sizeof(struct dx_root_info));
-			root->info.info_length = sizeof(struct dx_root_info);
+			/* Initialize the root; the dots already exist */
+			de = (ext2_dirent *)(&root->fake2);
+			assert(!memcmp(de->name, "..", 2)); // fix . and ..
+			de->rec_len = cpu_to_le16(blocksize -
+						  EXT2_DIR_REC_LEN(2));
+			root->info.reserved_zero = 0;
+			root->info.hash_version = 0;
+			root->info.info_length = sizeof(root->info);
+			root->info.indirect_levels = 0;
+			root->info.unused_flags = 0;
 			entries = root->entries;
 			dx_set_block (entries, 1);
 			dx_set_count (entries, 1);
@@ -1046,10 +1135,11 @@
 fail:
 	return -ENOENT;
 add:
-	if (de->inode)
-	{
+	if (de->inode) {
+		unsigned nlen = de->name_len;
+
 		de2 = (ext2_dirent *) ((char *) de + nlen);
-		de2->rec_len = cpu_to_le16(rlen - nlen);
+		de2->rec_len = cpu_to_le16(le16_to_cpu(de->rec_len) - nlen);
 		de->rec_len = cpu_to_le16(nlen);
 		de = de2;
 	}
@@ -1074,7 +1164,7 @@
 	while ((char*) de2 < (char*) de)
 	{
 		pde = de2;
-		de2 = ext2_next_entry(de2);
+		de2 = ext2_next_entry(de2, de);
 	}
 	if (pde)
 		pde->rec_len = cpu_to_le16((char*) de + 
@@ -1143,7 +1221,7 @@
 	
 	for (i = 0; i < npages; i++) {
 		char *kaddr;
-		ext2_dirent * de;
+		ext2_dirent * de, *top;
 		page = ext2_get_page(inode, i);
 
 		if (IS_ERR(page))
@@ -1151,9 +1229,10 @@
 
 		kaddr = (char *)page_address(page);
 		de = (ext2_dirent *)kaddr;
-		kaddr += PAGE_CACHE_SIZE-EXT2_DIR_REC_LEN(1);
+		top = (ext2_dirent *)(kaddr + PAGE_CACHE_SIZE -
+				      EXT2_DIR_REC_LEN(0));
 
-		while ((char *)de <= kaddr) {
+		while (de < top) {
 			if (de->inode != 0) {
 				/* check for . and .. */
 				if (de->name[0] != '.')
@@ -1167,10 +1246,14 @@
 				} else if (de->name[1] != '.')
 					goto not_empty;
 			}
-			de = ext2_next_entry(de);
+			de = ext2_next_entry(de, top);
 		}
 		ext2_put_page(page);
 	}
+	if (!EXT2_DIR_LINK_EMPTY(inode))
+		ext2_warning(inode->i_sb, __FUNCTION__,
+			     "empty directory has too many links (%d)",
+			     inode->i_nlink);
 	return 1;
 
 not_empty:

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH][CFT] (updated) ext2 directories in pagecache
@ 2001-05-10 20:53 Andreas Dilger
  2001-05-11  1:10 ` Daniel Phillips
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Dilger @ 2001-05-10 20:53 UTC (permalink / raw)
  To: phillips; +Cc: Linux kernel development list

[-- Attachment #1: Type: text/plain, Size: 2352 bytes --]

I previously wrote:
> I have changed the code to do the following:
> - If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the
>   INDEX mount option (the same as "mount -o index").  This removes
>   the need to specify the "-o index" option each time for filesystems
>   which already have indexed directories.
> - New directories NEVER have the INDEX flag set on them.
> - If the INDEX mount option is set, then when directories grow past 1
>   block (and have the index added) they will get the directory INDEX
>   flag set and turn on the superblock COMPAT_DIR_INDEX flag (if off).
> 
> This means that you can have common code for indexed and non-indexed ext2
> filesystems, and the admin either needs to explicitly set COMPAT_DIR_INDEX
> in the superblock or mount with "-o index" (and create a directory > 1 block).
> 
> I have also added some tricks to ext2_inc_count() and ext2_dec_count() so
> that indexed directories are not subject to the EXT2_LINK_MAX.  I've done
> the same as reiserfs, and set i_nlink = 1 if we overflow EXT2_LINK_MAX
> (which has been increased to 65500 for indexed directories). Apparently
> i_nlink = 1 is the right think to do w.r.t. find and other user tools.

OK, here are the patches described above.

The first one changes the use of the various INDEX flags, so that they
only appear when we have mounted with "-o index" (or COMPAT_DIR_INDEX)
and actually created an indexed directory.

The second one changes the i_nlink counting on indexed dirs so that if
it overflows the 16-bit i_link_count field it is set to 1 and we no
longer track link counts on this directory.

Unfortunately, they haven't been tested.  I've given them several
eyeballings and they appear OK, but when I try to run the ext2 index code
(even without "-o index" mount option) my system deadlocks somwhere
inside my ext2i module (tight loop, but SysRQ still works).  I doubt
it is due to these patches, but possibly because I am also working on
ext3 code in the same kernel.  I'm just in the process of getting kdb
installed into that kernel so I can find out just where it is looping.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

[-- Attachment #2: Type: text/plain, Size: 3750 bytes --]

diff -u linux/ext2.orig/dir.c linux/ext2/dir.c
--- linux/ext2.orig/dir.c	Thu May 10 12:10:22 2001
+++ linux/ext2/dir.c	Thu May 10 11:09:51 2001
@@ -27,6 +27,7 @@
 #include <linux/fs.h>
 #include <linux/ext2_fs.h>
 #include <linux/locks.h>
+#include <asm/smplock.h>
 #include <linux/pagemap.h>
 
 #ifndef assert
@@ -201,6 +204,21 @@
 	return hash0;
 }
 
+static void ext2_update_to_indexed(struct inode *dir)
+{
+	struct super_block *sb = dir->i_sb;
+
+	if (test_opt(sb, INDEX) && !EXT2_HAS_COMPAT_FEATURE(sb, DIR_INDEX)) {
+		dxtrace_on(printk("Adding COMPAT_DIR_INDEX feature flag\n"));
+		ext2_update_dynamic_rev(sb);
+		lock_kernel();
+		EXT2_SET_COMPAT_FEATURE(sb, DIR_INDEX);
+		unlock_kernel();
+		ext2_write_super(dir->i_sb);
+	}
+	dir->u.ext2_i.i_flags |= EXT2_INDEX_FL;
+}
+
 /*
  * Debug
  */
@@ -696,8 +720,7 @@
 	char *data, *top;
 	*result = NULL;
 	if (namelen > EXT2_NAME_LEN) return NULL;
-	if (ext2_dx && is_dx(dir))
-	{
+	if (is_dx(dir)) {
 		u32 hash = dx_hash (name, namelen);
 		struct dx_frame frames[2], *frame;
 		if (!(frame = dx_probe (dir, hash, frames)))
@@ -818,8 +860,7 @@
 	char *data1;
 
 	if (!namelen) return -EINVAL;
-	if (ext2_dx && is_dx(dir))
-	{
+	if (is_dx(dir)) {
 		unsigned count, split, hash2, block2;
 		struct buffer_head *bh2;
 		char *data2;
@@ -989,7 +1032,7 @@
 			if ((de->inode? rlen - nlen: rlen) >= reclen) goto add;
 			de = (ext2_dirent *) ((char *) de + rlen);
 		}
-		if (ext2_dx && blocks == 1 && dir->u.ext2_i.i_flags & EXT2_INDEX_FL)
+		if (ext2_dx && blocks == 1 && test_opt(dir->i_sb, INDEX))
 		{
 			struct dx_root *root;
 			struct buffer_head *newbh, *rootbh = bh;
@@ -1029,6 +1072,7 @@
 			dx_set_block (entries, 1);
 			dx_set_count (entries, 1);
			dx_set_limit (entries, dx_root_limit(dir, sizeof(struct dx_root_info)));
+			ext2_update_to_indexed(dir);
 
 			/* Initialize as for dx_probe */
 			hash = dx_hash (name, namelen);
@@ -1107,26 +1154,11 @@
 
 int ext2_make_empty(struct inode *dir, struct inode *parent)
 {
-	struct super_block *sb = dir->i_sb;
 	struct buffer_head *bh;
-	int make_dx = test_opt (sb, INDEX);
-	dir->i_size = sb->s_blocksize;
+
+	dir->i_size = dir->i_sb->s_blocksize;
 	if (IS_ERR(bh = ext2_bread (dir, 0, 1)))
 		return PTR_ERR(bh);
-	if (ext2_dx && make_dx)
-	{
-		// this is a common sequence, need to generalize it
-		if (!EXT2_HAS_COMPAT_FEATURE(sb, EXT2_FEATURE_COMPAT_DIR_INDEX))
-		{
-			dxtrace_on(printk("Adding COMPAT_DIR_INDEX feature\n"));
-			lock_super(sb);
-			ext2_update_dynamic_rev(sb);
-			EXT2_SET_COMPAT_FEATURE(sb, EXT2_FEATURE_COMPAT_DIR_INDEX);
-			unlock_super(sb);
-			ext2_write_super(sb);
-		}
-		dir->u.ext2_i.i_flags |= EXT2_INDEX_FL;
-	}
 	makedots ((ext2_dirent *) bh->b_data, dir, parent);
 	mark_buffer_dirty_inode(bh, parent);
 	brelse (bh);
diff -u linux/ext2.orig/super.c linux/ext2/super.c
--- linux/ext2.orig/super.c	Thu May 10 12:10:22 2001
+++ linux/ext2/super.c	Thu May 10 00:16:31 2001
@@ -330,6 +330,10 @@
 			EXT2_BLOCKS_PER_GROUP(sb),
 			EXT2_INODES_PER_GROUP(sb),
 			sb->u.ext2_sb.s_mount_opt);
+
+	if (EXT2_HAS_COMPAT_FEATURE(sb, EXT2_FEATURE_COMPAT_DIR_INDEX))
+		set_opt (EXT2_SB(sb)->s_mount_opt, INDEX);
+
 #ifdef CONFIG_EXT2_CHECK
 	if (test_opt (sb, CHECK)) {
 		ext2_check_blocks_bitmap (sb);
diff -ru linux/include/linux/ext2_fs.h.orig linux/include/linux/ext2_fs.h
--- include/linux/ext2_fs.h.orig	Thu May 10 12:10:22 2001
+++ include/linux/ext2_fs.h	Thu May 10 12:50:15 2001
@@ -529,7 +549,7 @@
 #ifdef CONFIG_EXT2_INDEX
   enum {ext2_dx = 1};
   #define dx_static static
-  #define is_dx(dir) (dir->u.ext2_i.i_flags & EXT2_INDEX_FL && dir->i_size > dir->i_sb->s_blocksize)
+  #define is_dx(dir) ((dir)->u.ext2_i.i_flags & EXT2_INDEX_FL)
 #else
   enum {ext2_dx = 0};
   #define dx_static

[-- Attachment #3: Type: text/plain, Size: 3135 bytes --]

diff -u linux/ext2.orig/dir.c linux/ext2/dir.c
--- linux/ext2.orig/dir.c	Thu May 10 12:10:22 2001
+++ linux/ext2/dir.c	Thu May 10 11:09:51 2001
@@ -1171,6 +1203,10 @@
 		}
 		ext2_put_page(page);
 	}
+	if (!EXT2_DIR_LINK_EMPTY(inode))
+		ext2_warning(inode->i_sb, __FUNCTION__,
+			     "empty directory has too many links (%d)",
+			     inode->i_nlink);
 	return 1;
 
 not_empty:
diff -u linux/ext2.orig/namei.c linux/ext2/namei.c
--- linux/ext2.orig/namei.c	Thu May 10 12:10:22 2001
+++ linux/ext2/namei.c	Thu May 10 01:06:42 2001
@@ -41,6 +41,25 @@
  * Couple of helper functions - make the code slightly cleaner.
  */
 
+static inline void ext2_inc_count_dir(struct inode *dir)
+{
+	if (is_dx(dir)) {
+		if (dir->i_nlink > 1) {
+			dir->i_nlink++;
+			if (dir->i_nlink >= EXT2_LINK_MAX * 2)
+				dir->i_nlink = 1;
+		}
+	} else
+		dir->i_nlink++;
+	mark_inode_dirty(dir);
+}
+
+static inline void ext2_dec_count_dir(struct inode *dir)
+{
+	if (!is_dx(dir) || dir->i_nlink > 2) dir->i_nlink--;
+	mark_inode_dirty(dir);
+}
+
 static inline void ext2_inc_count(struct inode *inode)
 {
 	inode->i_nlink++;
@@ -186,10 +210,10 @@
 	struct inode * inode;
 	int err = -EMLINK;
 
-	if (dir->i_nlink >= EXT2_LINK_MAX)
+	if (EXT2_DIR_LINK_MAX(dir))
 		goto out;
 
-	ext2_inc_count(dir);
+	ext2_inc_count_dir(dir);
 
 	inode = ext2_new_inode (dir, S_IFDIR | mode);
 	err = PTR_ERR(inode);
@@ -219,7 +243,7 @@
 	ext2_dec_count(inode);
 	iput(inode);
 out_dir:
-	ext2_dec_count(dir);
+	ext2_dec_count_dir(dir);
 	goto out;
 }
 
@@ -255,7 +278,7 @@
 		if (!err) {
 			inode->i_size = 0;
 			ext2_dec_count(inode);
-			ext2_dec_count(dir);
+			ext2_dec_count_dir(dir);
 		}
 	}
 	return err;
@@ -302,10 +325,9 @@
 			new_inode->i_nlink--;
 		ext2_dec_count(new_inode);
 	} else {
-		if (dir_de) {
+		if (dir_de && EXT2_DIR_LINK_MAX(new_dir)) {
 			err = -EMLINK;
-			if (new_dir->i_nlink >= EXT2_LINK_MAX)
-				goto out_dir;
+			goto out_dir;
 		}
 		ext2_inc_count(old_inode);
 		err = ext2_add_link(new_dentry, old_inode);
@@ -314,7 +336,7 @@
 			goto out_dir;
 		}
 		if (dir_de)
-			ext2_inc_count(new_dir);
+			ext2_inc_count_dir(new_dir);
 	}
 
 	ext2_delete_entry (old_dir, old_de, old_bh);
@@ -322,7 +344,7 @@
 
 	if (dir_de) {
 		ext2_set_link(old_inode, dir_de, dir_bh, new_dir);
-		ext2_dec_count(old_dir);
+		ext2_dec_count_dir(old_dir);
 	}
 	return 0;
 
diff -ru linux/include/linux/ext2_fs.h.orig linux/include/linux/ext2_fs.h
--- include/linux/ext2_fs.h.dx	Thu May 10 12:10:22 2001
+++ include/linux/ext2_fs.h	Thu May 10 12:50:15 2001
@@ -529,11 +549,15 @@
 #ifdef CONFIG_EXT2_INDEX
   enum {ext2_dx = 1};
   #define dx_static static
   #define is_dx(dir) ((dir)->u.ext2_i.i_flags & EXT2_INDEX_FL)
+#define EXT2_DIR_LINK_MAX(dir) (!is_dx(dir) && (dir)->i_nlink >= EXT2_LINK_MAX)
+#define EXT2_DIR_LINK_EMPTY(dir) ((dir)->i_nlink == 2 || (dir)->i_nlink == 1)
 #else
   enum {ext2_dx = 0};
   #define dx_static
   #define is_dx(dir) 0
+#define EXT2_DIR_LINK_MAX(dir) ((dir)->i_nlink >= EXT2_LINK_MAX)
+#define EXT2_DIR_LINK_EMPTY(dir) ((dir)->i_nlink == 2)
 #endif
 
 #ifdef __KERNEL__
 extern void ext2_put_inode (struct inode *);

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH][CFT] (updated) ext2 directories in pagecache
@ 2001-05-10  7:21 Andreas Dilger
  2001-05-13 22:15 ` Daniel Phillips
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Dilger @ 2001-05-10  7:21 UTC (permalink / raw)
  To: phillips; +Cc: Linux kernel development list

I previously wrote:
> I was looking at the new patch, and I saw something that puzzles me.
> Why do you set the EXT2_INDEX_FL on a new (empty) directory, rather
> than only setting it when the dx_root index is created?
> 
> Setting the flag earlier than that makes it mostly useless, since it
> will be set on basically every directory.  Not setting it would also
> make your is_dx() check simply a check for the EXT2_INDEX_FL bit (no
> need to also check size).
> 
> Also no need to set EXT2_COMPAT_DIR_INDEX until such a time that we have
> a (real) directory with an index, to avoid gratuitous incompatibility
> with e2fsck.

I have changed the code to do the following:
- If the COMPAT_DIR_INDEX flag is set at mount/remount time, set the
  INDEX mount option (the same as "mount -o index").  This removes
  the need to specify the "-o index" option each time for filesystems
  which already have indexed directories.
- New directories NEVER have the INDEX flag set on them.
- If the INDEX mount option is set, then when directories grow past 1
  block (and have the index added) they will get the directory INDEX
  flag set and turn on the superblock COMPAT_DIR_INDEX flag (if off).

This means that you can have common code for indexed and non-indexed ext2
filesystems, and the admin either needs to explicitly set COMPAT_DIR_INDEX
in the superblock or mount with "-o index" (and create a directory > 1 block).

I have also added some tricks to ext2_inc_count() and ext2_dec_count() so
that indexed directories are not subject to the EXT2_LINK_MAX.  I've done
the same as reiserfs, and set i_nlink = 1 if we overflow EXT2_LINK_MAX
(which has been increased to 65500 for indexed directories). Apparently
i_nlink = 1 is the right think to do w.r.t. find and other user tools.

Patches need some light testing before being posted.

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH][CFT] (updated) ext2 directories in pagecache
@ 2001-05-06 23:16 Daniel Phillips
  2001-05-09 21:22 ` Andreas Dilger
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Phillips @ 2001-05-06 23:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: Albert Cranford

This patch updates ext2_getblk and ext2_bread to use the ERR_PTR style 
of error return.  As Al Viro pointed out, this is a better way of doing 
things for a function returning a pointer.  This approach would have 
prevented the bug I fixed with the previous patch.  20 20 hindsight, 
and I can only plead that I was following the interface of the old 
ext2_getblk.  But since these functions are only used only by the ext2 
directory code - which in turn is the only part of ext2 that is 
interested in file data - there was no problem changing the interface.

The patch is at:

    http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-4

This is lightly tested and apparently stable.  I wish I could say the 
same for kernel 2.4.4 - cache performance sucks horribly.  
(Nontechnical evaluation.)  So it is probably not a good idea to take 
benchmarks too seriously this month.  The previous stable kernels, 
2.4.2 and 2.4.3, had their problems too, fixable via patching.  Maybe 
next month...

This patch requires Al Viro's directory-in-page-cache patch to be 
applied first, available from:

   ftp://ftp.math.psu.edu/pub/viro/ext2-dir-patch-S4.gz

The other flavor of indexing patch, dx.testme..., also does 
directory-in-page-cache, using the good old ext2 directory code.  This 
works fine and is stable, but IMHO Al's patches constitute a pretty 
major cleanup.

To apply:

    cd source/tree
    zcat ext2-dir-patch-S4.gz | patch -p1
    cat dx.pcache-2.4.4-4 | patch -p0

To create an indexed directory:

    mount /dev/hdxxx /test -o index
    mkdir /test/foo

--
Daniel

^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH][CFT] (updated) ext2 directories in pagecache
@ 2001-05-03 21:10 Daniel Phillips
  2001-05-03 22:59 ` Albert Cranford
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Phillips @ 2001-05-03 21:10 UTC (permalink / raw)
  To: ac9410; +Cc: linux-kernel

> This combination against 2.4.4 won't allow directories to be moved.
> Ex: mv a b #fails with I/O error.  See attached strace.
> But with ext2-dir-patch-S4 by itself, mv works as it should.

Now it also works with my index patch as it should:

    http://nl.linux.org/~phillips/htree/dx.pcache-2.4.4-3

It was just an uninitialized *err return.  Next patch I'll follow Al
Viro's suggestion and change ext2_getblk (used *only* in the directory  
code) to use ERR_PTR instead of *err, saving an argument and eliminating
the chance for this kind of dumb error.
    
ext2_getblk now looks like:

struct buffer_head *ext2_getblk (struct inode *inode, u32 block, int create, int *err)
{
	unsigned blockshift = inode->i_sb->s_blocksize_bits;
	unsigned blocksize = 1 << blockshift;
	unsigned pshift = PAGE_CACHE_SHIFT - blockshift;
	struct buffer_head *bh;
	struct page *page;

	if (!(page = grab_cache_page(inode->i_mapping, block >> pshift)))
		goto fail_page;
	if (!page->buffers)
		create_empty_buffers (page, inode->i_dev, blocksize);
	bh = page_buffer (page, block & ((1 << pshift) - 1));
	atomic_inc(&bh->b_count);
	UnlockPage(page);
	page_cache_release(page);
	*err = 0;
	if (buffer_mapped(bh))
		return bh;
	if ((*err = ext2_get_block(inode, block, bh, create)))
		goto out_null;
	if (!buffer_mapped(bh))
		goto out_null;
	if (!buffer_new(bh))
		return bh;
	lock_buffer(bh);
	memset(bh->b_data, 0, blocksize);
	mark_buffer_uptodate(bh, 1);
	mark_buffer_dirty_inode(bh, inode);
	unlock_buffer(bh);
	return bh;
out_null:
	brelse(bh);
	return NULL;
fail_page:
	*err = -EIO;
	return NULL;
}

A little longer, somewhat clearer and much more correct.

--
Daniel

^ permalink raw reply	[flat|nested] 30+ messages in thread
[parent not found: <01050303150500.00633@starship>]
* [PATCH][CFT] ext2 directories in pagecache
@ 2001-04-12 16:33 Alexander Viro
  2001-04-23 22:21 ` [PATCH][CFT] (updated) " Alexander Viro
  0 siblings, 1 reply; 30+ messages in thread
From: Alexander Viro @ 2001-04-12 16:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: Theodore Y. Ts'o

	Folks, IMO ext2-dir-patch got to the stable stage. Currently
it's against 2.4.4-pre2, but it should apply to anything starting with
2.4.2 or so.

	Ted, could you review it for potential inclusion into 2.4 once
it gets enough testing? It's ext2-only (the only change outside of
ext2 is exporting waitfor_one_page()), it doesn't change fs layout,
it seriously simplifies ext2/dir.c and ext2/namei.c and it gives better
VM behaviour.

	Patch is on ftp.math.psu.edu/pub/viro/ext2-dir-patch.gz

	Folks, please give it a good beating - it works here, but I'd
really like it to get wide testing. Help would be very welcome.
								Al


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2001-05-16  3:34 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-04-29 20:35 [PATCH][CFT] (updated) ext2 directories in pagecache Daniel Phillips
2001-05-02  3:03 ` Albert Cranford
  -- strict thread matches above, loose matches on Subject: below --
2001-05-11  7:10 Andreas Dilger
2001-05-11  7:19 ` Alexander Viro
2001-05-11 16:34   ` Andreas Dilger
2001-05-11 20:20     ` Daniel Phillips
2001-05-12 21:41       ` Andreas Dilger
2001-05-12 22:18         ` Alexander Viro
2001-05-13  2:13           ` Daniel Phillips
2001-05-13  2:34         ` Daniel Phillips
2001-05-14 18:33           ` Andreas Dilger
2001-05-14 19:29             ` Daniel Phillips
2001-05-14 21:50             ` Daniel Phillips
2001-05-16  3:11               ` Daniel Phillips
2001-05-11 13:02 ` Daniel Phillips
2001-05-10 20:53 Andreas Dilger
2001-05-11  1:10 ` Daniel Phillips
2001-05-10  7:21 Andreas Dilger
2001-05-13 22:15 ` Daniel Phillips
2001-05-14 20:04   ` Andreas Dilger
2001-05-14 22:18     ` Daniel Phillips
2001-05-14 22:23     ` Daniel Phillips
2001-05-06 23:16 Daniel Phillips
2001-05-09 21:22 ` Andreas Dilger
2001-05-11  1:04   ` Daniel Phillips
2001-05-03 21:10 Daniel Phillips
2001-05-03 22:59 ` Albert Cranford
     [not found] <01050303150500.00633@starship>
2001-05-03  1:43 ` Daniel Phillips
2001-04-12 16:33 [PATCH][CFT] " Alexander Viro
2001-04-23 22:21 ` [PATCH][CFT] (updated) " Alexander Viro
2001-04-28 18:16   ` Alexander Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox