public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted
@ 2004-04-27  6:41 Junfeng Yang
  2004-04-27  7:44 ` [Ext2-devel] " Andreas Dilger
  0 siblings, 1 reply; 7+ messages in thread
From: Junfeng Yang @ 2004-04-27  6:41 UTC (permalink / raw)
  To: Linux Kernel Mailing List, ext2-devel, mc, Madanlal S Musuvathi,
	David L. Dill

Hi,

We checked EXT3 filesystem on 2.4.19 recently and found 2 cases that look
like bugs.  For both of the cases, disk read errors are ignored, which
appears to cause a non-empty directory to be wrongly deleted or a dir to
contain more than one entries with identical names.

I'm not sure if they are real bugs or not, so your confirmations
/clarifications are appericated.

Please let me know if anything isn't clear

all warnings are in file fs/ext3/namei.c

----------------------------------------------------------------------------
[BUG] A non-empty dir may be deleted because ext3_read errors are ignored
by ext3_find_entry.  empty_dir is called whenenver ext3_rmdir tries to
remove a directory.


static int empty_dir (struct inode * inode)
{
			bh = ext3_bread (NULL, inode,
				offset >> EXT3_BLOCK_SIZE_BITS(sb), 0, &err);
			if (!bh) {
#if 0
				ext3_error (sb, "empty_dir",
				"directory #%lu contains a hole at offset %lu",
					inode->i_ino, offset);
#endif
				offset += sb->s_blocksize;
ERROR --->			continue;
			}
			de = (struct ext3_dir_entry_2 *) bh->b_data;
		}

----------------------------------------------------------------------------
[BUG] A dir may end up containing more than one entries with identical
names because because disk read errors are ignored by ext3_find_entry.
ext3_find_entry is called by lots of other ext3 functions (ext3_add_entry,
ext3_unlink, ext3_rename)

static struct buffer_head * ext3_find_entry (struct dentry *dentry,
					struct ext3_dir_entry_2 ** res_dir)
{
.....
		if ((bh = bh_use[ra_ptr++]) == NULL)
			goto next;
		wait_on_buffer(bh);
		if (!buffer_uptodate(bh)) {
			/* read error, skip block & hope for the best */
			brelse(bh);
ERROR --->		goto next;
		}





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ext2-devel] [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted
  2004-04-27  6:41 [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted Junfeng Yang
@ 2004-04-27  7:44 ` Andreas Dilger
  2004-05-03  9:59   ` Pavel Machek
  2004-05-03 14:10   ` Jörn Engel
  0 siblings, 2 replies; 7+ messages in thread
From: Andreas Dilger @ 2004-04-27  7:44 UTC (permalink / raw)
  To: Junfeng Yang
  Cc: Linux Kernel Mailing List, ext2-devel, mc, Madanlal S Musuvathi,
	David L. Dill

On Apr 26, 2004  23:41 -0700, Junfeng Yang wrote:
> We checked EXT3 filesystem on 2.4.19 recently and found 2 cases that look
> like bugs.  For both of the cases, disk read errors are ignored, which
> appears to cause a non-empty directory to be wrongly deleted or a dir to
> contain more than one entries with identical names.
> 
> I'm not sure if they are real bugs or not, so your confirmations
> /clarifications are appericated.

I don't consider this a bug, but rather a conscious decision on the part of
the developers.  If you are trying to delete a directory and it has read
errors, then it is better to let the unlink succeed than to refuse to unlink
the directory.

> ----------------------------------------------------------------------------
> [BUG] A non-empty dir may be deleted because ext3_read errors are ignored
> by ext3_find_entry.  empty_dir is called whenenver ext3_rmdir tries to
> remove a directory.
> 
> 
> static int empty_dir (struct inode * inode)
> {
> 			bh = ext3_bread (NULL, inode,
> 				offset >> EXT3_BLOCK_SIZE_BITS(sb), 0, &err);
> 			if (!bh) {
> #if 0
> 				ext3_error (sb, "empty_dir",
> 				"directory #%lu contains a hole at offset %lu",
> 					inode->i_ino, offset);
> #endif
> 				offset += sb->s_blocksize;
> ERROR --->			continue;
> 			}
> 			de = (struct ext3_dir_entry_2 *) bh->b_data;
> 		}

What is more of a bug is a few lines down.  The error return from
ext3_check_dir_entry() causes the rest of the directory to be skipped and
"1" returned instead of skipping that block/page and continuing to check
the rest of the directory.  With this if there is a read error anywhere
in the directory it is possible to rmdir the directory without actually
deleting entries that are returned by ls.

> ----------------------------------------------------------------------------
> [BUG] A dir may end up containing more than one entries with identical
> names because because disk read errors are ignored by ext3_find_entry.
> ext3_find_entry is called by lots of other ext3 functions (ext3_add_entry,
> ext3_unlink, ext3_rename)
> 
> static struct buffer_head * ext3_find_entry (struct dentry *dentry,
> 					struct ext3_dir_entry_2 ** res_dir)
> {
> .....
> 		if ((bh = bh_use[ra_ptr++]) == NULL)
> 			goto next;
> 		wait_on_buffer(bh);
> 		if (!buffer_uptodate(bh)) {
> 			/* read error, skip block & hope for the best */
> 			brelse(bh);
> ERROR --->		goto next;
> 		}

Again a conscious decision.  If a name is potentially inaccessible because
of an IO error it is better to allow the creation of a potentially duplicate
name than refuse creation of any new entries in the directory.  It's a matter
of allowing the filesystem to be used as well as possible in the face of
failures vs. just giving up and refusing to do anything.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ext2-devel] [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted
  2004-04-27  7:44 ` [Ext2-devel] " Andreas Dilger
@ 2004-05-03  9:59   ` Pavel Machek
  2004-05-03 14:10   ` Jörn Engel
  1 sibling, 0 replies; 7+ messages in thread
From: Pavel Machek @ 2004-05-03  9:59 UTC (permalink / raw)
  To: Junfeng Yang, Linux Kernel Mailing List, ext2-devel, mc,
	Madanlal S Musuvathi, David L. Dill

Hi!

> > I'm not sure if they are real bugs or not, so your confirmations
> > /clarifications are appericated.
> 
> I don't consider this a bug, but rather a conscious decision on the part of
> the developers.  If you are trying to delete a directory and it has read
> errors, then it is better to let the unlink succeed than to refuse to unlink
> the directory.

Perhaps this should be documented with a comment?
Its rather subtle.
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms         


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ext2-devel] [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted
  2004-04-27  7:44 ` [Ext2-devel] " Andreas Dilger
  2004-05-03  9:59   ` Pavel Machek
@ 2004-05-03 14:10   ` Jörn Engel
  2004-05-03 16:16     ` Andreas Dilger
  1 sibling, 1 reply; 7+ messages in thread
From: Jörn Engel @ 2004-05-03 14:10 UTC (permalink / raw)
  To: Junfeng Yang, Linux Kernel Mailing List, ext2-devel, mc,
	Madanlal S Musuvathi, David L. Dill

On Tue, 27 April 2004 01:44:55 -0600, Andreas Dilger wrote:
> 
> Again a conscious decision.  If a name is potentially inaccessible because
> of an IO error it is better to allow the creation of a potentially duplicate
> name than refuse creation of any new entries in the directory.  It's a matter
> of allowing the filesystem to be used as well as possible in the face of
> failures vs. just giving up and refusing to do anything.

Do you mind if I doubt the sanity of whoever made that decision?  When
my hard drive fails, I don't care about writing to the fs too much
anymore, I want to *notice* the failure early and to *read* as much as
possible, then put the drive on a pile for test hardware.

But then again, that's just me.

Jörn

-- 
A surrounded army must be given a way out.
-- Sun Tzu

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ext2-devel] [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted
  2004-05-03 14:10   ` Jörn Engel
@ 2004-05-03 16:16     ` Andreas Dilger
  2004-05-03 17:33       ` Jörn Engel
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Dilger @ 2004-05-03 16:16 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Junfeng Yang, Linux Kernel Mailing List, ext2-devel, mc,
	Madanlal S Musuvathi, David L. Dill, Andrew Morton,
	Marcelo Tosatti

On May 03, 2004  16:10 +0200, Jörn Engel wrote:
> On Tue, 27 April 2004 01:44:55 -0600, Andreas Dilger wrote:
> > Again a conscious decision.  If a name is potentially inaccessible
> > because of an IO error it is better to allow the creation of a
> > potentially duplicate name than refuse creation of any new entries
> > in the directory.  It's a matter of allowing the filesystem to be
> > used as well as possible in the face of failures vs. just giving up
> > and refusing to do anything.
> 
> Do you mind if I doubt the sanity of whoever made that decision?  When
> my hard drive fails, I don't care about writing to the fs too much
> anymore, I want to *notice* the failure early and to *read* as much as
> possible, then put the drive on a pile for test hardware.

If that's what you want, then mount the filesystem with "errors=remount-ro"
and you will get it.  You can even mount it with "errors=panic" so that the
node reboots and does a full fsck immediately.  For users that have a few
bad blocks on their disk and can't afford to throw the whole disk away this
is a reasonable course of action.

Two things that do deserve fixing in these call paths are that
a) we don't call ext3_error() for an IO error in ext3_find_entry(), so we
   won't do the normal ext3 error handling (mark SB in error, remount-ro
   or panic if desired);
b) in empty_dir() we don't continue checking for non-empty blocks after a
   content error (ext3_check_dir_entry() calls ext3_error() already);
c) we had decided not to mark the SB in error for holes in directories to
   allow leway in the indexed-directory implementation, but this change
   incorrectly also disabled marking the SB in error for real IO errors.

Patch applies equally well to 2.4.25 and 2.6-current.

Cheers, Andreas
========================== ext3-direrrors.diff =============================
--- ./fs/ext3/namei.c.orig	2004-03-09 16:46:43.000000000 -0700
+++ ./fs/ext3/namei.c	2004-05-03 10:10:53.000000000 -0600
@@ -825,6 +825,8 @@ restart:
 		wait_on_buffer(bh);
 		if (!buffer_uptodate(bh)) {
 			/* read error, skip block & hope for the best */
+			ext3_error(sb, __FUNCTION__, "reading directory #%lu "
+				   "offset %lu\n", dir->i_ino, block);
 			brelse(bh);
 			goto next;
 		}
@@ -1712,14 +1720,19 @@ static int empty_dir (struct inode * ino
 	struct buffer_head * bh;
 	struct ext3_dir_entry_2 * de, * de1;
 	struct super_block * sb;
-	int err;
+	int err = 0;
 
 	sb = inode->i_sb;
 	if (inode->i_size < EXT3_DIR_REC_LEN(1) + EXT3_DIR_REC_LEN(2) ||
 	    !(bh = ext3_bread (NULL, inode, 0, 0, &err))) {
-	    	ext3_warning (inode->i_sb, "empty_dir",
-			      "bad directory (dir #%lu) - no data block",
-			      inode->i_ino);
+		if (err)
+			ext3_error(inode->i_sb, __FUNCTION__,
+				   "error %d reading directory #%lu offset 0",
+				   err, inode->i_ino);
+		else
+			ext3_warning(inode->i_sb, __FUNCTION__,
+				     "bad directory (dir #%lu) - no data block",
+				     inode->i_ino);
 		return 1;
 	}
 	de = (struct ext3_dir_entry_2 *) bh->b_data;
@@ -1741,24 +1754,26 @@ static int empty_dir (struct inode * ino
 	while (offset < inode->i_size ) {
 		if (!bh ||
 			(void *) de >= (void *) (bh->b_data+sb->s_blocksize)) {
+			err = 0;
 			brelse (bh);
 			bh = ext3_bread (NULL, inode,
 				offset >> EXT3_BLOCK_SIZE_BITS(sb), 0, &err);
 			if (!bh) {
-#if 0
-				ext3_error (sb, "empty_dir",
-				"directory #%lu contains a hole at offset %lu",
-					inode->i_ino, offset);
-#endif
+				if (err)
+					ext3_error(sb, __FUNCTION__,
+						   "error %d reading directory"
+						   " #%lu offset %lu",
+						   err, inode->i_ino, offset);
 				offset += sb->s_blocksize;
 				continue;
 			}
 			de = (struct ext3_dir_entry_2 *) bh->b_data;
 		}
-		if (!ext3_check_dir_entry ("empty_dir", inode, de, bh,
-					   offset)) {
-			brelse (bh);
-			return 1;
+		if (!ext3_check_dir_entry("empty_dir", inode, de, bh, offset)) {
+			de = (struct ext3_dir_entry_2 *)(bh->b_data +
+							 sb->s_blocksize);
+			offset = (offset | (sb->s_blocksize - 1)) + 1;
+			continue;
 		}
 		if (le32_to_cpu(de->inode)) {
 			brelse (bh);
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ext2-devel] [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted
  2004-05-03 16:16     ` Andreas Dilger
@ 2004-05-03 17:33       ` Jörn Engel
  2004-05-03 17:54         ` Andreas Dilger
  0 siblings, 1 reply; 7+ messages in thread
From: Jörn Engel @ 2004-05-03 17:33 UTC (permalink / raw)
  To: Junfeng Yang, Linux Kernel Mailing List, ext2-devel, mc,
	Madanlal S Musuvathi, David L. Dill, Andrew Morton,
	Marcelo Tosatti

On Mon, 3 May 2004 10:16:06 -0600, Andreas Dilger wrote:
> On May 03, 2004  16:10 +0200, Jörn Engel wrote:
> > 
> > Do you mind if I doubt the sanity of whoever made that decision?  When
> > my hard drive fails, I don't care about writing to the fs too much
> > anymore, I want to *notice* the failure early and to *read* as much as
> > possible, then put the drive on a pile for test hardware.
> 
> If that's what you want, then mount the filesystem with "errors=remount-ro"
> and you will get it.  You can even mount it with "errors=panic" so that the
> node reboots and does a full fsck immediately.  For users that have a few
> bad blocks on their disk and can't afford to throw the whole disk away this
> is a reasonable course of action.

Ok, "errors=remount-ro" is good enough for me.  For the record, do
non-historic disks with a few bad blocks still exist?  I though the
driver firmware already detected those and used spare blocks at one
side of the disk as a replacement.  Nicely visible when sequential
read performance over the whole disk has a few non-continuous spots.

Jörn

-- 
Geld macht nicht glücklich.
Glück macht nicht satt.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Ext2-devel] [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted
  2004-05-03 17:33       ` Jörn Engel
@ 2004-05-03 17:54         ` Andreas Dilger
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Dilger @ 2004-05-03 17:54 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Junfeng Yang, Linux Kernel Mailing List, ext2-devel, mcs,
	Madanlal S Musuvathi, David L. Dill

On May 03, 2004  19:33 +0200, Jörn Engel wrote:
> On Mon, 3 May 2004 10:16:06 -0600, Andreas Dilger wrote:
> > If that's what you want, then mount the filesystem with "errors=remount-ro"
> > and you will get it.  You can even mount it with "errors=panic" so that the
> > node reboots and does a full fsck immediately.  For users that have a few
> > bad blocks on their disk and can't afford to throw the whole disk away this
> > is a reasonable course of action.
> 
> Ok, "errors=remount-ro" is good enough for me.  For the record, do
> non-historic disks with a few bad blocks still exist?  I though the
> driver firmware already detected those and used spare blocks at one
> side of the disk as a replacement.  Nicely visible when sequential
> read performance over the whole disk has a few non-continuous spots.

AFAIK, even modern disks will only remap blocks at write time and not
at read time.  That is why it is possible to get an IO error, but when
someone runs e.g. "badblocks" on the disk they are confused when it
doesn't report any errors.

Things like allowing the directory to be removed when there is a read
error actually helps such situations because when the block is re-used
it will first be written to and that will cause the remapping to happen.
Otherwise you have this useless directory that you don't want and can't
remove.  It doesn't help anyone to refuse removing it.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2004-05-03 17:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-27  6:41 [CHECKER] warnings in fs/ext3/namei.c (2.4.19) where disk read errors get ignored, causing non-empty dir to be deleted Junfeng Yang
2004-04-27  7:44 ` [Ext2-devel] " Andreas Dilger
2004-05-03  9:59   ` Pavel Machek
2004-05-03 14:10   ` Jörn Engel
2004-05-03 16:16     ` Andreas Dilger
2004-05-03 17:33       ` Jörn Engel
2004-05-03 17:54         ` Andreas Dilger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox