* problems with ext3 fs, kernels up to 2.6.6-rc2
From: Paul P Komkoff Jr @ 2004-05-19 10:41 UTC
To: Linux Kernel Mailing List

Hi.

For a long time I have been running into the following problem.

I have an ext3 partition with dir_index turned on. I have programs that
store many files on it (for example, Maildir mailboxes for 500+ users,
about 200k files).

Sometimes something goes wrong. I notice it when rdiff-backup on this
partition produces the following output:

ListError goloub/Maildir/cur/1082623479.1763_0.ns:2,S [Errno 5]
Input/output error:
+'/mnt/mail/goloub/Maildir/cur/1082623479.1763_0.ns:2,S'

And when I run strace ls -al on the failed file, I see -EIO:

lstat64("/mnt/mail/goloub/Maildir/cur/1082623479.1763_0.ns:2,S",
0x806408c) = -1 EIO (Input/output error)

I know that running e2fsck will repair it. But how can I help debug it?

Which on-disk structures do I need to examine, maybe extract, and send
to someone?

--
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
This message represents the official view of the voices in my head

* Re: problems with ext3 fs, kernels up to 2.6.6-rc2
From: Andreas Dilger @ 2004-05-19 17:06 UTC
To: Linux Kernel Mailing List; +Cc: Andrew Morton, ext2-devel

On May 19, 2004 14:41 +0400, Paul P Komkoff Jr wrote:
> For a long time I have been running into the following problem.
>
> I have an ext3 partition with dir_index turned on. I have programs that
> store many files on it (for example, Maildir mailboxes for 500+ users,
> about 200k files).
>
> Sometimes something goes wrong. I notice it when rdiff-backup on this
> partition produces the following output:
>
> ListError goloub/Maildir/cur/1082623479.1763_0.ns:2,S [Errno 5]
> Input/output error:
> +'/mnt/mail/goloub/Maildir/cur/1082623479.1763_0.ns:2,S'
>
> And when I run strace ls -al on the failed file, I see -EIO:
>
> lstat64("/mnt/mail/goloub/Maildir/cur/1082623479.1763_0.ns:2,S",
> 0x806408c) = -1 EIO (Input/output error)
>
> I know that running e2fsck will repair it. But how can I help debug it?
>
> Which on-disk structures do I need to examine, maybe extract, and send
> to someone?

A problem with htree was recently discovered during Lustre testing when
files were being renamed within the same directory. In some cases the
addition of the new name caused a directory block split, after which the
old dir_entry pointer was pointing at the wrong entry, and the wrong
entry was removed. This would seem entirely possible in a Maildir
directory, since the MTA will be doing a lot of renames within the same
directory.

This seems to fix the majority of the problems, although there are still
some rare failures in the rename test.

===== fs/ext3/namei.c 1.52 vs edited =====
--- 1.52/fs/ext3/namei.c	Mon May 10 05:25:34 2004
+++ edited/fs/ext3/namei.c	Wed May 19 10:59:39 2004
@@ -2265,7 +2265,9 @@
 	 * ok, that's it
 	 */
 	retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
-	if (retval == -ENOENT) {
+	if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
+	    (retval = ext3_delete_entry(handle, old_dir, old_de, old_bh)) ==
+	    -ENOENT) {
 		/*
 		 * old_de could have moved out from under us.
 		 */

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
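
To make the failure mode described in the message above easier to picture, here is a toy C sketch of a cached entry pointer going stale when a block is split. It is only an illustration under stated assumptions: `toy_dirent`, `toy_block`, `toy_find`, `toy_split` and the pivot rule are all made up for this example and do not reflect the real ext3 on-disk layout or the real htree split code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a directory entry; NOT the real ext3 layout. */
struct toy_dirent {
    unsigned inode;      /* 0 means the slot is unused */
    char name[16];
};

#define SLOTS 4

/* A toy "leaf block" with a fixed number of entry slots. */
struct toy_block {
    struct toy_dirent slot[SLOTS];
};

/* Find a live entry by name; returns a pointer into the block, or NULL. */
static struct toy_dirent *toy_find(struct toy_block *b, const char *name)
{
    for (int i = 0; i < SLOTS; i++)
        if (b->slot[i].inode && strcmp(b->slot[i].name, name) == 0)
            return &b->slot[i];
    return NULL;
}

/*
 * Hypothetical split rule: entries whose first character is >= pivot move
 * to 'fresh'; the remaining entries are compacted to the front of 'full'.
 * Compaction is what makes previously saved pointers refer to other entries.
 */
static void toy_split(struct toy_block *full, struct toy_block *fresh, char pivot)
{
    int keep = 0, moved = 0;

    memset(fresh, 0, sizeof(*fresh));
    for (int i = 0; i < SLOTS; i++) {
        struct toy_dirent e = full->slot[i];

        if (!e.inode)
            continue;
        full->slot[i].inode = 0;          /* slot is free until reused */
        if (e.name[0] >= pivot)
            fresh->slot[moved++] = e;
        else
            full->slot[keep++] = e;       /* keep <= i, so this is safe */
    }
}

/* Returns the inode that a pointer cached before the split refers to afterwards. */
static unsigned toy_stale_inode_after_split(void)
{
    struct toy_block blk = { .slot = {
        { 10, "x1" }, { 20, "old" }, { 30, "a1" }, { 40, "b1" } } };
    struct toy_block fresh;
    struct toy_dirent *old_de = toy_find(&blk, "old");  /* cached: inode 20 */

    toy_split(&blk, &fresh, 'm');   /* "x1" and "old" move, "a1"/"b1" compact */
    return old_de->inode;           /* now the slot holds "b1" */
}
```

In this toy setup the cached pointer ends up referring to the entry for inode 40 rather than inode 20, so deleting through it would remove an unrelated name. Comparing the entry's inode with the expected one, as the patch does, would catch this particular case.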

* Re: problems with ext3 fs, kernels up to 2.6.6-rc2
From: Paul P Komkoff Jr @ 2004-05-20 6:40 UTC
To: Linux Kernel Mailing List, Andrew Morton, ext2-devel

Replying to Andreas Dilger:
> This seems to fix the majority of the problems, although there are still
> some rare failures in the rename test.

Just curious: is it really doing what it should? Are there cases where
ext3_delete_entry(handle, old_dir, old_de, old_bh) will be called twice
with the same set of parameters? :()
--
Paul P 'Stingray' Komkoff Jr // http://stingr.net/key <- my pgp key
This message represents the official view of the voices in my head

* Re: [Ext2-devel] Re: problems with ext3 fs, kernels up to 2.6.6-rc2
From: Andreas Dilger @ 2004-05-20 9:22 UTC
To: Linux Kernel Mailing List, Andrew Morton, ext2-devel

On May 20, 2004 10:40 +0400, Paul P Komkoff Jr wrote:
> Replying to Andreas Dilger:
> > This seems to fix the majority of the problems, although there are still
> > some rare failures in the rename test.
>
> Just curious: is it really doing what it should? Are there cases where
> ext3_delete_entry(handle, old_dir, old_de, old_bh) will be called twice
> with the same set of parameters? :()

Grr, my bad. I had moved this by hand from 2.4 (where this was
discovered and where I'm testing) to 2.6.current just to make sure the
context and everything was right for submission, and of course botched
it. The right patch removes the old call to ext3_delete_entry():

===== fs/ext3/namei.c 1.52 vs edited =====
--- 1.52/fs/ext3/namei.c	Mon May 10 05:25:34 2004
+++ edited/fs/ext3/namei.c	Thu May 20 03:16:43 2004
@@ -2264,8 +2264,9 @@
 	/*
 	 * ok, that's it
 	 */
-	retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
-	if (retval == -ENOENT) {
+	if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
+	    (retval = ext3_delete_entry(handle, old_dir,
+					old_de, old_bh)) == -ENOENT) {
 		/*
 		 * old_de could have moved out from under us.
 		 */

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
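
The corrected condition relies on C's short-circuit evaluation of `||`: the delete is attempted only when the inode check passes, and at most once. The sketch below shows just that control-flow shape. `toy_delete_entry` and `toy_needs_relookup` are hypothetical stand-ins invented for this example, not kernel functions.

```c
#include <assert.h>
#include <errno.h>

static int delete_calls;   /* counts calls to the stand-in below */

/* Hypothetical stand-in for ext3_delete_entry(); 'present' picks the outcome. */
static int toy_delete_entry(int present)
{
    delete_calls++;
    return present ? 0 : -ENOENT;
}

/*
 * Same shape as the corrected condition: returns 1 when the caller has to
 * fall back to a fresh lookup of the old name, 0 when the delete succeeded.
 * Because || short-circuits, toy_delete_entry() is skipped entirely when
 * the inode numbers differ, and *retval is left untouched in that case.
 */
static int toy_needs_relookup(unsigned de_inode, unsigned want_inode,
                              int present, int *retval)
{
    if (de_inode != want_inode ||
        (*retval = toy_delete_entry(present)) == -ENOENT)
        return 1;
    return 0;
}
```

One property of this shape worth keeping in mind: on an inode mismatch the result variable keeps whatever value it had before, so the fallback path has to assign it itself.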

* Re: [Ext2-devel] Re: problems with ext3 fs, kernels up to 2.6.6-rc2
From: Andreas Dilger @ 2004-05-25 16:58 UTC
To: Linux Kernel Mailing List, Andrew Morton, ext2-devel; +Cc: Adam Cassar

On May 20, 2004 03:22 -0600, Andreas Dilger wrote:
> This seems to fix the majority of the problems, although there are still
> some rare failures in the rename test.
>
> ===== fs/ext3/namei.c 1.52 vs edited =====
> --- 1.52/fs/ext3/namei.c	Mon May 10 05:25:34 2004
> +++ edited/fs/ext3/namei.c	Thu May 20 03:16:43 2004
> @@ -2264,8 +2264,9 @@
>  	/*
>  	 * ok, that's it
>  	 */
> -	retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
> -	if (retval == -ENOENT) {
> +	if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
> +	    (retval = ext3_delete_entry(handle, old_dir,
> +					old_de, old_bh)) == -ENOENT) {
>  		/*
>  		 * old_de could have moved out from under us.
>  		 */

I isolated the source of the remaining problems. In very rare cases,
even with the above patch, we could still do the wrong thing. If old_de
is pointing to the newly-added entry (i_ino is the same) we end up
deleting the new entry instead of the old one. It looks as if the rename
never happened. We need to verify that the name we are unlinking is what
we expect.

It is also possible that old_de is pointing to the now-unused space at
the end of a newly-split leaf block, so we still need to try
ext3_delete_entry() (which will skip the stale entry and return ENOENT)
instead of just relying on the inum + name check.

===== fs/ext3/namei.c 1.52 vs edited =====
--- 1.52/fs/ext3/namei.c	Mon May 10 05:25:34 2004
+++ edited/fs/ext3/namei.c	Thu May 20 19:57:10 2004
@@ -2264,11 +2264,15 @@
 	/*
 	 * ok, that's it
 	 */
-	retval = ext3_delete_entry(handle, old_dir, old_de, old_bh);
-	if (retval == -ENOENT) {
-		/*
-		 * old_de could have moved out from under us.
-		 */
+	if (le32_to_cpu(old_de->inode) != old_inode->i_ino ||
+	    old_de->name_len != old_dentry->d_name.len ||
+	    strncmp(old_de->name, old_dentry->d_name.name, old_de->name_len) ||
+	    (retval = ext3_delete_entry(handle, old_dir,
+					old_de, old_bh)) == -ENOENT) {
+		/* old_de could have moved from under us during htree split, so
+		 * make sure that we are deleting the right entry.  We might
+		 * also be pointing to a stale entry in the unused part of
+		 * old_bh so just checking inum and the name isn't enough. */
 		struct buffer_head *old_bh2;
 		struct ext3_dir_entry_2 *old_de2;
 

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
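
The case described in the last message, where the inode number matches but the entry is the newly added name, can be shown with a small standalone check. This is a sketch only: `named_dirent` and `toy_entry_matches` are hypothetical names for this example, the struct is not the real `ext3_dir_entry_2`, and `memcmp` is used here in place of the patch's `strncmp` because the toy names are compared by explicit length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy entry with an explicit name length; NOT the real ext3 layout. */
struct named_dirent {
    unsigned inode;
    unsigned char name_len;
    char name[255];
};

/*
 * Returns nonzero only if the entry carries both the expected inode number
 * and the expected name. After a rename within one directory, the old and
 * the new entry share an inode number, so the inode check alone cannot
 * tell them apart.
 */
static int toy_entry_matches(const struct named_dirent *de, unsigned ino,
                             const char *name, size_t len)
{
    return de->inode == ino &&
           de->name_len == len &&
           memcmp(de->name, name, len) == 0;
}
```

For example, an entry `{ 20, 3, "new" }` matches inode 20 but is rejected when the name being unlinked is "tmp". As the message notes, a check like this still does not cover a pointer into the unused tail of a split block, which is why the patch keeps the ext3_delete_entry() attempt and its -ENOENT fallback.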