From: Carsten Otte <cotte@freenet.de>
To: "Stephen C. Tweedie" <sct@redhat.com>,
Linus Torvalds <torvalds@osdl.org>
Cc: linux-fsdevel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
schwidefsky@de.ibm.com, cotte@de.ibm.com,
Stephen Tweedie <sct@redhat.com>
Subject: Re: [PATCH] ext3 [linux-2.6.2.]: accessing already freed inodes when under memory pressure
Date: Thu, 19 Feb 2004 21:14:50 +0100
Message-ID: <200402192114.55214.cotte@freenet.de>
In-Reply-To: <1077212393.2070.571.camel@sisko.scot.redhat.com>
Stephen wrote:
> The orphan list entry should persist only as long
> as inode->i_count is raised, and if it's raised, there should be no risk
> of the struct getting reused.
I tried the following *dirty* debugging patch, which panics in iput_final()
exactly when i_count drops to zero while the orphan list entry still
persists.
With it applied I have not been able to boot my system: the panic output
shows that the sys_unlink() path, as described in full in the initial mail
that started this thread, runs into the panic().
diff -ruN linux-2.6.3/fs/inode.c linux-2.6.3+debug/fs/inode.c
--- linux-2.6.3/fs/inode.c 2004-01-28 09:02:25.000000000 +0100
+++ linux-2.6.3+debug/fs/inode.c 2004-02-19 21:03:26.000000000 +0100
@@ -1058,6 +1058,96 @@
generic_forget_inode(inode);
}
+struct ext3_inode_info {
+ __u32 i_data[15];
+ __u32 i_flags;
+#ifdef EXT3_FRAGMENTS
+ __u32 i_faddr;
+ __u8 i_frag_no;
+ __u8 i_frag_size;
+#endif
+ __u32 i_file_acl;
+ __u32 i_dir_acl;
+ __u32 i_dtime;
+
+ /*
+ * i_block_group is the number of the block group which contains
+ * this file's inode. Constant across the lifetime of the inode,
+ * it is ued for making block allocation decisions - we try to
+ * place a file's data blocks near its inode block, and new inodes
+ * near to their parent directory's inode.
+ */
+ __u32 i_block_group;
+ __u32 i_state; /* Dynamic state flags for ext3 */
+
+ /*
+ * i_next_alloc_block is the logical (file-relative) number of the
+ * most-recently-allocated block in this file. Yes, it is misnamed.
+ * We use this for detecting linearly ascending allocation requests.
+ */
+ __u32 i_next_alloc_block;
+
+ /*
+ * i_next_alloc_goal is the *physical* companion to i_next_alloc_block.
+ * it the the physical block number of the block which was most-recently
+ * allocated to this file. This give us the goal (target) for the next
+ * allocation when we detect linearly ascending requests.
+ */
+ __u32 i_next_alloc_goal;
+#ifdef EXT3_PREALLOCATE
+ __u32 i_prealloc_block;
+ __u32 i_prealloc_count;
+#endif
+ __u32 i_dir_start_lookup;
+#ifdef CONFIG_EXT3_FS_XATTR
+ /*
+ * Extended attributes can be read independently of the main file
+ * data. Taking i_sem even when reading would cause contention
+ * between readers of EAs and writers of regular file data, so
+ * instead we synchronize on xattr_sem when reading or changing
+ * EAs.
+ */
+ struct rw_semaphore xattr_sem;
+#endif
+#ifdef CONFIG_EXT3_FS_POSIX_ACL
+ struct posix_acl *i_acl;
+ struct posix_acl *i_default_acl;
+#endif
+
+ struct list_head i_orphan; /* unlinked but open inodes */
+
+ /*
+ * i_disksize keeps track of what the inode size is ON DISK, not
+ * in memory. During truncate, i_size is set to the new size by
+ * the VFS prior to calling ext3_truncate(), but the filesystem won't
+ * set i_disksize to 0 until the truncate is actually under way.
+ *
+ * The intent is that i_disksize always represents the blocks which
+ * are used by this file. This allows recovery to restart truncate
+ * on orphans if we crash during truncate. We actually write i_disksize
+ * into the on-disk inode when writing inodes out, instead of i_size.
+ *
+ * The only time when i_disksize and i_size may be different is when
+ * a truncate is in progress. The only things which change i_disksize
+ * are ext3_get_block (growth) and ext3_truncate (shrinkth).
+ */
+ loff_t i_disksize;
+
+ /*
+ * truncate_sem is for serialising ext3_truncate() against
+ * ext3_getblock(). In the 2.4 ext2 design, great chunks of inode's
+ * data tree are chopped off during truncate. We can't do that in
+ * ext3 because whenever we perform intermediate commits during
+ * truncate, the inode and all the metadata blocks *must* be in a
+ * consistent state which allows truncation of the orphans to restart
+ * during recovery. Hence we must fix the get_block-vs-truncate race
+ * by other means, so we have truncate_sem.
+ */
+ struct semaphore truncate_sem;
+ struct inode vfs_inode;
+};
+
+
/*
* Called when we're dropping the last reference
* to an inode.
@@ -1073,6 +1163,15 @@
{
struct super_operations *op = inode->i_sb->s_op;
void (*drop)(struct inode *) = generic_drop_inode;
+ struct ext3_inode_info* ext3_i;
+
+ if (!strcmp (inode->i_sb->s_type->name, "ext3")) {
+ // this is an ext3 inode. check if it is still orphan
+ ext3_i = container_of (inode, struct ext3_inode_info, vfs_inode);
+ if (!list_empty (&ext3_i->i_orphan)) {
+ panic ("iput_final: found inode in ext3's orphan list, am asked to drop it\n");
+ }
+ }
if (op && op->drop_inode)
drop = op->drop_inode;
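
For readers who want to see the condition in isolation, here is a minimal
userspace sketch of what the debugging check above tests. It is not kernel
code: the toy_* types, the list helpers, and toy_iput_violates_invariant()
are hypothetical stand-ins invented for illustration, and only the idea
(recover the containing ext3 structure from the VFS inode with
container_of(), then see whether i_orphan is still linked when the last
reference goes away) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's list primitives. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static bool list_is_empty(const struct list_head *h) { return h->next == h; }

/* Link entry n directly after head. */
static void list_add_entry(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink entry n and leave it pointing at itself (i.e. "empty"). */
static void list_del_entry(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Same idea as the kernel macro: member pointer -> containing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced versions of the real structures. */
struct toy_inode { int i_count; };

struct toy_ext3_inode_info {
    struct list_head i_orphan;   /* linked while the inode is an orphan */
    struct toy_inode vfs_inode;  /* embedded VFS inode, as in ext3 */
};

/*
 * Drop one reference. Returns true if this was the last reference and the
 * inode is still on the orphan list, which is the situation the debugging
 * patch turns into a panic().
 */
static bool toy_iput_violates_invariant(struct toy_inode *inode)
{
    struct toy_ext3_inode_info *ei =
        container_of(inode, struct toy_ext3_inode_info, vfs_inode);

    if (--inode->i_count > 0)
        return false;            /* other holders remain, nothing to check */
    return !list_is_empty(&ei->i_orphan);
}
```

An inode that reaches a count of zero while its i_orphan entry is still
linked makes the function return true; one that was taken off the list
first does not.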