From: Carsten Otte <cotte@freenet.de>
To: "Stephen C. Tweedie" <sct@redhat.com>,
Linus Torvalds <torvalds@osdl.org>
Cc: linux-fsdevel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
schwidefsky@de.ibm.com, cotte@de.ibm.com,
Stephen Tweedie <sct@redhat.com>
Subject: Re: [PATCH] ext3 [linux-2.6.2.]: accessing already freed inodes when under memory pressure
Date: Thu, 19 Feb 2004 21:14:50 +0100
Message-ID: <200402192114.55214.cotte@freenet.de>
In-Reply-To: <1077212393.2070.571.camel@sisko.scot.redhat.com>
Stephen wrote:
> The orphan list entry should persist only as long
> as inode->i_count is raised, and if it's raised, there should be no risk
> of the struct getting reused.
I tried the following *dirty* debugging patch, which triggers in iput_final()
exactly when i_count drops to zero while the orphan list entry still
persists.
With it applied, I have not been able to boot my system: the panic output
shows that the sys_unlink() path, as fully described in the initial mail
that started this thread, runs into the panic().
diff -ruN linux-2.6.3/fs/inode.c linux-2.6.3+debug/fs/inode.c
--- linux-2.6.3/fs/inode.c 2004-01-28 09:02:25.000000000 +0100
+++ linux-2.6.3+debug/fs/inode.c 2004-02-19 21:03:26.000000000 +0100
@@ -1058,6 +1058,96 @@
generic_forget_inode(inode);
}
+struct ext3_inode_info {
+ __u32 i_data[15];
+ __u32 i_flags;
+#ifdef EXT3_FRAGMENTS
+ __u32 i_faddr;
+ __u8 i_frag_no;
+ __u8 i_frag_size;
+#endif
+ __u32 i_file_acl;
+ __u32 i_dir_acl;
+ __u32 i_dtime;
+
+ /*
+ * i_block_group is the number of the block group which contains
+ * this file's inode. Constant across the lifetime of the inode,
+ * it is used for making block allocation decisions - we try to
+ * place a file's data blocks near its inode block, and new inodes
+ * near to their parent directory's inode.
+ */
+ __u32 i_block_group;
+ __u32 i_state; /* Dynamic state flags for ext3 */
+
+ /*
+ * i_next_alloc_block is the logical (file-relative) number of the
+ * most-recently-allocated block in this file. Yes, it is misnamed.
+ * We use this for detecting linearly ascending allocation requests.
+ */
+ __u32 i_next_alloc_block;
+
+ /*
+ * i_next_alloc_goal is the *physical* companion to i_next_alloc_block.
+ * It is the physical block number of the block which was most-recently
+ * allocated to this file. This gives us the goal (target) for the next
+ * allocation when we detect linearly ascending requests.
+ */
+ __u32 i_next_alloc_goal;
+#ifdef EXT3_PREALLOCATE
+ __u32 i_prealloc_block;
+ __u32 i_prealloc_count;
+#endif
+ __u32 i_dir_start_lookup;
+#ifdef CONFIG_EXT3_FS_XATTR
+ /*
+ * Extended attributes can be read independently of the main file
+ * data. Taking i_sem even when reading would cause contention
+ * between readers of EAs and writers of regular file data, so
+ * instead we synchronize on xattr_sem when reading or changing
+ * EAs.
+ */
+ struct rw_semaphore xattr_sem;
+#endif
+#ifdef CONFIG_EXT3_FS_POSIX_ACL
+ struct posix_acl *i_acl;
+ struct posix_acl *i_default_acl;
+#endif
+
+ struct list_head i_orphan; /* unlinked but open inodes */
+
+ /*
+ * i_disksize keeps track of what the inode size is ON DISK, not
+ * in memory. During truncate, i_size is set to the new size by
+ * the VFS prior to calling ext3_truncate(), but the filesystem won't
+ * set i_disksize to 0 until the truncate is actually under way.
+ *
+ * The intent is that i_disksize always represents the blocks which
+ * are used by this file. This allows recovery to restart truncate
+ * on orphans if we crash during truncate. We actually write i_disksize
+ * into the on-disk inode when writing inodes out, instead of i_size.
+ *
+ * The only time when i_disksize and i_size may be different is when
+ * a truncate is in progress. The only things which change i_disksize
+ * are ext3_get_block (growth) and ext3_truncate (shrinkth).
+ */
+ loff_t i_disksize;
+
+ /*
+ * truncate_sem is for serialising ext3_truncate() against
+ * ext3_getblock(). In the 2.4 ext2 design, great chunks of inode's
+ * data tree are chopped off during truncate. We can't do that in
+ * ext3 because whenever we perform intermediate commits during
+ * truncate, the inode and all the metadata blocks *must* be in a
+ * consistent state which allows truncation of the orphans to restart
+ * during recovery. Hence we must fix the get_block-vs-truncate race
+ * by other means, so we have truncate_sem.
+ */
+ struct semaphore truncate_sem;
+ struct inode vfs_inode;
+};
+
+
/*
* Called when we're dropping the last reference
* to an inode.
@@ -1073,6 +1163,15 @@
{
struct super_operations *op = inode->i_sb->s_op;
void (*drop)(struct inode *) = generic_drop_inode;
+ struct ext3_inode_info* ext3_i;
+
+ if (!strcmp (inode->i_sb->s_type->name, "ext3")) {
+ // this is an ext3 inode. check if it is still an orphan
+ ext3_i = container_of (inode, struct ext3_inode_info, vfs_inode);
+ if (!list_empty (&ext3_i->i_orphan)) {
+ panic ("iput_final: found inode in ext3's orphan list, am asked to drop it\n");
+ }
+ }
if (op && op->drop_inode)
drop = op->drop_inode;