public inbox for linux-fsdevel@vger.kernel.org
From: Carsten Otte <cotte@freenet.de>
To: "Stephen C. Tweedie" <sct@redhat.com>,
	Linus Torvalds <torvalds@osdl.org>
Cc: linux-fsdevel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
	schwidefsky@de.ibm.com, cotte@de.ibm.com,
	Stephen Tweedie <sct@redhat.com>
Subject: Re: [PATCH] ext3 [linux-2.6.2.]: accessing already freed inodes when under memory pressure
Date: Thu, 19 Feb 2004 21:14:50 +0100
Message-ID: <200402192114.55214.cotte@freenet.de>
In-Reply-To: <1077212393.2070.571.camel@sisko.scot.redhat.com>

Stephen wrote:
> The orphan list entry should persist only as long
> as inode->i_count is raised, and if it's raised, there should be no risk
> of the struct getting reused.
I tried the following *dirty* debugging patch, which panics in iput_final()
exactly when i_count drops to zero while the orphan list entry still
persists.
With the patch applied I have not been able to boot my system: the panic
output shows that the sys_unlink() path, as described in full in the initial
mail of this thread, runs into the panic():

diff -ruN linux-2.6.3/fs/inode.c linux-2.6.3+debug/fs/inode.c
--- linux-2.6.3/fs/inode.c	2004-01-28 09:02:25.000000000 +0100
+++ linux-2.6.3+debug/fs/inode.c	2004-02-19 21:03:26.000000000 +0100
@@ -1058,6 +1058,96 @@
 		generic_forget_inode(inode);
 }
 
+struct ext3_inode_info {
+        __u32   i_data[15];
+        __u32   i_flags;
+#ifdef EXT3_FRAGMENTS
+        __u32   i_faddr;
+        __u8    i_frag_no;
+        __u8    i_frag_size;
+#endif
+        __u32   i_file_acl;
+        __u32   i_dir_acl;
+        __u32   i_dtime;
+
+        /*
+         * i_block_group is the number of the block group which contains
+         * this file's inode.  Constant across the lifetime of the inode,
+         * it is used for making block allocation decisions - we try to
+         * place a file's data blocks near its inode block, and new inodes
+         * near to their parent directory's inode.
+         */
+        __u32   i_block_group;
+        __u32   i_state;                /* Dynamic state flags for ext3 */
+
+        /*
+         * i_next_alloc_block is the logical (file-relative) number of the
+         * most-recently-allocated block in this file.  Yes, it is misnamed.
+         * We use this for detecting linearly ascending allocation requests.
+         */
+        __u32   i_next_alloc_block;
+
+        /*
+         * i_next_alloc_goal is the *physical* companion to i_next_alloc_block.
+         * It is the physical block number of the block which was most-recently
+         * allocated to this file.  This gives us the goal (target) for the next
+         * allocation when we detect linearly ascending requests.
+         */
+        __u32   i_next_alloc_goal;
+#ifdef EXT3_PREALLOCATE
+        __u32   i_prealloc_block;
+        __u32   i_prealloc_count;
+#endif
+        __u32   i_dir_start_lookup;
+#ifdef CONFIG_EXT3_FS_XATTR
+        /*
+         * Extended attributes can be read independently of the main file
+         * data. Taking i_sem even when reading would cause contention
+         * between readers of EAs and writers of regular file data, so
+         * instead we synchronize on xattr_sem when reading or changing
+         * EAs.
+         */
+        struct rw_semaphore xattr_sem;
+#endif
+#ifdef CONFIG_EXT3_FS_POSIX_ACL
+        struct posix_acl        *i_acl;
+        struct posix_acl        *i_default_acl;
+#endif
+
+        struct list_head i_orphan;      /* unlinked but open inodes */
+
+        /*
+         * i_disksize keeps track of what the inode size is ON DISK, not
+         * in memory.  During truncate, i_size is set to the new size by
+         * the VFS prior to calling ext3_truncate(), but the filesystem won't
+         * set i_disksize to 0 until the truncate is actually under way.
+         *
+         * The intent is that i_disksize always represents the blocks which
+         * are used by this file.  This allows recovery to restart truncate
+         * on orphans if we crash during truncate.  We actually write i_disksize
+         * into the on-disk inode when writing inodes out, instead of i_size.
+         *
+         * The only time when i_disksize and i_size may be different is when
+         * a truncate is in progress.  The only things which change i_disksize
+         * are ext3_get_block (growth) and ext3_truncate (shrinkth).
+         */
+        loff_t  i_disksize;
+
+        /*
+         * truncate_sem is for serialising ext3_truncate() against
+         * ext3_getblock().  In the 2.4 ext2 design, great chunks of inode's
+         * data tree are chopped off during truncate. We can't do that in
+         * ext3 because whenever we perform intermediate commits during
+         * truncate, the inode and all the metadata blocks *must* be in a
+         * consistent state which allows truncation of the orphans to restart
+         * during recovery.  Hence we must fix the get_block-vs-truncate race
+         * by other means, so we have truncate_sem.
+         */
+        struct semaphore truncate_sem;
+        struct inode vfs_inode;
+};
+
+
 /*
  * Called when we're dropping the last reference
  * to an inode. 
@@ -1073,6 +1163,15 @@
 {
 	struct super_operations *op = inode->i_sb->s_op;
 	void (*drop)(struct inode *) = generic_drop_inode;
+	struct ext3_inode_info* ext3_i;
+
+	if (!strcmp (inode->i_sb->s_type->name, "ext3")) {
+		// this is an ext3 inode. check if it is still orphan
+		ext3_i = container_of (inode, struct ext3_inode_info, vfs_inode);
+		if (!list_empty (&ext3_i->i_orphan)) {
+			panic ("iput_final: found inode in ext3's orphan list, am asked to drop it\n");
+		}
+	}
 
 	if (op && op->drop_inode)
 		drop = op->drop_inode;
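
For readers who want to try the check outside the kernel, here is a minimal
userspace sketch of the same idea: recover the containing ext3 structure from
the embedded inode with container_of(), then test whether its orphan list
entry is still linked. The demo_* names, the cut-down structures and the
simplified list helpers are assumptions made for illustration only; they are
not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Step back from a member pointer to the structure that embeds it. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced versions of the two structures. */
struct demo_inode { int i_count; const char *fs_name; };
struct demo_ext3_inode_info {
	struct list_head i_orphan;	/* unlinked but open inodes */
	struct demo_inode vfs_inode;
};

/* Returns 1 if an ext3 inode is still on the orphan list, else 0. */
static int orphan_still_listed(struct demo_inode *inode)
{
	struct demo_ext3_inode_info *ei;

	if (strcmp(inode->fs_name, "ext3") != 0)
		return 0;
	ei = container_of(inode, struct demo_ext3_inode_info, vfs_inode);
	return !list_empty(&ei->i_orphan);
}

/* Counts how often the check fires in a small scripted scenario. */
static int demo_violations(void)
{
	struct list_head orphans;
	struct demo_ext3_inode_info ei = {
		.vfs_inode = { .i_count = 1, .fs_name = "ext3" },
	};
	int hits = 0;

	INIT_LIST_HEAD(&orphans);
	INIT_LIST_HEAD(&ei.i_orphan);
	assert(list_empty(&ei.i_orphan));

	list_add(&ei.i_orphan, &orphans);	/* unlinked while still open */
	ei.vfs_inode.i_count = 0;		/* last reference goes away */
	hits += orphan_still_listed(&ei.vfs_inode);	/* case the patch panics on */

	list_del_init(&ei.i_orphan);		/* orphan entry removed first */
	hits += orphan_still_listed(&ei.vfs_inode);
	return hits;
}
```

Calling demo_violations() from a small main() exercises both orderings; only
the first one, where the last reference is dropped before the orphan entry is
removed, corresponds to the situation the debugging patch panics on.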
