linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Tso <tytso@mit.edu>
To: Curt Wohlgemuth <curtw@google.com>
Cc: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: Dirent blocks leaking into data file blocks
Date: Sat, 14 Nov 2009 18:29:12 -0500	[thread overview]
Message-ID: <20091114232912.GF4221@mit.edu> (raw)
In-Reply-To: <6601abe90911131546v43838123g3bc312d46a97e199@mail.gmail.com>

On Fri, Nov 13, 2009 at 03:46:09PM -0800, Curt Wohlgemuth wrote:
> So when a directory is removed with "rm -rf foo" , as the files are deleted,
> the directory block(s) are marked dirty.  But when the directory blocks
> themselves are freed up, bforget() isn't called for their bufferheads, and
> so they remain dirty in the page cache, and can be written down later, after
> their blocks have been reused.

Well, it should be happening as part of the call to ext4_forget().  It
looks like it's happening on the code paths that release the blocks.
This is critically important if journalling is enabled, since we have
to call jbd2_journal_revoke() on directory blocks before they can be
reused as data blocks.  Hmm, if the buffer was part of a transaction,
we don't call __bforget() on it in jbd2_journal_forget().  So I can
see how you might be seeing a problem with journalling enabled, but
I'm puzzled why you were also seeing a problem without journalling.

So to help debug what is going on, I tried adding the a new tracepoint
to ext4_forget().  I've attached it to the end of this message.  Using
it, I can confirm that ext4_forget() is getting called for
directories.  I do see a problem though that we're not checking to see
if the inode is a directory; in that case, we need to treat it as if
it were metadata, and call ext4_journal_revoke() instead of
ext4_journal_forget().  Otherwise we could have the problem that after
a crash, on a journal replay, we might end up replaying the directory
block after it has been reallocated and used as a data block.
(Hmm.... I think this might be a problem for ext3 as well.)

I am very puzzled why you are seeing a problem in no journal mode,
though.  It looks like the right thing should be happening.  Is the
8MB data file is getting written via direct I/O or buffered I/O?

    	      	 	 	     	    	- Ted


diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 554c679..13de1dd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -89,6 +89,7 @@ int ext4_forget(handle_t *handle, int is_metadata, struct inode *inode,
 
 	might_sleep();
 
+	trace_ext4_forget(inode, is_metadata, blocknr);
 	BUFFER_TRACE(bh, "enter");
 
 	jbd_debug(4, "forgetting bh %p: is_metadata = %d, mode %o, "
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index d09550b..b390e1f 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -907,6 +907,32 @@ TRACE_EVENT(ext4_mballoc_free,
 		  __entry->result_len, __entry->result_logical)
 );
 
+TRACE_EVENT(ext4_forget,
+	TP_PROTO(struct inode *inode, int is_metadata, __u64 block),
+
+	TP_ARGS(inode, is_metadata, block),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__field(	ino_t,	ino			)
+		__field(	umode_t, mode			)
+		__field(	int,	is_metadata		)
+		__field(	__u64,	block			)
+	),
+
+	TP_fast_assign(
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->mode	= inode->i_mode;
+		__entry->is_metadata = is_metadata;
+		__entry->block	= block;
+	),
+
+	TP_printk("dev %s ino %lu mode %d is_metadata %d block %llu",
+		  jbd2_dev_to_name(__entry->dev), (unsigned long) __entry->ino,
+		  __entry->mode, __entry->is_metadata, __entry->block)
+);
+
 #endif /* _TRACE_EXT4_H */
 
 /* This part must be outside protection */

  reply	other threads:[~2009-11-14 23:29 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-13 23:46 Dirent blocks leaking into data file blocks Curt Wohlgemuth
2009-11-14 23:29 ` Theodore Tso [this message]
2009-11-15  0:30   ` [PATCH] ext4: directory blocks must be treated as metadata by ext4_forget() Theodore Ts'o
2009-11-15  7:04     ` Aneesh Kumar K.V
2009-11-15  7:16       ` Aneesh Kumar K.V
2009-11-15 20:43       ` Theodore Tso
2009-11-15 20:48         ` [PATCH] ext4: ext4_forget() must treat directory or symlink blocks as metadata Theodore Ts'o
2009-11-15 23:48         ` [PATCH] ext4: directory blocks must be treated as metadata by ext4_forget() Curt Wohlgemuth
2009-11-16  7:01         ` Aneesh Kumar K.V
2009-11-16 13:56           ` Theodore Tso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091114232912.GF4221@mit.edu \
    --to=tytso@mit.edu \
    --cc=curtw@google.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).