From: Theodore Tso <tytso@mit.edu>
To: Curt Wohlgemuth <curtw@google.com>
Cc: ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: Dirent blocks leaking into data file blocks
Date: Sat, 14 Nov 2009 18:29:12 -0500 [thread overview]
Message-ID: <20091114232912.GF4221@mit.edu> (raw)
In-Reply-To: <6601abe90911131546v43838123g3bc312d46a97e199@mail.gmail.com>
On Fri, Nov 13, 2009 at 03:46:09PM -0800, Curt Wohlgemuth wrote:
> So when a directory is removed with "rm -rf foo" , as the files are deleted,
> the directory block(s) are marked dirty. But when the directory blocks
> themselves are freed up, bforget() isn't called for their bufferheads, and
> so they remain dirty in the page cache, and can be written down later, after
> their blocks have been reused.
Well, it should be happening as part of the call to ext4_forget(). It
looks like it's happening on the code paths that release the blocks.
This is critically important if journalling is enabled, since we have
to call jbd2_journal_revoke() on directory blocks before they can be
reused as data blocks. Hmm, if the buffer was part of a transaction,
we don't call __bforget() on it in jbd2_journal_forget(). So I can
see how you might be seeing a problem with journalling enabled, but
I'm puzzled why you were also seeing a problem without journalling.
So to help debug what is going on, I tried adding the a new tracepoint
to ext4_forget(). I've attached it to the end of this message. Using
it, I can confirm that ext4_forget() is getting called for
directories. I do see a problem though that we're not checking to see
if the inode is a directory; in that case, we need to treat it as if
it were metadata, and call ext4_journal_revoke() instead of
ext4_journal_forget(). Otherwise we could have the problem that after
a crash, on a journal replay, we might end up replaying the directory
block after it has been reallocated and used as a data block.
(Hmm.... I think this might be a problem for ext3 as well.)
I am very puzzled why you are seeing a problem in no journal mode,
though. It looks like the right thing should be happening. Is the
8MB data file is getting written via direct I/O or buffered I/O?
- Ted
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 554c679..13de1dd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -89,6 +89,7 @@ int ext4_forget(handle_t *handle, int is_metadata, struct inode *inode,
might_sleep();
+ trace_ext4_forget(inode, is_metadata, blocknr);
BUFFER_TRACE(bh, "enter");
jbd_debug(4, "forgetting bh %p: is_metadata = %d, mode %o, "
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index d09550b..b390e1f 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -907,6 +907,32 @@ TRACE_EVENT(ext4_mballoc_free,
__entry->result_len, __entry->result_logical)
);
+TRACE_EVENT(ext4_forget,
+ TP_PROTO(struct inode *inode, int is_metadata, __u64 block),
+
+ TP_ARGS(inode, is_metadata, block),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field( ino_t, ino )
+ __field( umode_t, mode )
+ __field( int, is_metadata )
+ __field( __u64, block )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = inode->i_sb->s_dev;
+ __entry->ino = inode->i_ino;
+ __entry->mode = inode->i_mode;
+ __entry->is_metadata = is_metadata;
+ __entry->block = block;
+ ),
+
+ TP_printk("dev %s ino %lu mode %d is_metadata %d block %llu",
+ jbd2_dev_to_name(__entry->dev), (unsigned long) __entry->ino,
+ __entry->mode, __entry->is_metadata, __entry->block)
+);
+
#endif /* _TRACE_EXT4_H */
/* This part must be outside protection */
next prev parent reply other threads:[~2009-11-14 23:29 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-13 23:46 Dirent blocks leaking into data file blocks Curt Wohlgemuth
2009-11-14 23:29 ` Theodore Tso [this message]
2009-11-15 0:30 ` [PATCH] ext4: directory blocks must be treated as metadata by ext4_forget() Theodore Ts'o
2009-11-15 7:04 ` Aneesh Kumar K.V
2009-11-15 7:16 ` Aneesh Kumar K.V
2009-11-15 20:43 ` Theodore Tso
2009-11-15 20:48 ` [PATCH] ext4: ext4_forget() must treat directory or symlink blocks as metadata Theodore Ts'o
2009-11-15 23:48 ` [PATCH] ext4: directory blocks must be treated as metadata by ext4_forget() Curt Wohlgemuth
2009-11-16 7:01 ` Aneesh Kumar K.V
2009-11-16 13:56 ` Theodore Tso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091114232912.GF4221@mit.edu \
--to=tytso@mit.edu \
--cc=curtw@google.com \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.