linux-ext4.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: Curt Wohlgemuth <curtw@google.com>
Cc: Valerie Aurora <vaurora@redhat.com>,
	ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: Odd "leak" of extent info into data blocks?
Date: Wed, 9 Sep 2009 11:19:11 -0400
Message-ID: <20090909151911.GX22901@mit.edu>
In-Reply-To: <6601abe90909082100n48afdba9qee087ff46bfe4e3f@mail.gmail.com>

On Tue, Sep 08, 2009 at 09:00:50PM -0700, Curt Wohlgemuth wrote:
> 
> > In ext3 and ext4, metadata blocks (such as
> > extent tree blocks), aren't stored in the page cache.
> 
> Hmm.  You're saying that in the absence of a journal, all metadata
> writes go direct to disk?  Where should I look for this in the code?

Sorry, let me be more precise.  All metadata writes, regardless of
whether a journal is present or not, go through the buffer head
(bh) abstraction.  They have to, because that's how we do our
journalling; the jbd/jbd2 layer is built on top of the bh I/O request
layer, and even when a journal is not present, we still do our
metadata I/O via the submit_bh() and ll_rw_block() interfaces.

It used to be the case (in Linux 2.4) that the buffer cache was stored
separately from the page cache.  In Linux 2.6, the buffer cache is
implemented on top of the page cache, so technically, the metadata
blocks are stored in the page cache; however, they are only *accessed*
via the buffer cache abstraction.

> The problem is that I've seen this in real life.  And the patch below
> seems to fix it.  (Unfortunately, I haven't been able to recreate this
> in a simple example, after several days work.  I've only seen this in
> a *very* small number of cases on heavily loaded machines.)

I believe that you have a problem.  The problem is that you have a
dirty bh which is getting written out after the block gets reallocated
for use as a data block.  But a bforget() call should fix the problem
just as well.  In fact, I think the real fix should be this.

commit 1b58b00e02893b4bbab2b5f137316b82feadac52
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Wed Sep 9 11:18:42 2009 -0400

    ext4: Use bforget() in no journal mode when in ext4_journal_forget()
    
    When ext4 is using a journal, a metadata block which is deallocated
    must be passed into the journal layer so it can be "revoked".  The
    jbd2_journal_forget() function is also responsible for calling
    bforget().  Without a journal, ext4_journal_forget() must call
    bforget(), to avoid a race from a dirty metadata block getting written
    back after it has been reallocated and reused for another inode's data
    block.
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
index eb27fd0..d4f4b39 100644
--- a/fs/ext4/ext4_jbd2.c
+++ b/fs/ext4/ext4_jbd2.c
@@ -44,7 +44,7 @@ int __ext4_journal_forget(const char *where, handle_t *handle,
 						  handle, err);
 	}
 	else
-		brelse(bh);
+		bforget(bh);
 	return err;
 }
 

						- Ted

