* ext4: ext23 support leaks buffer pages
@ 2011-01-10 10:54 Hugh Dickins
2011-01-10 17:52 ` Ted Ts'o
0 siblings, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2011-01-10 10:54 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Amir Goldstein, linux-kernel, linux-ext4
I switched to CONFIG_EXT4_USE_FOR_EXT23=y with 2.6.37, but was
then surprised by OOM kills: 2.6.36 and current are also bad.
Try something like this:
mkfs -t ext2 /dev/sda10
while :
do mount /dev/sda10 /mnt
rm -rf /mnt/linux-2.6.37
( cd /mnt; tar xf ~hughd/linux-2.6.37.tar.bz2 )
umount /mnt
grep "Active(file)" /proc/meminfo
grep buffer_head /proc/slabinfo
done
and watch Active and buffer_head grow. Buffers remains stable,
but that's because each unmount truncates those pages from the
blockdev, yet try_to_free_buffers() cannot detach their buffer
heads because one is still "busy" (raised b_count). Such pages
are put back on the Active(file) list, but in practice they're
unevictable by now.
ext3 behaves in the same way. And it's no better if you don't
keep on unmounting - then you can just watch the Buffers grow,
until they cover almost the whole device (if you've enough memory).
Bisection arrives at the commit below; and reverting it
(obviously not the right solution) fixes the problem.
Hugh
commit 40389687382bf0ae71458e7c0f828137a438a956
Author: Amir G <amir73il@users.sourceforge.net>
Date: Tue Jul 27 11:56:05 2010 -0400
ext4: Fix block bitmap inconsistencies after a crash when deleting files
We have experienced bitmap inconsistencies after crash during file
delete under heavy load. The crash is not file system related and I
the following patch in ext4_free_branches() fixes the recovery
problem.
If the transaction is restarted and there is a crash before the new
transaction is committed, then after recovery, the blocks that this
indirect block points to have been freed, but the indirect block
itself has not been freed and may still point to some of the free
blocks (because of the ext4_forget()).
So ext4_forget() should be called inside ext4_free_blocks() to avoid
this problem.
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 755ba86..699d1d0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4490,27 +4490,6 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
depth);
/*
- * We've probably journalled the indirect block several
- * times during the truncate. But it's no longer
- * needed and we now drop it from the transaction via
- * jbd2_journal_revoke().
- *
- * That's easy if it's exclusively part of this
- * transaction. But if it's part of the committing
- * transaction then jbd2_journal_forget() will simply
- * brelse() it. That means that if the underlying
- * block is reallocated in ext4_get_block(),
- * unmap_underlying_metadata() will find this block
- * and will try to get rid of it. damn, damn.
- *
- * If this block has already been committed to the
- * journal, a revoke record will be written. And
- * revoke records must be emitted *before* clearing
- * this block's bit in the bitmaps.
- */
- ext4_forget(handle, 1, inode, bh, bh->b_blocknr);
-
- /*
* Everything below this this pointer has been
* released. Now let this top-of-subtree go.
*
@@ -4534,8 +4513,20 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
blocks_for_truncate(inode));
}
+ /*
+ * The forget flag here is critical because if
+ * we are journaling (and not doing data
+ * journaling), we have to make sure a revoke
+ * record is written to prevent the journal
+ * replay from overwriting the (former)
+ * indirect block if it gets reallocated as a
+ * data block. This must happen in the same
+ * transaction where the data blocks are
+ * actually freed.
+ */
ext4_free_blocks(handle, inode, 0, nr, 1,
- EXT4_FREE_BLOCKS_METADATA);
+ EXT4_FREE_BLOCKS_METADATA|
+ EXT4_FREE_BLOCKS_FORGET);
if (parent_bh) {
/*
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: ext4: ext23 support leaks buffer pages
2011-01-10 10:54 ext4: ext23 support leaks buffer pages Hugh Dickins
@ 2011-01-10 17:52 ` Ted Ts'o
2011-01-10 17:56 ` [PATCH] ext4: fix memory leak in ext4_free_branches Theodore Ts'o
0 siblings, 1 reply; 5+ messages in thread
From: Ted Ts'o @ 2011-01-10 17:52 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Amir Goldstein, linux-kernel, linux-ext4
On Mon, Jan 10, 2011 at 02:54:26AM -0800, Hugh Dickins wrote:
> I switched to CONFIG_EXT4_USE_FOR_EXT23=y with 2.6.37, but was
> then surprised by OOM kills: 2.6.36 and current are also bad.
Oops. Yeah, the problem was when ext4_forget() was moved out of
ext4_free_branches, we needed to replace it with a call to brelse().
Thanks for pointing this out.
Patch follows...
- Ted
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] ext4: fix memory leak in ext4_free_branches
2011-01-10 17:52 ` Ted Ts'o
@ 2011-01-10 17:56 ` Theodore Ts'o
2011-01-11 0:02 ` Hugh Dickins
2011-01-11 10:16 ` Amir Goldstein
0 siblings, 2 replies; 5+ messages in thread
From: Theodore Ts'o @ 2011-01-10 17:56 UTC (permalink / raw)
To: Ext4 Developers List; +Cc: hughd, Theodore Ts'o, stable
Commit 40389687 moved a call to ext4_forget() out of
ext4_free_branches and let ext4_free_blocks() handle calling
bforget(). But that change unfortunately did not replace the call to
ext4_forget() with brelse(), which was needed to drop the in-use count
of the indirect block's buffer head, which lead to a memory leak when
deleting files that used indirect blocks. Fix this.
Thanks to Hugh Dickins for pointing this out.
Cc: stable@kernel.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
fs/ext4/inode.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 84b6162..e80fc51 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4378,6 +4378,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
(__le32 *) bh->b_data,
(__le32 *) bh->b_data + addr_per_block,
depth);
+ brelse(bh);
/*
* Everything below this this pointer has been
--
1.7.3.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] ext4: fix memory leak in ext4_free_branches
2011-01-10 17:56 ` [PATCH] ext4: fix memory leak in ext4_free_branches Theodore Ts'o
@ 2011-01-11 0:02 ` Hugh Dickins
2011-01-11 10:16 ` Amir Goldstein
1 sibling, 0 replies; 5+ messages in thread
From: Hugh Dickins @ 2011-01-11 0:02 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List, stable
On Mon, 10 Jan 2011, Theodore Ts'o wrote:
> Commit 40389687 moved a call to ext4_forget() out of
> ext4_free_branches and let ext4_free_blocks() handle calling
> bforget(). But that change unfortunately did not replace the call to
> ext4_forget() with brelse(), which was needed to drop the in-use count
> of the indirect block's buffer head, which lead to a memory leak when
> deleting files that used indirect blocks. Fix this.
>
> Thanks to Hugh Dickins for pointing this out.
>
> Cc: stable@kernel.org
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Many thanks: I've given that a run with the added debug I'd put in,
and it fixes the problems nicely. I thought it would be a brelse(bh)
somewhere, but wouldn't dare trust my data to my own guesses of where!
Hugh
> ---
> fs/ext4/inode.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 84b6162..e80fc51 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4378,6 +4378,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
> (__le32 *) bh->b_data,
> (__le32 *) bh->b_data + addr_per_block,
> depth);
> + brelse(bh);
>
> /*
> * Everything below this this pointer has been
> --
> 1.7.3.1
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ext4: fix memory leak in ext4_free_branches
2011-01-10 17:56 ` [PATCH] ext4: fix memory leak in ext4_free_branches Theodore Ts'o
2011-01-11 0:02 ` Hugh Dickins
@ 2011-01-11 10:16 ` Amir Goldstein
1 sibling, 0 replies; 5+ messages in thread
From: Amir Goldstein @ 2011-01-11 10:16 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Ext4 Developers List, hughd, stable
On Mon, Jan 10, 2011 at 7:56 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> Commit 40389687 moved a call to ext4_forget() out of
> ext4_free_branches and let ext4_free_blocks() handle calling
> bforget(). But that change unfortunately did not replace the call to
> ext4_forget() with brelse(), which was needed to drop the in-use count
> of the indirect block's buffer head, which lead to a memory leak when
> deleting files that used indirect blocks. Fix this.
My bad... what I should have done was to pass 'bh' to ext4_free_blocks.
this would make the code equivalent to ext3 code.
Ted's fix is probably as good, but I think it is nicer to fix it by passing bh.
>
> Thanks to Hugh Dickins for pointing this out.
>
> Cc: stable@kernel.org
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> fs/ext4/inode.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 84b6162..e80fc51 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4378,6 +4378,7 @@ static void ext4_free_branches(handle_t *handle, struct inode *inode,
> (__le32 *) bh->b_data,
> (__le32 *) bh->b_data + addr_per_block,
> depth);
> + brelse(bh);
>
> /*
> * Everything below this this pointer has been
> --
> 1.7.3.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2011-01-11 10:16 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-10 10:54 ext4: ext23 support leaks buffer pages Hugh Dickins
2011-01-10 17:52 ` Ted Ts'o
2011-01-10 17:56 ` [PATCH] ext4: fix memory leak in ext4_free_branches Theodore Ts'o
2011-01-11 0:02 ` Hugh Dickins
2011-01-11 10:16 ` Amir Goldstein
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).