From mboxrd@z Thu Jan  1 00:00:00 1970
From: Namjae Jeon
Subject: RE: memory leak: data=journal and {collapse,insert,zero}_range
Date: Tue, 17 Nov 2015 13:47:50 +0900
Message-ID: <005901d120f3$1d880410$58980c30$@samsung.com>
References: <20151017160230.GA19968@thunk.org>
 <009301d10b2f$b410e6b0$1c32b410$@samsung.com>
 <20151020155443.GM2972@thunk.org>
 <011f01d10be5$099d38d0$1cd7aa70$@samsung.com>
 <20151021145214.GC2165@thunk.org>
 <004301d11aae$72683a40$5738aec0$@samsung.com>
 <20151110144946.GB3156@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: 'Theodore Ts'o', linux-ext4@vger.kernel.org
To: 'Jan Kara'
Return-path:
Received: from mailout1.samsung.com ([203.254.224.24]:53378 "EHLO
 mailout1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751790AbbKQEry (ORCPT ); Mon, 16 Nov 2015 23:47:54 -0500
Received: from epcpsbgr4.samsung.com
 (u144.gpu120.samsung.co.kr [203.254.230.144]) by mailout1.samsung.com
 (Oracle Communications Messaging Server 7.0.5.31.0 64bit (built May 5 2014))
 with ESMTP id <0NXX01CHOZZR6NC0@mailout1.samsung.com> for
 linux-ext4@vger.kernel.org; Tue, 17 Nov 2015 13:47:51 +0900 (KST)
In-reply-to: <20151110144946.GB3156@quack.suse.cz>
Content-language: ko
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

> On Mon 09-11-15 14:21:11, Namjae Jeon wrote:
> > >
> > > On Wed, Oct 21, 2015 at 06:44:10PM +0900, Namjae Jeon wrote:
> > > > > Interestingly we're not seeing these memory leaks on the truncate
> > > > > path, so I suspect the issue is in how collapse range is clearing
> > > > > pages from the page cache, especially pages that were freshly written
> > > > > to the journal by the commit but which hadn't yet been written to
> > > > > disk and then marked as complete so we can allow the relevant
> > > > > transaction to be checkpointed.
> > > > > (Although we're not leaking the
> > > > > journal head structures, but only the buffer heads, so the story must
> > > > > be a bit more complicated than that.)
> > > >
> > > > Okay, thanks for sharing your view and points!
> > > >
> > > > Currently I can reproduce the memory leak without collapse/insert/zero
> > > > range, under the following conditions (collapse/insert/zero range are
> > > > disabled with the -I -C -z options, and -y is used instead of -W):
> > > > 1. small partition (1GB)
> > > > 2. run fsx with these options:
> > > >    "./fsx -N 30000 -o 128000 -l 500000 -r 4096 -t 512 -w 512 -Z -R -y -I -C -z testfile"
> > > > This shows the same result as generic/091 (buffer_head leak).
> > > >
> > > > So I am starting to look for the root cause based on your points.
> > > > I will share the result or the patch.
> > >
> > > Thanks, that's a very interesting data point. So this makes it appear
> > > that the problem *is* probably with how we deal with checkpointing
> > > buffers after the pages get discarded using either a truncate or a
> > > collapse_range, since the 'y' option causes a lot of fsyncs, and hence
> > > commits, some of which are happening after a truncate command.
> > >
> > > Thanks for taking a look at this. I really appreciate it.
> > >
> > > Cheers,
> > >
> >
> > Hi Ted,
> >
> > Could you review this patch?
> >
> > Thanks!
> > ---------------------------------------------------------------------------------
> > Subject: [PATCH] jbd2: try to free buffers from truncated page after
> >  checkpoint
> >
> > When ext4 is mounted in data=journal mode, and truncate operations
> > such as setattr(size), collapse, insert and zero range are used, there
> > are many truncated pages with NULL page->mapping. Such truncated pages
> > pile up quickly due to truncate_pagecache on data pages associated with journal.
> > As page->mapping is NULL for such truncated pages, they are not freed
> > by drop cache(3) or umount.
> > As a result, MemFree in /proc/meminfo decreases
> > quickly and active buffer_head slab objects grow in /proc/slabinfo.
> > This patch attempts to free buffers from such pages at the end of jbd2
> > checkpoint, if pages do not have any busy buffers and NULL mapping.
> >
> Hum, why such pages didn't get freed by release_buffer_page() call
> happening when processing transaction's forget list? Because the idea is
> that such pages should be discarded at that point...

Hi Jan,

When I checked this code, release_buffer_page is not called when
buffer_jbddirty is true. Buffers with JBD_Dirty set are instead added to
the new checkpoint:

	if (buffer_jbddirty(bh)) {
		JBUFFER_TRACE(jh, "add to new checkpointing trans");
		__jbd2_journal_insert_checkpoint(jh, commit_transaction);
		if (is_journal_aborted(journal))
			clear_buffer_jbddirty(bh);
	} else {
		J_ASSERT_BH(bh, !buffer_dirty(bh));
		/*
		 * The buffer on BJ_Forget list and not jbddirty means
		 * it has been freed by this transaction and hence it
		 * could not have been reallocated until this
		 * transaction has committed. *BUT* it could be
		 * reallocated once we have written all the data to
		 * disk and before we process the buffer on BJ_Forget
		 * list.
		 */
		if (!jh->b_next_transaction)
			try_to_free = 1;
	}
	JBUFFER_TRACE(jh, "refile or unfile buffer");
	__jbd2_journal_refile_buffer(jh);
	jbd_unlock_bh_state(bh);

Next, the buffer was unfiled by __jbd2_journal_refile_buffer; JBD_Dirty
was cleared and BH_Dirty was set in the same function. Later it must have
been written back, since BH_Dirty was cleared.

Then ext4_wait_for_tail_page_commit -> __ext4_journalled_invalidatepage ->
journal_unmap_buffer zaps the buffer:

	if (!buffer_dirty(bh)) {
		/* bdflush has written it. We can drop it now */
		goto zap_buffer;
	}

Next, truncate_pagecache is called, which clears the page mapping.
Eventually __jbd2_journal_remove_checkpoint is called, but a page with
NULL mapping is not freed there. So I added release_buffer_page at the end
of remove checkpoint to attempt to free the buffers of such pages.
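
The sequence described above can be sketched as a small userspace toy model.
This is only an illustration under loud assumptions: every type and function
below (toy_page, toy_buffer, toy_release, simulate_leak and so on) is
hypothetical and invented for this sketch, the flags merely stand in for
BH_JBDDirty, BH_Dirty and page->mapping, and locking, reference counts and
b_next_transaction are ignored. It is not kernel code and does not claim to
reproduce the real jbd2 behaviour in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not real kernel types. */
struct toy_page {
	bool has_mapping;	/* models page->mapping != NULL */
	bool freed;		/* set once the page's buffers are released */
};

struct toy_buffer {
	bool jbddirty;		/* models BH_JBDDirty */
	bool dirty;		/* models BH_Dirty */
	bool on_checkpoint;	/* buffer still sits on a checkpoint list */
	bool zapped;		/* set by the journal_unmap_buffer step */
	struct toy_page *page;
};

/* Models release_buffer_page(): only a clean, idle buffer on a
 * truncated page (no mapping) is released. */
void toy_release(struct toy_buffer *bh)
{
	if (bh->dirty || bh->on_checkpoint || bh->page->has_mapping)
		return;
	bh->page->freed = true;
}

/* Models forget-list processing at commit time (first quoted snippet). */
void toy_commit_forget(struct toy_buffer *bh)
{
	bool try_to_free = false;

	if (bh->jbddirty) {
		bh->on_checkpoint = true;	/* added to new checkpoint */
	} else {
		assert(!bh->dirty);
		try_to_free = true;		/* only this branch releases */
	}
	if (bh->jbddirty) {			/* refile: JBD_Dirty -> BH_Dirty */
		bh->jbddirty = false;
		bh->dirty = true;
	}
	if (try_to_free)
		toy_release(bh);
}

/* Models writeback clearing BH_Dirty. */
void toy_writeback(struct toy_buffer *bh)
{
	bh->dirty = false;
}

/* Models journal_unmap_buffer zapping a clean buffer, followed by
 * truncate_pagecache clearing the page mapping. */
void toy_truncate(struct toy_buffer *bh)
{
	if (!bh->dirty)
		bh->zapped = true;
	bh->page->has_mapping = false;
}

/* Models remove-checkpoint; 'patched' adds the extra release attempt. */
void toy_remove_checkpoint(struct toy_buffer *bh, bool patched)
{
	bh->on_checkpoint = false;
	if (patched)
		toy_release(bh);
}

/* Runs the whole sequence; returns true if the page was never freed. */
bool simulate_leak(bool patched)
{
	struct toy_page page = { .has_mapping = true, .freed = false };
	struct toy_buffer bh = { .jbddirty = true, .page = &page };

	toy_commit_forget(&bh);
	toy_writeback(&bh);
	toy_truncate(&bh);
	toy_remove_checkpoint(&bh, patched);
	return !page.freed;
}
```

In this model, simulate_leak(false) leaves the page unfreed because the only
release attempt sits in the non-jbddirty branch of commit processing, while
simulate_leak(true) frees it at remove-checkpoint time.
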

Please let me know your opinion.

Thanks.

> 								Honza
> >
> > ---
> >  fs/jbd2/checkpoint.c | 11 +++++++++++
> >  fs/jbd2/commit.c     |  2 +-
> >  include/linux/jbd2.h |  1 +
> >  3 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
> > index 4227dc4..bf68442 100644
> > --- a/fs/jbd2/checkpoint.c
> > +++ b/fs/jbd2/checkpoint.c
> > @@ -529,6 +529,7 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
> >  	transaction_t *transaction;
> >  	journal_t *journal;
> >  	int ret = 0;
> > +	struct buffer_head *bh;
> >
> >  	JBUFFER_TRACE(jh, "entry");
> >
> > @@ -538,10 +539,20 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
> >  	}
> >  	journal = transaction->t_journal;
> >
> > +	bh = jh2bh(jh);
> > +	get_bh(bh);
> >  	JBUFFER_TRACE(jh, "removing from transaction");
> >  	__buffer_unlink(jh);
> >  	jh->b_cp_transaction = NULL;
> >  	jbd2_journal_put_journal_head(jh);
> > +	/*
> > +	 * if journal head is freed, try to free buffers from a truncated
> > +	 * page, if page buffers are not busy and page->mapping is NULL
> > +	 */
> > +	if (!buffer_jbd(bh))
> > +		release_buffer_page(bh);
> > +	else
> > +		__brelse(bh);
> >
> >  	if (transaction->t_checkpoint_list != NULL ||
> >  	    transaction->t_checkpoint_io_list != NULL)
> > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> > index b73e021..d94ec3f 100644
> > --- a/fs/jbd2/commit.c
> > +++ b/fs/jbd2/commit.c
> > @@ -63,7 +63,7 @@ static void journal_end_buffer_io_sync(struct buffer_head *bh, int uptodate)
> >   * Called under lock_journal(), and possibly under journal_datalist_lock. The
> >   * caller provided us with a ref against the buffer, and we drop that here.
> >   */
> > -static void release_buffer_page(struct buffer_head *bh)
> > +void release_buffer_page(struct buffer_head *bh)
> >  {
> >  	struct page *page;
> >
> > diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> > index edb640a..523f345 100644
> > --- a/include/linux/jbd2.h
> > +++ b/include/linux/jbd2.h
> > @@ -1039,6 +1039,7 @@ int __jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block);
> >  void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block);
> >
> >  /* Commit management */
> > +extern void release_buffer_page(struct buffer_head *);
> >  extern void jbd2_journal_commit_transaction(journal_t *);
> >
> >  /* Checkpoint list management */
> >
> > ----------------------------------------------------------
> > >
> > > 					- Ted
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> Jan Kara
> SUSE Labs, CR