From: Andrew Morton <akpm@linux-foundation.org>
To: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Cc: linux-ext4@vger.kernel.org, sct@redhat.com
Subject: Re: [RFC][PATCH] JBD: release checkpoint journal heads through try_to_release_page when the memory is exhausted
Date: Mon, 20 Oct 2008 16:02:49 -0700
Message-ID: <20081020160249.ff41f762.akpm@linux-foundation.org>
In-Reply-To: <20081017.223716.147444348.00960188@stratos.soft.fujitsu.com>

On Fri, 17 Oct 2008 22:37:16 +0900 (JST)
Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> wrote:

> Hi.
> 
> I found a situation in which the OOM killer is triggered easily,
> and I would like to report it.
> I tried to fix the problem so that the OOM killer is invoked as rarely
> as possible, and the result is the reference patch below.
> 
> Any comments are welcome.
> (Suggestions for a much simpler or a fundamentally different approach
> are especially welcome.)
> 
> ------------------------------------------------------------------------------
> 
> The OOM killer is triggered easily if all of the following hold:
> (1) One quarter of the sum of the journal (log) sizes of all filesystems
>    that use jbd exceeds the size of the Normal zone.
> (2) We commit a large amount of data, including many metadata blocks, to
>    each filesystem and then stop writing to them.
>    For example, a process creates many huge files, each with a large
>    number of indirect blocks; then all processes stop I/O to every
>    filesystem that uses jbd.
> (3) After (2), we request a large memory allocation.
> (NOTE: The OOM killer is triggered especially easily on x86, because an
> x86 system has only a small Normal zone of less than 1GB.)
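Condition (1) is a plain inequality: the summed journal sizes must exceed four times the Normal zone. The check below is a minimal sketch of that arithmetic only. The journal sizes and the 880 MiB Normal zone figure used with it are hypothetical illustration values, not numbers taken from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Condition (1): one quarter of the summed journal sizes of all jbd
 * filesystems exceeds the Normal zone.  Written as total > 4 * normal
 * to avoid rounding from integer division.  Sizes are in MiB. */
static bool quarter_of_logs_exceeds_normal(const unsigned long *log_mib,
                                           size_t nr_fs,
                                           unsigned long normal_mib)
{
    unsigned long total = 0;

    assert(log_mib != NULL || nr_fs == 0);
    for (size_t i = 0; i < nr_fs; i++)
        total += log_mib[i];
    return total > 4 * normal_mib;
}
```

With eight hypothetical 512 MiB journals the total is 4096 MiB, a quarter of which (1024 MiB) exceeds an assumed 880 MiB Normal zone; with four such journals the quarter is only 512 MiB and the condition does not hold.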
> 
> The reason is that jbd does not actively release journal heads (jh-s),
>   even when many of them could be released.
> 
> jh-s are released only at the following points:
> - when free log space falls to a quarter of the total log size
>   (log_do_checkpoint())
> - when a transaction begins to commit (journal_cleanup_checkpoint_list(),
>   which is called by journal_commit_transaction())
> (NOTE: jh-s that correspond to buffer heads (bh-s) of direct blocks
>       can also be released by journal_try_to_free_buffers(), which is
>       called by try_to_release_page().)
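The two release points listed above can be modelled as a single predicate. This is a hypothetical userspace sketch: `struct log_state` and its fields are invented for illustration and are not the real `journal_t` members. It shows why, once I/O stops as in (2), neither trigger fires while free log space stays above a quarter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the journal state that matters here;
 * these are not real journal_t fields. */
struct log_state {
    unsigned long total_blocks;  /* total log size */
    unsigned long free_blocks;   /* free log space */
    bool commit_starting;        /* a transaction has begun to commit */
};

/* True if, per the report, checkpointed jh-s would be released:
 * free space has dropped to a quarter of the log (log_do_checkpoint())
 * or a commit is starting (journal_cleanup_checkpoint_list() called
 * from journal_commit_transaction()). */
static bool jh_release_triggered(const struct log_state *s)
{
    assert(s->free_blocks <= s->total_blocks);
    return s->free_blocks <= s->total_blocks / 4 || s->commit_starting;
}
```

In the idle state after (2), free space is above the threshold and no commit starts, so the predicate stays false and the jh-s remain pinned.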
> 
> Therefore, if filesystems behave as in (2) above, the jh-s remain because
> no new transaction is started.
> When system memory is exhausted, try_to_release_page() may be called, but
> it cannot release bh-s that hold metadata (indirect blocks and so on),
> because the mapping of those pages is owned by the block device, not by
> the filesystem (ext3).
> 
> If the mapping is owned by a block device, try_to_release_page() calls
> try_to_free_buffers(). That function can release ordinary bh-s, but not
> a bh that is referenced by a jh, because the bh's reference count is
> greater than 0.
> Therefore the jh must be released before the bh can be released.
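The asymmetry between filesystem-owned and block-device-owned pages can be sketched in userspace. Everything below is a hypothetical model whose names only mirror the kernel concepts: a jh is assumed to hold one reference on its bh, every jh here is assumed already checkpointed (so releasable), and `try_to_free_buffers()` is modelled as all-or-nothing for the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
struct buffer_head {
    int b_count;      /* reference count; an attached jh holds one */
    void *b_private;  /* journal head, or NULL */
};

struct page {
    bool fs_mapping;  /* true: owned by a filesystem with releasepage */
    struct buffer_head *bhs;
    size_t nr_bhs;
    bool freed;
};

/* Model of try_to_free_buffers(): gives up if any buffer on the page
 * is still referenced, otherwise frees them all. */
static bool model_try_to_free_buffers(struct page *pg)
{
    for (size_t i = 0; i < pg->nr_bhs; i++)
        if (pg->bhs[i].b_count > 0)
            return false;
    pg->freed = true;
    return true;
}

/* Stand-in for a filesystem releasepage (for ext3 this path reaches
 * journal_try_to_free_buffers()): detach journal heads first, dropping
 * the bh references they hold, then free the buffers. */
static bool model_fs_releasepage(struct page *pg)
{
    for (size_t i = 0; i < pg->nr_bhs; i++) {
        struct buffer_head *bh = &pg->bhs[i];
        if (bh->b_private != NULL) {
            bh->b_private = NULL;
            bh->b_count--;
        }
    }
    return model_try_to_free_buffers(pg);
}

/* Model of try_to_release_page(): a filesystem mapping uses its own
 * releasepage; a block-device mapping goes straight to
 * try_to_free_buffers(), so a jh-pinned metadata bh blocks release. */
static bool model_try_to_release_page(struct page *pg)
{
    assert(pg != NULL);
    if (pg->fs_mapping)
        return model_fs_releasepage(pg);
    return model_try_to_free_buffers(pg);
}
```

A metadata page in the block-device mapping whose bh still carries a jh is not freed, while the same situation on a filesystem-owned page is.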
> 
> To achieve this, I added a new member function to the buffer head
> structure. The function releases bh-s that belong to a page whose
> mapping is a block device and that carry private data (a journal head).
> It resembles journal_try_to_free_buffers().
> I then changed try_to_release_page() to call try_to_free_buffers()
> after the new function.
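The proposed ordering (call the new per-bh hook, then `try_to_free_buffers()`) can be sketched as follows. This is a hypothetical userspace model: the message does not name the new member, so `b_release_private` and `jbd_release_jh()` are invented names, and a real hook would first have to check that the jh is no longer needed (for example, that it has been checkpointed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct buffer_head {
    int b_count;      /* reference count; an attached jh holds one */
    void *b_private;  /* journal head, or NULL */
    /* Hypothetical stand-in for the patch's new member function. */
    void (*b_release_private)(struct buffer_head *bh);
};

struct page {
    struct buffer_head *bhs;
    size_t nr_bhs;
    bool freed;
};

/* Hypothetical jbd-side hook: drop the jh and the bh reference it held.
 * Assumes the jh is already releasable. */
static void jbd_release_jh(struct buffer_head *bh)
{
    if (bh->b_private != NULL) {
        bh->b_private = NULL;
        bh->b_count--;
    }
}

/* Model of try_to_free_buffers(): all-or-nothing for the page. */
static bool model_try_to_free_buffers(struct page *pg)
{
    for (size_t i = 0; i < pg->nr_bhs; i++)
        if (pg->bhs[i].b_count > 0)
            return false;
    pg->freed = true;
    return true;
}

/* Block-device path as described: run the hook on each bh that carries
 * private data, then fall back to try_to_free_buffers(). */
static bool model_blkdev_release_page(struct page *pg)
{
    assert(pg != NULL);
    for (size_t i = 0; i < pg->nr_bhs; i++) {
        struct buffer_head *bh = &pg->bhs[i];
        if (bh->b_private != NULL && bh->b_release_private != NULL)
            bh->b_release_private(bh);
    }
    return model_try_to_free_buffers(pg);
}
```

Note that the extra function pointer is carried by every buffer_head in the system, which is the per-object size cost raised in the reply.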
> 
> As a result, I think the OOM killer becomes less likely to be triggered
> than before, because try_to_free_buffers(), called via
> try_to_release_page() when system memory is exhausted, can now release
> these bh-s.
> 

OK.

> ---
>  fs/buffer.c                 |   23 ++++++++++++++++++++++-
>  fs/jbd/journal.c            |    7 +++++++
>  fs/jbd/transaction.c        |   39 +++++++++++++++++++++++++++++++++++++++
>  include/linux/buffer_head.h |    7 +++++++
>  include/linux/jbd.h         |    1 +
>  5 files changed, 76 insertions(+), 1 deletion(-) 

The patch is fairly complex, and increasing the buffer_head size can be
rather costly.  An alternative might be to implement a shrinker
callback function for the journal_head slab cache.  Did you consider
this?
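As a rough illustration of the shrinker alternative, the sketch below models a shrink callback over a pool of journal heads. It is a hypothetical userspace model only: in the kernel a shrinker is a `struct shrinker` registered with `register_shrinker()`, and its callback signature has changed across kernel versions, so this does not reproduce it. It follows the 2.6-era convention as I understand it (a scan count of 0 means "report how many could be freed"), and it assumes a jh is freeable once its buffer has been checkpointed; a real implementation would also have to unlink the jh from the checkpoint lists under the journal's locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, userspace-only model of a journal_head cache. */
struct jh_model {
    bool in_use;        /* slot holds a live journal head */
    bool checkpointed;  /* buffer already written back: jh can go */
};

struct jh_cache {
    struct jh_model *jhs;
    size_t nr;
};

static size_t count_freeable(const struct jh_cache *c)
{
    size_t n = 0;

    for (size_t i = 0; i < c->nr; i++)
        if (c->jhs[i].in_use && c->jhs[i].checkpointed)
            n++;
    return n;
}

/* Shrink callback model: with nr_to_scan == 0 only report the number of
 * freeable objects; otherwise free up to nr_to_scan of them and report
 * how many freeable objects remain. */
static size_t jh_cache_shrink(struct jh_cache *c, size_t nr_to_scan)
{
    assert(c != NULL);
    for (size_t i = 0; i < c->nr && nr_to_scan > 0; i++) {
        if (c->jhs[i].in_use && c->jhs[i].checkpointed) {
            c->jhs[i].in_use = false;
            nr_to_scan--;
        }
    }
    return count_freeable(c);
}
```

Unlike the per-buffer_head hook, this approach adds no field to every buffer_head; the VM asks the journal_head cache to shrink only under memory pressure.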



Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081020160249.ff41f762.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sct@redhat.com \
    --cc=toshi.okajima@jp.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.