From: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: toshi.okajima@jp.fujitsu.com, tytso@mit.edu,
	viro@zeniv.linux.org.uk, sct@redhat.com, adilger@sun.com,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [RESEND][PATCH 0/3 BUG,RFC] release block-device-mapping buffer_heads which have the filesystem private data for avoiding oom-killer
Date: Tue, 25 Nov 2008 15:13:37 +0900
Message-ID: <492B9791.30007@jp.fujitsu.com>
In-Reply-To: <20081124131352.f5485398.akpm@linux-foundation.org>

Hi Andrew,
Thanks for your comments.

 > On Thu, 20 Nov 2008 09:27:11 +0900
 > Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> wrote:
<SNIP>
 >
 > I'm scratching my head trying to work out why we never encountered and
 > fixed this before.

 > Is it possible that you have a very large number of filesystems
 > mounted, and/or that they have large journals?

Yes, I think it happens more easily under those conditions.

Actually, I encountered this situation under the following conditions:
- on the x86 architecture (the Normal zone is only 800MB even when a large
    amount of memory (more than 4GB) is installed),
- reserving a large amount of memory (more than 100MB) for the kdump kernel
   (this memory is taken from the Normal zone),
- mounting a large number of ext3 filesystems (more than 50).

Then the following operations were performed:
- Many I/Os were issued to many filesystems sequentially and continuously.
  (They created many journal_heads (and buffer_heads) for metadata.)
- Issuing I/O to those filesystems was then stopped.
  (This left a large amount of metadata in memory.)

As a result of these operations, more than 100000 journal_heads remained.
Together with their buffers, they occupied about 400MB (the same number of
buffer_heads remained, and the block size was 4096B). We cannot release those
journal_heads because the transactions are not checkpointed until further I/O
is issued to the filesystems or the filesystems are unmounted.
In addition, other slab caches that could not be released occupied about 300MB.
In total, about 800MB of memory could not be released.
As a result, there was no free room left in the Normal zone.
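The roughly 400MB figure above follows from the buffer count and the block size, taking 100000 (the lower bound given) as the number of remaining buffers:

```latex
100000 \times 4096\,\text{B} = 409{,}600{,}000\,\text{B} \approx 409.6\,\text{MB} \;(\approx 390.6\,\text{MiB})
```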

I think you have not encountered it because you have not done the following:
- reserved a large amount of memory for the kdump kernel;
- issued many I/Os to each ext3 filesystem sequentially and continuously,
  and then never issued any further I/O to those filesystems afterwards.
  (In particular, operations that leave a lot of metadata behind,
   for example deleting many huge files.)

 > Would it not be more logical if the ->client_releasepage function
 > pointer were a member of the blockdev address_space_operations, rather
 > than some random field in the blockdev inode?  That arrangement might
 > well be reused in the future, when some other address_space needs to
 > talk to a different address_space to make a page reclaimable.

I think it would be logical to replace the default ->releasepage with a
function pointer passed in by the client (the filesystem), but I don't think
it is logical to add a new member function to the address_space operations
just to release a client's page. Because the new function is called from
->releasepage, I don't think this function pointer should be placed at the
same level as ->releasepage in the address_space operations.

However, it is difficult to replace the ->releasepage member with a client
function, because there is no mutual exclusion against ->releasepage being
called while the pointer is being replaced.

So, I made this patch (without replacing ->releasepage).
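To make the layering concrete, here is a minimal user-space sketch of the arrangement: the blockdev's default ->releasepage is never replaced, and it consults a per-blockdev client_releasepage hook (the name Andrew used) before freeing buffers. Everything else here (`struct page_model`, `struct bdev_model`, `model_try_to_free_buffers`, `model_fs_release_private`) is a hypothetical, simplified stand-in, not the kernel's real types or the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a blockdev page; NOT the kernel's struct page. */
struct page_model {
	bool has_fs_private; /* e.g. buffers still attached to a journal_head */
	bool freed;          /* set once the page's buffers have been released */
};

/* Client (filesystem) callback: try to drop its private data from a page. */
typedef bool (*client_release_fn)(struct page_model *page);

/* Models the block device inode carrying the client's hook (hypothetical). */
struct bdev_model {
	client_release_fn client_releasepage; /* NULL when no client registered */
};

/* Stand-in for try_to_free_buffers(): fails while FS-private data remains. */
static bool model_try_to_free_buffers(struct page_model *page)
{
	if (page->has_fs_private)
		return false;
	page->freed = true;
	return true;
}

/*
 * Models the blockdev's default ->releasepage. The ->releasepage pointer
 * itself is never swapped out; the client hook is called from inside it,
 * i.e. one level below ->releasepage rather than beside it.
 */
static bool model_blkdev_releasepage(struct bdev_model *bdev,
				     struct page_model *page)
{
	assert(bdev != NULL && page != NULL);
	if (page->has_fs_private && bdev->client_releasepage != NULL &&
	    !bdev->client_releasepage(page))
		return false; /* client could not let go of its data yet */
	return model_try_to_free_buffers(page);
}

/* Hypothetical client hook, e.g. jbd dropping an already-checkpointed journal_head. */
static bool model_fs_release_private(struct page_model *page)
{
	page->has_fs_private = false;
	return true;
}
```

Without a registered hook, a page that still carries filesystem-private data cannot be freed; with one, the client gets a chance to drop that data first. The sketch says nothing about when the hook is registered or how it is synchronized, which is exactly the exclusion problem mentioned above.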

What do you think?

Best Regards,
Toshiyuki Okajima

