From: Pavel Emelyanov <xemul@parallels.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>, Nick Piggin <npiggin@kernel.dk>,
Andrea Arcangeli <aarcange@redhat.com>,
Rik van Riel <riel@redhat.com>,
Dave Hansen <dave@linux.vnet.ibm.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC][PATCH 0/13] Per-container dcache management (and a bit more)
Date: Fri, 06 May 2011 16:15:50 +0400
Message-ID: <4DC3E676.7030706@parallels.com>
In-Reply-To: <20110506010537.GE26837@dastard>
On 05/06/2011 05:05 AM, Dave Chinner wrote:
> On Tue, May 03, 2011 at 04:14:37PM +0400, Pavel Emelyanov wrote:
>> Hi.
>>
>> According to the "release early, release often" strategy :) I'm
>> glad to propose this scratch implementation of what I was talking
>> about at LSF - a way to limit dcache growth on both containerized
>> and non-containerized systems (the set applies to 2.6.38).
>
> dcache growth is rarely the memory consumption problem in systems -
> it's inode cache growth that is the issue. Each inode consumes 4-5x
> as much memory as a dentry, and the dentry lifecycle is a subset of
> the inode lifecycle. Limiting the number of dentries will do very
> little to relieve memory problems because of this.
No, you don't take into account that once the dentry cache is shrunk, the
inode cache can also be shrunk (since no objects other than dentries hold
inodes in the cache), but not vice versa. That said, if we keep the dentry
cache from growing, it becomes possible to keep the inode cache from
growing.
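To make that dependency explicit, here is a simplified sketch (toy types
made up for illustration, not the real fs/dcache.c structures): a dentry
pins its inode for as long as the dentry exists, while an inode holds
nothing that would keep a dentry alive, so reclaim has to go dentry-first.

#include <stdio.h>

struct toy_inode {
	int i_count;			/* > 0 while something (e.g. a dentry) pins it */
};

struct toy_dentry {
	struct toy_dentry *d_parent;	/* pins the parent dentry */
	struct toy_inode *d_inode;	/* pins the inode for the dentry's lifetime */
};

static void toy_iput(struct toy_inode *inode)
{
	if (--inode->i_count == 0)
		printf("inode unpinned, reclaim may free it now\n");
}

static void toy_dentry_kill(struct toy_dentry *dentry)
{
	/* shrinking the dcache drops the inode reference... */
	toy_iput(dentry->d_inode);
	/* ...but nothing in the inode keeps a dentry alive, so shrinking
	 * the icache first cannot shrink the dcache */
}

int main(void)
{
	struct toy_inode inode = { .i_count = 1 };
	struct toy_dentry dentry = { .d_parent = NULL, .d_inode = &inode };

	toy_dentry_kill(&dentry);
	return 0;
}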
> Indeed, I actually get a request from embedded folks every so often
> to limit the size of the inode cache - they never have troubles with
> the size of the dentry cache (and I do ask) - so perhaps you need to
> consider this aspect of the problem a bit more.
>
> FWIW, I often see machines during tests where the dentry cache is
> empty, yet there are millions of inodes cached on the inode LRU
> consuming gigabytes of memory. E.g. a snapshot from my 4GB RAM test
> VM right now:
>
>    OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 2180754 2107387  96%    0.21K 121153       18    484612K xfs_ili
> 2132964 2107389  98%    1.00K 533241        4   2132964K xfs_inode
> 1625922  944034  58%    0.06K  27558       59    110232K size-64
>  415320  415301  99%    0.19K  20766       20     83064K dentry
>
> You see 400k active dentries consume 83MB of RAM, yet 2.1M active
> inodes consume ~2.6GB of RAM. We've already reclaimed the dentry
> cache down quite small, while the inode cache remains the dominant
> memory consumer...
Same here: that 2.6GB of RAM is shrinkable memory (unless xfs inode
references are leaked).
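(To spell out the arithmetic behind those slab numbers: 2,132,964 xfs_inode
objects at ~1.00K each is ~2.1GB, plus 2,180,754 xfs_ili objects at ~0.21K
each is another ~0.46GB, which is where the ~2.6GB comes from, while the
415,320 dentries at ~0.19K each amount to only ~83MB.)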
> I'm also concerned about the scalability issues - moving back to
> global lists and locks for LRU, shrinker and mob management is the
> opposite of the direction we are taking - we want to make the LRUs more
> fine-grained and more closely related to the MM structures,
> shrinkers confined to per-sb context (no more lifecycle issues,
> ever) and operate per-node/-zone rather than globally, etc. It
> seems to me that this containerisation will make much of that work
> difficult to achieve effectively because it doesn't take any of this
> ongoing scalability work into account.
Two things from my side on this:
1. Can you be more specific here: which parts of the VFS suffer from the
LRU being global? The only problem I found was shrinking the dcache for a
particular sb on umount, but in my patch #4 I made both routines that do
this work on the dentry tree rather than the LRU list, so the global LRU is
no longer an issue at that point.
2. If for any reason you do need to keep the LRU per super block (please
share such a reason if you do), we can create mobs per super block :) In
other words, with mobs we are much more flexible in how we manage dentry
LRUs than with per-sb LRUs.
>> The first 5 patches are preparations for this, descriptive (I hope)
>> comments are inside them.
>>
>> The general idea of this set is to make dentry subtrees limited in
>> size and to shrink them as they hit the configured limit.
>
> And what if the inode cache does not shrink with it?
Yet again, that's not a big deal. Once the dentries are killed, the inodes
are no longer pinned in memory and the very first try_to_free_pages() call
can free them.
>> Why subtrees? Because this lets us have the [dentry -> group] reference
>> without a reference count, letting the [dentry -> parent] one handle
>> this.
>>
>> Why limited? For containers the answer is simple -- a container
>> should not be allowed to consume too much of the host memory. For
>> non-containerized systems the answer is -- to protect the kernel
>> from non-privileged attacks on the dcache memory, such as the
>> "while :; do mkdir x; cd x; done" one and similar.
>
> Which will stop as soon as the path gets too long.
No, it will *not*! Bash will start complaining that it can no longer set
the CWD environment variable, but once you turn this into a C program...
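Something along these lines (an illustrative sketch, not code from this
series) never hits PATH_MAX because it only ever uses relative paths, and
every iteration pins one more dentry (and its inode) through the cwd's
parent chain:

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
	/* Unlike the shell loop, relative mkdir()/chdir() never build a
	 * full path, so nothing stops this until the dcache is limited
	 * (or the filesystem runs out of inodes). */
	for (;;) {
		if (mkdir("x", 0700) != 0) {
			perror("mkdir");
			return 1;
		}
		if (chdir("x") != 0) {
			perror("chdir");
			return 1;
		}
	}
}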
> And if this is really a problem on your systems, quotas can prevent this from
> ever being an issue....
I disagree.
Let's take a minimal CentOS 5.5 container. It contains ~30K files, and
that is before any payload such as web server static pages/scripts,
databases, devel tools, etc. Thus we cannot configure the quota for such a
container with a lower limit; I'd say 50K inodes is the minimum we could
set (for the record, the default quota for this in OpenVZ is 200000, and
people most often increase it).
Given that on x86_64 one dentry takes ~200 bytes and one ext4 inode takes
~1K, we give this container the ability to pin 50K * (200 + 1K) ~ 60M of
RAM.
Our experience shows that on a node with e.g. 2G of RAM you can easily host
up to 20 containers with a LAMP stack (you can host more, but they will be
notably slow).
Thus, by trying to handle the issue with disk quotas, you give your
containers the ability to pin up to 1.2GB of RAM with dcache + icache. That
is way too much.
>> What isn't in this patch yet, but should be done after the discussion
>>
>> * API. I haven't managed to invent any perfect solution, and would
>> really like to have it discussed. In order to be able to play with it,
>> ioctls plus a proc file for listing are proposed.
>>
>> * New mounts management. Right now if you mount some new FS on a
>> dentry which belongs to some managed set (I named it a "mob" in this
>> patchset), the new mount is managed with the system settings. This is
>> not OK; the new mount should be managed with the settings of the
>> mountpoint's mob.
>>
>> * Elegant shrink_dcache_memory on global memory shortage. For now the
>> code walks the mobs and shrinks an equal amount of dentries from each.
>> A better shrinking policy can and probably should be implemented.
>
> See above.
>
> Cheers,
>
> Dave.
Thanks,
Pavel
Thread overview: 23+ messages
2011-05-03 12:14 [RFC][PATCH 0/13] Per-container dcache management (and a bit more) Pavel Emelyanov
2011-05-03 12:15 ` [PATCH 1/13] vfs: Lighten r/o rename_lock lockers Pavel Emelyanov
2011-05-03 12:15 ` [PATCH 2/13] vfs: Factor out rename_lock locking Pavel Emelyanov
2011-05-03 12:16 ` [PATCH 3/13] vfs: Make the rename_lock per-sb Pavel Emelyanov
2011-05-03 12:16 ` [PATCH 4/13] vfs: Factor out tree (of four) shrinkers code Pavel Emelyanov
2011-05-03 12:17 ` [PATCH 5/13] vfs: Make dentry LRU list global Pavel Emelyanov
2011-05-03 12:17 ` [PATCH 6/13] vfs: Turn the nr_dentry into percpu_counter Pavel Emelyanov
2011-05-03 12:18 ` [PATCH 7/13] vfs: Limit the number of dentries globally Pavel Emelyanov
2011-05-03 12:18 ` [PATCH 8/13] vfs: Introduce the dentry mobs Pavel Emelyanov
2011-06-18 13:40 ` Andrea Arcangeli
2011-05-03 12:18 ` [PATCH 9/13] vfs: More than one mob management Pavel Emelyanov
2011-05-03 12:19 ` [PATCH 10/13] vfs: Routnes for setting mob size and getting stats Pavel Emelyanov
2011-05-03 12:19 ` [PATCH 11/13] vfs: Make shrink_dcache_memory prune dcache from all mobs Pavel Emelyanov
2011-05-03 12:20 ` [PATCH 12/13] vfs: Mobs creation and mgmt API Pavel Emelyanov
2011-05-03 12:20 ` [PATCH 13/13] vfs: Dentry mobs listing in proc Pavel Emelyanov
2011-05-06 1:05 ` [RFC][PATCH 0/13] Per-container dcache management (and a bit more) Dave Chinner
2011-05-06 12:15 ` Pavel Emelyanov [this message]
2011-05-07 0:01 ` Dave Chinner
2011-05-10 11:18 ` Pavel Emelyanov
2011-06-18 13:30 ` Andrea Arcangeli
2011-06-20 0:49 ` Dave Chinner
2011-07-04 5:32 ` Pavel Emelyanov
2011-05-23 6:43 ` Pavel Emelyanov