From: Tejun Heo <tj@kernel.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Glauber Costa <glommer@parallels.com>,
Michal Hocko <mhocko@suse.cz>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
kamezawa.hiroyu@jp.fujitsu.com, devel@openvz.org,
linux-mm@kvack.org, Suleiman Souhlal <suleiman@google.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
David Rientjes <rientjes@google.com>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v3 04/13] kmem accounting basic infrastructure
Date: Thu, 27 Sep 2012 07:49:42 -0700
Message-ID: <20120927144942.GB4251@mtj.dyndns.org>
In-Reply-To: <20120927142822.GG3429@suse.de>
Hello, Mel.
On Thu, Sep 27, 2012 at 03:28:22PM +0100, Mel Gorman wrote:
> > In addition, how is userland supposed to know which
> > workload is shared kmem heavy or not?
>
> By using a bit of common sense.
>
> An application may not be able to figure this out but the administrator
> is going to be able to make a very educated guess. If processes running
> within two containers are not sharing a filesystem hierarchy for example
> then it'll be clear they are not sharing dentries.
>
> If there was a suspicion they were then it could be analysed with
> something like SystemTap probing when files are opened and see if files
> are being opened that are shared between containers.
>
> It's not super-easy but it's not impossible either and I fail to see why
> it's such a big deal for you.
Because we're not even trying to actually solve the problem but are
just dumping it on userland. If dentry/inode usage is the only case
we're worried about, there may be better ways to solve it, or at
least we should strive for that.
Also, the problem is not that it is impossible if you know and
carefully plan for things beforehand (that would be one extremely
competent admin) but that the problem is undiscoverable. With kmemcg
accounting disabled, there's no way to tell: a cgroup which the admin
thinks is running something which doesn't tax kmem much could be
generating a ton without the admin ever noticing.
> > The fact that the numbers don't really mean what they apparently
> > should mean.
>
> I think it is a reasonable limitation that only some kernel allocations are
> accounted for although I'll freely admit I'm not a cgroup or memcg user
> either.
>
> My understanding is that this comes down to cost -- accounting for the
> kernel memory usage is expensive so it is limited only to the allocations
> that are easy to abuse by an unprivileged process. Hence this is
> initially concerned with stack pages with dentries and TCP usage to
> follow in later patches.
I think the cost isn't too prohibitive considering it's already using
memcg. Charging / uncharging happens only as pages enter and leave
slab caches and the hot path overhead is essentially a single
indirection. Glauber's benchmark seemed pretty reasonable to me and I
don't yet think that warrants exposing this subtle tree of
configuration.
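To illustrate the cost argument, here is a toy model of page-granularity
charging. It is only a sketch and not the memcg code: PageCounter and
ToySlabCache are made-up names, the 32 objects per page is an arbitrary
assumption, and fragmentation is ignored. The point is only that the
counter is touched when a page enters or leaves the cache, not on every
object allocation.

```python
class PageCounter:
    """Hypothetical stand-in for a per-cgroup page counter."""

    def __init__(self):
        self.charged_pages = 0
        self.charge_calls = 0

    def charge(self, pages):
        self.charged_pages += pages
        self.charge_calls += 1

    def uncharge(self, pages):
        self.charged_pages -= pages


class ToySlabCache:
    """Toy cache: objects are carved out of pages, charging is per page."""

    def __init__(self, counter, objs_per_page):
        self.counter = counter
        self.objs_per_page = objs_per_page
        self.free_slots = 0

    def alloc(self):
        # Slow path: only when no free slot is left does a new page
        # enter the cache, and only then is the counter charged.
        if self.free_slots == 0:
            self.counter.charge(1)
            self.free_slots = self.objs_per_page
        self.free_slots -= 1

    def free(self):
        self.free_slots += 1
        # Toy simplification: once a page's worth of slots is free,
        # one page leaves the cache and is uncharged.
        if self.free_slots == self.objs_per_page:
            self.counter.uncharge(1)
            self.free_slots = 0


counter = PageCounter()
cache = ToySlabCache(counter, objs_per_page=32)
for _ in range(100):
    cache.alloc()
print(counter.charged_pages, counter.charge_calls)
```

In this model 100 object allocations cause only four charge operations,
one per page.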
> > Sure, conferences are useful for building consensus but that's the
> > extent of it. Sorry that I didn't realize the implications then but
> > conferences don't really add any finality to decisions.
> >
> > So, this seems properly crazy to me at the similar level of
> > use_hierarchy fiasco. I'm gonna NACK on this.
>
> I think you're over-reacting to say the very least :|
The part I nacked is enabling kmemcg on a populated cgroup and
starting accounting from that point on, without any apparent
indication that past allocations haven't been counted. You end up
with numbers whose real meaning nobody can tell, there's no mechanism
to guarantee any kind of ordering between populating the cgroup and
configuring it, and there's *no* way to find out what happened
afterwards either. This is properly crazy and definitely deserves a
nack.
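A toy model of the ambiguity, purely as a sketch: ToyCgroup and its
fields are hypothetical and the page counts are arbitrary. It only shows
that a cgroup with accounting enabled up front and one with accounting
enabled after it was populated can report the same number while their
real usage differs.

```python
class ToyCgroup:
    """Hypothetical cgroup that only counts pages allocated while
    accounting is enabled."""

    def __init__(self):
        self.accounting = False
        self.real_pages = 0       # what the group actually holds
        self.reported_pages = 0   # what the counter shows

    def enable_accounting(self):
        self.accounting = True

    def alloc(self, pages):
        self.real_pages += pages
        if self.accounting:
            self.reported_pages += pages


# Accounting enabled before the group does anything.
early = ToyCgroup()
early.enable_accounting()
early.alloc(400)

# Accounting enabled only after the group is already populated.
late = ToyCgroup()
late.alloc(600)
late.enable_accounting()
late.alloc(400)

print(early.reported_pages, early.real_pages)
print(late.reported_pages, late.real_pages)
```

Both groups report 400 pages, and nothing in the reported value says
which of the two cases you're looking at.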
Thanks.
--
tejun