* [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller
2016-01-13 22:01 [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics Johannes Weiner
@ 2016-01-13 22:01 ` Johannes Weiner
2016-01-13 22:49 ` Andrew Morton
2016-01-13 22:01 ` [PATCH 2/2] mm: memcontrol: add "sock" to cgroup2 memory.stat Johannes Weiner
2016-01-13 22:49 ` [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics Andrew Morton
2 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2016-01-13 22:01 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
kernel-team
Provide a cgroup2 memory.stat that provides statistics on LRU memory
and fault event counters. More consumers and breakdowns will follow.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 57 insertions(+)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c26ffac..8645852 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2767,6 +2767,18 @@ static unsigned long tree_stat(struct mem_cgroup *memcg,
return val;
}
+static unsigned long tree_events(struct mem_cgroup *memcg,
+ enum mem_cgroup_events_index idx)
+{
+ struct mem_cgroup *iter;
+ unsigned long val = 0;
+
+ for_each_mem_cgroup_tree(iter, memcg)
+ val += mem_cgroup_read_events(iter, idx);
+
+ return val;
+}
+
static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
{
unsigned long val;
@@ -5095,6 +5107,46 @@ static int memory_events_show(struct seq_file *m, void *v)
return 0;
}
+static int memory_stat_show(struct seq_file *m, void *v)
+{
+ struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+ int i;
+
+ /* Memory consumer totals */
+
+ seq_printf(m, "anon %lu\n",
+ tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
+ seq_printf(m, "file %lu\n",
+ tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+
+ /* Per-consumer breakdowns */
+
+ for (i = 0; i < NR_LRU_LISTS; i++) {
+ struct mem_cgroup *mi;
+ unsigned long val = 0;
+
+ for_each_mem_cgroup_tree(mi, memcg)
+ val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
+ seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i], val);
+ }
+
+ seq_printf(m, "file_mapped %lu\n",
+ tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) * PAGE_SIZE);
+ seq_printf(m, "file_dirty %lu\n",
+ tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) * PAGE_SIZE);
+ seq_printf(m, "file_writeback %lu\n",
+ tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) * PAGE_SIZE);
+
+ /* Memory management events */
+
+ seq_printf(m, "pgfault %lu\n",
+ tree_events(memcg, MEM_CGROUP_EVENTS_PGFAULT));
+ seq_printf(m, "pgmajfault %lu\n",
+ tree_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT));
+
+ return 0;
+}
+
static struct cftype memory_files[] = {
{
.name = "current",
@@ -5125,6 +5177,11 @@ static struct cftype memory_files[] = {
.file_offset = offsetof(struct mem_cgroup, events_file),
.seq_show = memory_events_show,
},
+ {
+ .name = "stat",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .seq_show = memory_stat_show,
+ },
{ } /* terminate */
};
--
2.7.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller
2016-01-13 22:01 ` [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller Johannes Weiner
@ 2016-01-13 22:49 ` Andrew Morton
2016-01-14 20:25 ` Johannes Weiner
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2016-01-13 22:49 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
kernel-team
On Wed, 13 Jan 2016 17:01:08 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:
> Provide a cgroup2 memory.stat that provides statistics on LRU memory
> and fault event counters. More consumers and breakdowns will follow.
>
> ...
>
> @@ -5095,6 +5107,46 @@ static int memory_events_show(struct seq_file *m, void *v)
> return 0;
> }
>
> +static int memory_stat_show(struct seq_file *m, void *v)
> +{
> + struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> + int i;
> +
> + /* Memory consumer totals */
> +
> + seq_printf(m, "anon %lu\n",
> + tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
Is there any reason why this won't overflow a longword on 32-bit?
> + seq_printf(m, "file %lu\n",
> + tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
> +
> + /* Per-consumer breakdowns */
> +
> + for (i = 0; i < NR_LRU_LISTS; i++) {
> + struct mem_cgroup *mi;
> + unsigned long val = 0;
> +
> + for_each_mem_cgroup_tree(mi, memcg)
> + val += mem_cgroup_nr_lru_pages(mi, BIT(i)) * PAGE_SIZE;
> + seq_printf(m, "%s %lu\n", mem_cgroup_lru_names[i], val);
> + }
> +
> + seq_printf(m, "file_mapped %lu\n",
> + tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) * PAGE_SIZE);
> + seq_printf(m, "file_dirty %lu\n",
> + tree_stat(memcg, MEM_CGROUP_STAT_DIRTY) * PAGE_SIZE);
> + seq_printf(m, "file_writeback %lu\n",
> + tree_stat(memcg, MEM_CGROUP_STAT_WRITEBACK) * PAGE_SIZE);
> +
> + /* Memory management events */
> +
> + seq_printf(m, "pgfault %lu\n",
> + tree_events(memcg, MEM_CGROUP_EVENTS_PGFAULT));
> + seq_printf(m, "pgmajfault %lu\n",
> + tree_events(memcg, MEM_CGROUP_EVENTS_PGMAJFAULT));
> +
> + return 0;
> +}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller
2016-01-13 22:49 ` Andrew Morton
@ 2016-01-14 20:25 ` Johannes Weiner
0 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2016-01-14 20:25 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
kernel-team
On Wed, Jan 13, 2016 at 02:49:30PM -0800, Andrew Morton wrote:
> > @@ -5095,6 +5107,46 @@ static int memory_events_show(struct seq_file *m, void *v)
> > return 0;
> > }
> >
> > +static int memory_stat_show(struct seq_file *m, void *v)
> > +{
> > + struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
> > + int i;
> > +
> > + /* Memory consumer totals */
> > +
> > + seq_printf(m, "anon %lu\n",
> > + tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
>
> Is there any reason why this won't overflow a longword on 32-bit?
It will, I don't know what I was thinking there. Fixed in the update.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] mm: memcontrol: add "sock" to cgroup2 memory.stat
2016-01-13 22:01 [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics Johannes Weiner
2016-01-13 22:01 ` [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller Johannes Weiner
@ 2016-01-13 22:01 ` Johannes Weiner
2016-01-14 20:26 ` [PATCH 2/2 V2] " Johannes Weiner
2016-01-13 22:49 ` [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics Andrew Morton
2 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2016-01-13 22:01 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
kernel-team
Provide statistics on how much of a cgroup's memory footprint is made
up of socket buffers from network connections owned by the group.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 5 ++++-
mm/memcontrol.c | 6 ++++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1666617..9ae48d4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -50,6 +50,9 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_WRITEBACK, /* # of pages under writeback */
MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */
MEM_CGROUP_STAT_NSTATS,
+ /* default hierarchy stats */
+ MEMCG_SOCK,
+ MEMCG_NR_STAT,
};
struct mem_cgroup_reclaim_cookie {
@@ -87,7 +90,7 @@ enum mem_cgroup_events_target {
#ifdef CONFIG_MEMCG
struct mem_cgroup_stat_cpu {
- long count[MEM_CGROUP_STAT_NSTATS];
+ long count[MEMCG_NR_STAT];
unsigned long events[MEMCG_NR_EVENTS];
unsigned long nr_page_events;
unsigned long targets[MEM_CGROUP_NTARGETS];
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8645852..6bb23a7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5118,6 +5118,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
seq_printf(m, "file %lu\n",
tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+ seq_printf(m, "sock %lu\n",
+ tree_stat(memcg, MEMCG_SOCK) * PAGE_SIZE);
/* Per-consumer breakdowns */
@@ -5619,6 +5621,8 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
if (in_softirq())
gfp_mask = GFP_NOWAIT;
+ this_cpu_add(memcg->stat->count[MEMCG_SOCK], nr_pages);
+
if (try_charge(memcg, gfp_mask, nr_pages) == 0)
return true;
@@ -5638,6 +5642,8 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
return;
}
+ this_cpu_sub(memcg->stat->count[MEMCG_SOCK], nr_pages);
+
page_counter_uncharge(&memcg->memory, nr_pages);
css_put_many(&memcg->css, nr_pages);
}
--
2.7.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2 V2] mm: memcontrol: add "sock" to cgroup2 memory.stat
2016-01-13 22:01 ` [PATCH 2/2] mm: memcontrol: add "sock" to cgroup2 memory.stat Johannes Weiner
@ 2016-01-14 20:26 ` Johannes Weiner
0 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2016-01-14 20:26 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
kernel-team
Provide statistics on how much of a cgroup's memory footprint is made
up of socket buffers from network connections owned by the group.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
v2: add item description in cgroup.txt and rebase to updated 1/2
Documentation/cgroup.txt | 4 ++++
include/linux/memcontrol.h | 5 ++++-
mm/memcontrol.c | 6 ++++++
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/Documentation/cgroup.txt b/Documentation/cgroup.txt
index 65b3eac..e8d25e7 100644
--- a/Documentation/cgroup.txt
+++ b/Documentation/cgroup.txt
@@ -843,6 +843,10 @@ PAGE_SIZE multiple when read back.
Amount of memory used to cache filesystem data,
including tmpfs and shared memory.
+ sock
+
+ Amount of memory used in network transmission buffers
+
file_mapped
Amount of cached filesystem data mapped with mmap()
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1666617..9ae48d4 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -50,6 +50,9 @@ enum mem_cgroup_stat_index {
MEM_CGROUP_STAT_WRITEBACK, /* # of pages under writeback */
MEM_CGROUP_STAT_SWAP, /* # of pages, swapped out */
MEM_CGROUP_STAT_NSTATS,
+ /* default hierarchy stats */
+ MEMCG_SOCK,
+ MEMCG_NR_STAT,
};
struct mem_cgroup_reclaim_cookie {
@@ -87,7 +90,7 @@ enum mem_cgroup_events_target {
#ifdef CONFIG_MEMCG
struct mem_cgroup_stat_cpu {
- long count[MEM_CGROUP_STAT_NSTATS];
+ long count[MEMCG_NR_STAT];
unsigned long events[MEMCG_NR_EVENTS];
unsigned long nr_page_events;
unsigned long targets[MEM_CGROUP_NTARGETS];
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cdb51a9..97e47de 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5127,6 +5127,8 @@ static int memory_stat_show(struct seq_file *m, void *v)
(u64)tree_stat(memcg, MEM_CGROUP_STAT_RSS) * PAGE_SIZE);
seq_printf(m, "file %llu\n",
(u64)tree_stat(memcg, MEM_CGROUP_STAT_CACHE) * PAGE_SIZE);
+ seq_printf(m, "sock %llu\n",
+ (u64)tree_stat(memcg, MEMCG_SOCK) * PAGE_SIZE);
seq_printf(m, "file_mapped %llu\n",
(u64)tree_stat(memcg, MEM_CGROUP_STAT_FILE_MAPPED) *
@@ -5630,6 +5632,8 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
if (in_softirq())
gfp_mask = GFP_NOWAIT;
+ this_cpu_add(memcg->stat->count[MEMCG_SOCK], nr_pages);
+
if (try_charge(memcg, gfp_mask, nr_pages) == 0)
return true;
@@ -5649,6 +5653,8 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
return;
}
+ this_cpu_sub(memcg->stat->count[MEMCG_SOCK], nr_pages);
+
page_counter_uncharge(&memcg->memory, nr_pages);
css_put_many(&memcg->css, nr_pages);
}
--
2.7.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
2016-01-13 22:01 [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics Johannes Weiner
2016-01-13 22:01 ` [PATCH 1/2] mm: memcontrol: basic memory statistics in cgroup2 memory controller Johannes Weiner
2016-01-13 22:01 ` [PATCH 2/2] mm: memcontrol: add "sock" to cgroup2 memory.stat Johannes Weiner
@ 2016-01-13 22:49 ` Andrew Morton
2016-01-14 20:24 ` Johannes Weiner
2 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2016-01-13 22:49 UTC (permalink / raw)
To: Johannes Weiner
Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
kernel-team
On Wed, 13 Jan 2016 17:01:07 -0500 Johannes Weiner <hannes@cmpxchg.org> wrote:
> Hi Andrew,
>
> these patches add basic memory statistics so that new users of cgroup2
> have some inkling of what's going on, and are not just confronted with
> a single number of bytes used.
>
> This is very short-notice, but also straight-forward. It would be cool
> to get this in along with the lifting of the cgroup2 devel flag.
>
> Michal, Vladimir, what do you think? We'll also have to figure out how
> we're going to represent and break down the "kmem" consumers.
>
It would be nice to see example output, and a description of why this
output was chosen: what was included, what was omitted, why it was
presented this way, what units were chosen for displaying the stats and
why. Will the things which are being displayed still be relevant (or
even available) 10 years from now. etcetera.
And the interface should be documented at some point. Doing it now
will help with the review of the proposed interface.
Because this stuff is forever and we have to get it right.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
2016-01-13 22:49 ` [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics Andrew Morton
@ 2016-01-14 20:24 ` Johannes Weiner
2016-01-15 9:58 ` Vladimir Davydov
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2016-01-14 20:24 UTC (permalink / raw)
To: Andrew Morton
Cc: Michal Hocko, Vladimir Davydov, linux-mm, cgroups, linux-kernel,
kernel-team
On Wed, Jan 13, 2016 at 02:49:16PM -0800, Andrew Morton wrote:
> It would be nice to see example output, and a description of why this
> output was chosen: what was included, what was omitted, why it was
> presented this way, what units were chosen for displaying the stats and
> why. Will the things which are being displayed still be relevant (or
> even available) 10 years from now. etcetera.
>
> And the interface should be documented at some point. Doing it now
> will help with the review of the proposed interface.
>
> Because this stuff is forever and we have to get it right.
Here is a follow-up to 1/2 that hopefully addresses all that, as well
as the 32-bit overflow problem. What do you think? I'm probably a bit
too optimistic with being able to maintain a meaningful sort order of
the file when adding new entries. It depends on whether people start
relying on items staying at fixed offsets and what we tell them in
response when that breaks. I hope that we can at least get the main
memory consumers in before this is released, just in case.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
2016-01-14 20:24 ` Johannes Weiner
@ 2016-01-15 9:58 ` Vladimir Davydov
2016-01-15 20:30 ` Johannes Weiner
0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Davydov @ 2016-01-15 9:58 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel,
kernel-team
Hi Johannes,
On Thu, Jan 14, 2016 at 03:24:08PM -0500, Johannes Weiner wrote:
...
> The output of this file looks as follows:
>
> $ cat memory.stat
> anon 167936
> file 87302144
> file_mapped 0
> file_dirty 0
> file_writeback 0
> inactive_anon 0
> active_anon 155648
> inactive_file 87298048
> active_file 4096
> unevictable 0
> pgfault 636
> pgmajfault 0
>
> The list consists of two sections: statistics reflecting the current
> state of the memory management subsystem, and statistics reflecting
> past events. The items themselves are sorted such that generic big
> picture items come before specific details, and items related to
> userspace activity come before items related to kernel heuristics.
>
> All memory counters are in bytes to eliminate all ambiguity with
> variable page sizes.
>
> There will be more items and statistics added in the future, but this
> is a good initial set to get a minimum of insight into how a cgroup is
> using memory, and the items chosen for now are likely to remain valid
> even with significant changes to the memory management implementation.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
With the follow-up it looks good to me. All exported counters look
justified enough and the format follows that of other cgroup2
controllers (cpu, blkio). Thanks!
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
One addition though. May be, we could add 'total' field which would show
memory.current? Yeah, this would result in a little redundancy, but I
think that from userspace pov it's much more convenient to read the
only file and get all stat counters than having them spread throughout
several files.
Come to think of it, do we really need separate memory.events file?
Can't these counters live in memory.stat either? Yeah, this file
generates events, but IMHO it's not very useful the way it is currently
implemented:
Suppose, a user wants to receive notifications about OOM or LOW events,
which are rather rare normally and might require immediate action. The
only way to do that is to listen to memory.events, but this file can
generate tons of MAX/HIGH when the cgroup is performing normally. The
userspace app will have to wake up every time the cgroup performs
reclaim and check memory.events just to ensure no OOM happened and this
all will result in wasting cpu time.
May be, we could generate LOW/HIGH/MAX events on memory.low/high/max?
This would look natural IMO. Don't know where OOM events should go in
this case though.
Thanks,
Vladimir
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
2016-01-15 9:58 ` Vladimir Davydov
@ 2016-01-15 20:30 ` Johannes Weiner
2016-01-28 16:21 ` Michal Hocko
0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2016-01-15 20:30 UTC (permalink / raw)
To: Vladimir Davydov
Cc: Andrew Morton, Michal Hocko, linux-mm, cgroups, linux-kernel,
kernel-team
On Fri, Jan 15, 2016 at 12:58:34PM +0300, Vladimir Davydov wrote:
> With the follow-up it looks good to me. All exported counters look
> justified enough and the format follows that of other cgroup2
> controllers (cpu, blkio). Thanks!
>
> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Thanks Vladimir.
> One addition though. May be, we could add 'total' field which would show
> memory.current? Yeah, this would result in a little redundancy, but I
> think that from userspace pov it's much more convenient to read the
> only file and get all stat counters than having them spread throughout
> several files.
I am not fully convinced that a total value or even memory.current
will be looked at that often in practice, because in all but a few
cornercases that value will be pegged to the configured limit. In
those instances I think it should be okay to check another file.
> Come to think of it, do we really need separate memory.events file?
> Can't these counters live in memory.stat either?
I think it sits at a different level of the interface. The events file
indicates cgroup-specific dynamics between configuration and memory
footprint, and so it sits on the same level as low, high, max, and
current. These are the parts involved in the most basic control loop
between the kernel and the job scheduler--monitor and adjust or notify
the admin. It's for the entity that allocates and manages the system.
The memory.stat file on the other hand is geared toward analyzing and
understanding workload-specific performance (whether by humans or with
some automated heuristics) and if necessary correcting the config file
that describes the application's requirements to the job scheduler.
I think it makes sense to not conflate these two interfaces.
> Yeah, this file
> generates events, but IMHO it's not very useful the way it is currently
> implemented:
>
> Suppose, a user wants to receive notifications about OOM or LOW events,
> which are rather rare normally and might require immediate action. The
> only way to do that is to listen to memory.events, but this file can
> generate tons of MAX/HIGH when the cgroup is performing normally. The
> userspace app will have to wake up every time the cgroup performs
> reclaim and check memory.events just to ensure no OOM happened and this
> all will result in wasting cpu time.
Under optimal system load there is no limit reclaim, and memory
pressure comes exclusively from a shortage of physical pages that
global reclaim balances based on memory.low. If groups run into their
own limits, it means that there are idle resources left on the table.
So events only happen when the machine is over or under utilized, and
as per above, the events file is mainly meant for something like a job
scheduler tasked with allocating the machine's resources. It's hard to
imagine a job scheduler scenario where the long-term goal is anything
other than optimal utilization.
There are reasonable cases in which memory could be temporarily left
idle, say to keep startup latency of new jobs low. In those it's true
that the max and high notifications might become annoying. But do you
really think that could become problematic in practice? In that case
it should be enough if we ratelimit the file-changed notifications.
> May be, we could generate LOW/HIGH/MAX events on memory.low/high/max?
> This would look natural IMO. Don't know where OOM events should go in
> this case though.
Without a natural place for OOM notifications, it probably makes sense
to stick with memory.events.
Thanks,
Johannes
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
2016-01-15 20:30 ` Johannes Weiner
@ 2016-01-28 16:21 ` Michal Hocko
0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2016-01-28 16:21 UTC (permalink / raw)
To: Johannes Weiner
Cc: Vladimir Davydov, Andrew Morton, linux-mm, cgroups, linux-kernel,
kernel-team
[I am sorry I am coming here late but I didn't have time earlier]
On Fri 15-01-16 15:30:59, Johannes Weiner wrote:
> On Fri, Jan 15, 2016 at 12:58:34PM +0300, Vladimir Davydov wrote:
> > With the follow-up it looks good to me. All exported counters look
> > justified enough and the format follows that of other cgroup2
> > controllers (cpu, blkio). Thanks!
> >
> > Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
>
> Thanks Vladimir.
Patches got merged in the meantime but anyway just for the record
Acked-by: Michal Hocko <mhocko@suse.com>
I would have liked nr_pages more than B because they are neither
following /proc/vmstat nor /proc/meminfo (which is in kB). It is true
that other memcg knobs are primarily bytes oriented so there is some
reason to use B here as well.
> > One addition though. May be, we could add 'total' field which would show
> > memory.current? Yeah, this would result in a little redundancy, but I
> > think that from userspace pov it's much more convenient to read the
> > only file and get all stat counters than having them spread throughout
> > several files.
>
> I am not fully convinced that a total value or even memory.current
> will be looked at that often in practice, because in all but a few
> cornercases that value will be pegged to the configured limit. In
> those instances I think it should be okay to check another file.
Agreed
> > Come to think of it, do we really need separate memory.events file?
> > Can't these counters live in memory.stat either?
>
> I think it sits at a different level of the interface. The events file
> indicates cgroup-specific dynamics between configuration and memory
> footprint, and so it sits on the same level as low, high, max, and
> current. These are the parts involved in the most basic control loop
> between the kernel and the job scheduler--monitor and adjust or notify
> the admin. It's for the entity that allocates and manages the system.
>
> The memory.stat file on the other hand is geared toward analyzing and
> understanding workload-specific performance (whether by humans or with
> some automated heuristics) and if necessary correcting the config file
> that describes the application's requirements to the job scheduler.
>
> I think it makes sense to not conflate these two interfaces.
Agreed here as well.
> > Yeah, this file
> > generates events, but IMHO it's not very useful the way it is currently
> > implemented:
> >
> > Suppose, a user wants to receive notifications about OOM or LOW events,
> > which are rather rare normally and might require immediate action. The
> > only way to do that is to listen to memory.events, but this file can
> > generate tons of MAX/HIGH when the cgroup is performing normally. The
> > userspace app will have to wake up every time the cgroup performs
> > reclaim and check memory.events just to ensure no OOM happened and this
> > all will result in wasting cpu time.
>
> Under optimal system load there is no limit reclaim, and memory
> pressure comes exclusively from a shortage of physical pages that
> global reclaim balances based on memory.low. If groups run into their
> own limits, it means that there are idle resources left on the table.
I would expect that the high limit will be routinely hit due to page
cache consumption.
> So events only happen when the machine is over or under utilized, and
> as per above, the events file is mainly meant for something like a job
> scheduler tasked with allocating the machine's resources. It's hard to
> imagine a job scheduler scenario where the long-term goal is anything
> other than optimal utilization.
>
> There are reasonable cases in which memory could be temporarily left
> idle, say to keep startup latency of new jobs low. In those it's true
> that the max and high notifications might become annoying. But do you
> really think that could become problematic in practice? In that case
> it should be enough if we ratelimit the file-changed notifications.
I am not sure this would be a real problem either. Sure you can see a
lot of events but AFAIU no events will be lost, right?
> > May be, we could generate LOW/HIGH/MAX events on memory.low/high/max?
> > This would look natural IMO. Don't know where OOM events should go in
> > this case though.
>
> Without a natural place for OOM notifications, it probably makes sense
> to stick with memory.events.
yes
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 11+ messages in thread