* [PATCH 01/14] mm: memcontrol: export root_mem_cgroup
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
[not found] ` <1447371693-25143-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2015-11-14 12:17 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 02/14] mm: vmscan: simplify memcg vs. global shrinker invocation Johannes Weiner
` (12 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
A later patch will need this symbol in files other than memcontrol.c,
so export it now and replace mem_cgroup_root_css at the same time.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
---
include/linux/memcontrol.h | 3 ++-
mm/backing-dev.c | 2 +-
mm/memcontrol.c | 5 ++---
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ffc5460..9a7a24a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -275,7 +275,8 @@ struct mem_cgroup {
struct mem_cgroup_per_node *nodeinfo[0];
/* WARNING: nodeinfo must be the last member here */
};
-extern struct cgroup_subsys_state *mem_cgroup_root_css;
+
+extern struct mem_cgroup *root_mem_cgroup;
/**
* mem_cgroup_events - count memory events against a cgroup
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 9160853..fdc6f4d 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -707,7 +707,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL);
if (!ret) {
- bdi->wb.memcg_css = mem_cgroup_root_css;
+ bdi->wb.memcg_css = &root_mem_cgroup->css;
bdi->wb.blkcg_css = blkcg_root_css;
}
return ret;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fcd7c4e..a5d2586 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -76,9 +76,9 @@
struct cgroup_subsys memory_cgrp_subsys __read_mostly;
EXPORT_SYMBOL(memory_cgrp_subsys);
+struct mem_cgroup *root_mem_cgroup __read_mostly;
+
#define MEM_CGROUP_RECLAIM_RETRIES 5
-static struct mem_cgroup *root_mem_cgroup __read_mostly;
-struct cgroup_subsys_state *mem_cgroup_root_css __read_mostly;
/* Whether the swap controller is active */
#ifdef CONFIG_MEMCG_SWAP
@@ -4211,7 +4211,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
/* root ? */
if (parent_css == NULL) {
root_mem_cgroup = memcg;
- mem_cgroup_root_css = &memcg->css;
page_counter_init(&memcg->memory, NULL);
memcg->high = PAGE_COUNTER_MAX;
memcg->soft_limit = PAGE_COUNTER_MAX;
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread[parent not found: <1447371693-25143-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH 01/14] mm: memcontrol: export root_mem_cgroup
[not found] ` <1447371693-25143-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
@ 2015-11-13 15:59 ` David Miller
0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2015-11-13 15:59 UTC (permalink / raw)
To: hannes-druUgvl0LCNAfugRpC6u6w
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
vdavydov-5HdwGun5lf+gSpxsJD1C4w, tj-DgEjT+Ai2ygdnm+yROfE0A,
mhocko-AlSwsSmVLrQ, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Date: Thu, 12 Nov 2015 18:41:20 -0500
> A later patch will need this symbol in files other than memcontrol.c,
> so export it now and replace mem_cgroup_root_css at the same time.
>
> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Acked-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
Acked-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 01/14] mm: memcontrol: export root_mem_cgroup
2015-11-12 23:41 ` [PATCH 01/14] mm: memcontrol: export root_mem_cgroup Johannes Weiner
[not found] ` <1447371693-25143-2-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
@ 2015-11-14 12:17 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-14 12:17 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:20PM -0500, Johannes Weiner wrote:
> A later patch will need this symbol in files other than memcontrol.c,
> so export it now and replace mem_cgroup_root_css at the same time.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 02/14] mm: vmscan: simplify memcg vs. global shrinker invocation
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
2015-11-12 23:41 ` [PATCH 01/14] mm: memcontrol: export root_mem_cgroup Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-13 15:59 ` David Miller
2015-11-14 12:36 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 03/14] net: tcp_memcontrol: properly detect ancestor socket pressure Johannes Weiner
` (11 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
Letting shrink_slab() handle the root_mem_cgroup, and implicitely the
!CONFIG_MEMCG case, allows shrink_zone() to invoke the shrinkers
unconditionally from within the memcg iteration loop.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
---
include/linux/memcontrol.h | 2 ++
mm/vmscan.c | 31 ++++++++++++++++---------------
2 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9a7a24a..251bb51 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -502,6 +502,8 @@ void mem_cgroup_split_huge_fixup(struct page *head);
#else /* CONFIG_MEMCG */
struct mem_cgroup;
+#define root_mem_cgroup NULL
+
static inline void mem_cgroup_events(struct mem_cgroup *memcg,
enum mem_cgroup_events_index idx,
unsigned int nr)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a4507ec..e4f5b3c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -411,6 +411,10 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
struct shrinker *shrinker;
unsigned long freed = 0;
+ /* Global shrinker mode */
+ if (memcg == root_mem_cgroup)
+ memcg = NULL;
+
if (memcg && !memcg_kmem_is_active(memcg))
return 0;
@@ -2410,11 +2414,22 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
shrink_lruvec(lruvec, swappiness, sc, &lru_pages);
zone_lru_pages += lru_pages;
- if (memcg && is_classzone)
+ /*
+ * Shrink the slab caches in the same proportion that
+ * the eligible LRU pages were scanned.
+ */
+ if (is_classzone) {
shrink_slab(sc->gfp_mask, zone_to_nid(zone),
memcg, sc->nr_scanned - scanned,
lru_pages);
+ if (reclaim_state) {
+ sc->nr_reclaimed +=
+ reclaim_state->reclaimed_slab;
+ reclaim_state->reclaimed_slab = 0;
+ }
+ }
+
/*
* Direct reclaim and kswapd have to scan all memory
* cgroups to fulfill the overall scan target for the
@@ -2432,20 +2447,6 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
}
} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
- /*
- * Shrink the slab caches in the same proportion that
- * the eligible LRU pages were scanned.
- */
- if (global_reclaim(sc) && is_classzone)
- shrink_slab(sc->gfp_mask, zone_to_nid(zone), NULL,
- sc->nr_scanned - nr_scanned,
- zone_lru_pages);
-
- if (reclaim_state) {
- sc->nr_reclaimed += reclaim_state->reclaimed_slab;
- reclaim_state->reclaimed_slab = 0;
- }
-
vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
sc->nr_scanned - nr_scanned,
sc->nr_reclaimed - nr_reclaimed);
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 02/14] mm: vmscan: simplify memcg vs. global shrinker invocation
2015-11-12 23:41 ` [PATCH 02/14] mm: vmscan: simplify memcg vs. global shrinker invocation Johannes Weiner
@ 2015-11-13 15:59 ` David Miller
2015-11-14 12:36 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2015-11-13 15:59 UTC (permalink / raw)
To: hannes
Cc: akpm, vdavydov, tj, mhocko, netdev, linux-mm, cgroups,
linux-kernel, kernel-team
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 12 Nov 2015 18:41:21 -0500
> Letting shrink_slab() handle the root_mem_cgroup, and implicitely the
> !CONFIG_MEMCG case, allows shrink_zone() to invoke the shrinkers
> unconditionally from within the memcg iteration loop.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 02/14] mm: vmscan: simplify memcg vs. global shrinker invocation
2015-11-12 23:41 ` [PATCH 02/14] mm: vmscan: simplify memcg vs. global shrinker invocation Johannes Weiner
2015-11-13 15:59 ` David Miller
@ 2015-11-14 12:36 ` Vladimir Davydov
2015-11-14 15:06 ` Johannes Weiner
1 sibling, 1 reply; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-14 12:36 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:21PM -0500, Johannes Weiner wrote:
...
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a4507ec..e4f5b3c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -411,6 +411,10 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
> struct shrinker *shrinker;
> unsigned long freed = 0;
>
> + /* Global shrinker mode */
> + if (memcg == root_mem_cgroup)
> + memcg = NULL;
> +
> if (memcg && !memcg_kmem_is_active(memcg))
> return 0;
>
> @@ -2410,11 +2414,22 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
> shrink_lruvec(lruvec, swappiness, sc, &lru_pages);
> zone_lru_pages += lru_pages;
>
> - if (memcg && is_classzone)
> + /*
> + * Shrink the slab caches in the same proportion that
> + * the eligible LRU pages were scanned.
> + */
> + if (is_classzone) {
> shrink_slab(sc->gfp_mask, zone_to_nid(zone),
> memcg, sc->nr_scanned - scanned,
> lru_pages);
>
> + if (reclaim_state) {
> + sc->nr_reclaimed +=
> + reclaim_state->reclaimed_slab;
> + reclaim_state->reclaimed_slab = 0;
> + }
> + }
> +
> /*
> * Direct reclaim and kswapd have to scan all memory
> * cgroups to fulfill the overall scan target for the
> @@ -2432,20 +2447,6 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
> }
> } while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
>
> - /*
> - * Shrink the slab caches in the same proportion that
> - * the eligible LRU pages were scanned.
> - */
> - if (global_reclaim(sc) && is_classzone)
> - shrink_slab(sc->gfp_mask, zone_to_nid(zone), NULL,
> - sc->nr_scanned - nr_scanned,
> - zone_lru_pages);
> -
> - if (reclaim_state) {
> - sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> - reclaim_state->reclaimed_slab = 0;
> - }
> -
AFAICS this patch deadly breaks memcg-unaware shrinkers vs LRU balance:
currently we scan (*total* LRU scanned / *total* LRU pages) of all such
objects; with this patch we'd use the numbers from the root cgroup
instead. If most processes reside in memory cgroups, the root cgroup
will have only a few LRU pages and hence the pressure exerted upon such
objects will be unfairly severe.
Thanks,
Vladimir
> vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
> sc->nr_scanned - nr_scanned,
> sc->nr_reclaimed - nr_reclaimed);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [PATCH 02/14] mm: vmscan: simplify memcg vs. global shrinker invocation
2015-11-14 12:36 ` Vladimir Davydov
@ 2015-11-14 15:06 ` Johannes Weiner
0 siblings, 0 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-14 15:06 UTC (permalink / raw)
To: Vladimir Davydov
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Sat, Nov 14, 2015 at 03:36:50PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:21PM -0500, Johannes Weiner wrote:
> > @@ -2432,20 +2447,6 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
> > }
> > } while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
> >
> > - /*
> > - * Shrink the slab caches in the same proportion that
> > - * the eligible LRU pages were scanned.
> > - */
> > - if (global_reclaim(sc) && is_classzone)
> > - shrink_slab(sc->gfp_mask, zone_to_nid(zone), NULL,
> > - sc->nr_scanned - nr_scanned,
> > - zone_lru_pages);
> > -
> > - if (reclaim_state) {
> > - sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> > - reclaim_state->reclaimed_slab = 0;
> > - }
> > -
>
> AFAICS this patch deadly breaks memcg-unaware shrinkers vs LRU balance:
> currently we scan (*total* LRU scanned / *total* LRU pages) of all such
> objects; with this patch we'd use the numbers from the root cgroup
> instead. If most processes reside in memory cgroups, the root cgroup
> will have only a few LRU pages and hence the pressure exerted upon such
> objects will be unfairly severe.
You're absolutely right, good catch.
Please disregard this patch. It's not necessary for this series after
v2, I just kept it because I thought it's a nice simplification that's
possible after making root_mem_cgroup public.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 03/14] net: tcp_memcontrol: properly detect ancestor socket pressure
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
2015-11-12 23:41 ` [PATCH 01/14] mm: memcontrol: export root_mem_cgroup Johannes Weiner
2015-11-12 23:41 ` [PATCH 02/14] mm: vmscan: simplify memcg vs. global shrinker invocation Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-13 16:00 ` David Miller
[not found] ` <1447371693-25143-4-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2015-11-12 23:41 ` [PATCH 04/14] net: tcp_memcontrol: remove bogus hierarchy pressure propagation Johannes Weiner
` (10 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
When charging socket memory, the code currently checks only the local
page counter for excess to determine whether the memcg is under socket
pressure. But even if the local counter is fine, one of the ancestors
could have breached its limit, which should also force this child to
enter socket pressure. This currently doesn't happen.
Fix this by using page_counter_try_charge() first. If that fails, it
means that either the local counter or one of the ancestors are in
excess of their limit, and the child should enter socket pressure.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/net/sock.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index bbf7c2c..c4b33c9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1190,11 +1190,13 @@ static inline void memcg_memory_allocated_add(struct cg_proto *prot,
unsigned long amt,
int *parent_status)
{
- page_counter_charge(&prot->memory_allocated, amt);
+ struct page_counter *counter;
+
+ if (page_counter_try_charge(&prot->memory_allocated, amt, &counter))
+ return;
- if (page_counter_read(&prot->memory_allocated) >
- prot->memory_allocated.limit)
- *parent_status = OVER_LIMIT;
+ page_counter_charge(&prot->memory_allocated, amt);
+ *parent_status = OVER_LIMIT;
}
static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 03/14] net: tcp_memcontrol: properly detect ancestor socket pressure
2015-11-12 23:41 ` [PATCH 03/14] net: tcp_memcontrol: properly detect ancestor socket pressure Johannes Weiner
@ 2015-11-13 16:00 ` David Miller
[not found] ` <1447371693-25143-4-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2015-11-13 16:00 UTC (permalink / raw)
To: hannes
Cc: akpm, vdavydov, tj, mhocko, netdev, linux-mm, cgroups,
linux-kernel, kernel-team
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 12 Nov 2015 18:41:22 -0500
> When charging socket memory, the code currently checks only the local
> page counter for excess to determine whether the memcg is under socket
> pressure. But even if the local counter is fine, one of the ancestors
> could have breached its limit, which should also force this child to
> enter socket pressure. This currently doesn't happen.
>
> Fix this by using page_counter_try_charge() first. If that fails, it
> means that either the local counter or one of the ancestors are in
> excess of their limit, and the child should enter socket pressure.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David S. Miller <davem@davemloft.net>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread[parent not found: <1447371693-25143-4-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH 03/14] net: tcp_memcontrol: properly detect ancestor socket pressure
[not found] ` <1447371693-25143-4-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
@ 2015-11-14 12:45 ` Vladimir Davydov
2015-11-14 15:15 ` Johannes Weiner
0 siblings, 1 reply; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-14 12:45 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko,
netdev-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Thu, Nov 12, 2015 at 06:41:22PM -0500, Johannes Weiner wrote:
> When charging socket memory, the code currently checks only the local
> page counter for excess to determine whether the memcg is under socket
> pressure. But even if the local counter is fine, one of the ancestors
> could have breached its limit, which should also force this child to
> enter socket pressure. This currently doesn't happen.
>
> Fix this by using page_counter_try_charge() first. If that fails, it
> means that either the local counter or one of the ancestors are in
> excess of their limit, and the child should enter socket pressure.
>
> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Vladimir Davydov <vdavydov-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
For the record: it was broken by commit 3e32cb2e0a12 ("mm: memcontrol:
lockless page counters").
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [PATCH 03/14] net: tcp_memcontrol: properly detect ancestor socket pressure
2015-11-14 12:45 ` Vladimir Davydov
@ 2015-11-14 15:15 ` Johannes Weiner
0 siblings, 0 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-14 15:15 UTC (permalink / raw)
To: Vladimir Davydov
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Sat, Nov 14, 2015 at 03:45:52PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:22PM -0500, Johannes Weiner wrote:
> > When charging socket memory, the code currently checks only the local
> > page counter for excess to determine whether the memcg is under socket
> > pressure. But even if the local counter is fine, one of the ancestors
> > could have breached its limit, which should also force this child to
> > enter socket pressure. This currently doesn't happen.
> >
> > Fix this by using page_counter_try_charge() first. If that fails, it
> > means that either the local counter or one of the ancestors are in
> > excess of their limit, and the child should enter socket pressure.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Thanks Vladimir!
> For the record: it was broken by commit 3e32cb2e0a12 ("mm: memcontrol:
> lockless page counters").
I didn't realize that.
Fixes: 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 04/14] net: tcp_memcontrol: remove bogus hierarchy pressure propagation
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (2 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 03/14] net: tcp_memcontrol: properly detect ancestor socket pressure Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-13 16:00 ` David Miller
2015-11-20 9:07 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 05/14] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label Johannes Weiner
` (9 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
When a cgroup currently breaches its socket memory limit, it enters
memory pressure mode for itself and its *ancestors*. This throttles
transmission in unrelated sibling and cousin subtrees that have
nothing to do with the breached limit.
On the contrary, breaching a limit should make that group and its
*children* enter memory pressure mode. But this happens already,
albeit lazily: if an ancestor limit is breached, siblings will enter
memory pressure on their own once the next packet arrives for them.
So no additional hierarchy code is needed. Remove the bogus stuff.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/net/sock.h | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index c4b33c9..6fc9147 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1152,14 +1152,8 @@ static inline void sk_leave_memory_pressure(struct sock *sk)
if (*memory_pressure)
*memory_pressure = 0;
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
- struct cg_proto *cg_proto = sk->sk_cgrp;
- struct proto *prot = sk->sk_prot;
-
- for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
- cg_proto->memory_pressure = 0;
- }
-
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ sk->sk_cgrp->memory_pressure = 0;
}
static inline void sk_enter_memory_pressure(struct sock *sk)
@@ -1167,13 +1161,8 @@ static inline void sk_enter_memory_pressure(struct sock *sk)
if (!sk->sk_prot->enter_memory_pressure)
return;
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
- struct cg_proto *cg_proto = sk->sk_cgrp;
- struct proto *prot = sk->sk_prot;
-
- for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
- cg_proto->memory_pressure = 1;
- }
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ sk->sk_cgrp->memory_pressure = 1;
sk->sk_prot->enter_memory_pressure(sk);
}
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 04/14] net: tcp_memcontrol: remove bogus hierarchy pressure propagation
2015-11-12 23:41 ` [PATCH 04/14] net: tcp_memcontrol: remove bogus hierarchy pressure propagation Johannes Weiner
@ 2015-11-13 16:00 ` David Miller
2015-11-20 9:07 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2015-11-13 16:00 UTC (permalink / raw)
To: hannes
Cc: akpm, vdavydov, tj, mhocko, netdev, linux-mm, cgroups,
linux-kernel, kernel-team
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 12 Nov 2015 18:41:23 -0500
> When a cgroup currently breaches its socket memory limit, it enters
> memory pressure mode for itself and its *ancestors*. This throttles
> transmission in unrelated sibling and cousin subtrees that have
> nothing to do with the breached limit.
>
> On the contrary, breaching a limit should make that group and its
> *children* enter memory pressure mode. But this happens already,
> albeit lazily: if an ancestor limit is breached, siblings will enter
> memory pressure on their own once the next packet arrives for them.
>
> So no additional hierarchy code is needed. Remove the bogus stuff.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 04/14] net: tcp_memcontrol: remove bogus hierarchy pressure propagation
2015-11-12 23:41 ` [PATCH 04/14] net: tcp_memcontrol: remove bogus hierarchy pressure propagation Johannes Weiner
2015-11-13 16:00 ` David Miller
@ 2015-11-20 9:07 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-20 9:07 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:23PM -0500, Johannes Weiner wrote:
> When a cgroup currently breaches its socket memory limit, it enters
> memory pressure mode for itself and its *ancestors*. This throttles
> transmission in unrelated sibling and cousin subtrees that have
> nothing to do with the breached limit.
>
> On the contrary, breaching a limit should make that group and its
> *children* enter memory pressure mode. But this happens already,
> albeit lazily: if an ancestor limit is breached, siblings will enter
> memory pressure on their own once the next packet arrives for them.
Hmm, we still call sk_prot->enter_memory_pressure, which might hurt a
workload in the root cgroup AFAICS. Strange. You fix it in patch 8
though.
>
> So no additional hierarchy code is needed. Remove the bogus stuff.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 05/14] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (3 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 04/14] net: tcp_memcontrol: remove bogus hierarchy pressure propagation Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
[not found] ` <1447371693-25143-6-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2015-11-14 16:33 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 06/14] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets Johannes Weiner
` (8 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
Move the jump-label from sock_update_memcg() and sock_release_memcg()
to the callsite, and so eliminate those function calls when socket
accounting is not enabled.
This also eliminates the need for dummy functions because the calls
will be optimized away if the Kconfig options are not enabled.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 9 +-------
mm/memcontrol.c | 56 +++++++++++++++++++++-------------------------
net/core/sock.c | 9 ++------
net/ipv4/tcp.c | 3 ++-
net/ipv4/tcp_ipv4.c | 4 +++-
5 files changed, 33 insertions(+), 48 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 251bb51..e930803 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -709,17 +709,10 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
#endif /* CONFIG_CGROUP_WRITEBACK */
-struct sock;
#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+struct sock;
void sock_update_memcg(struct sock *sk);
void sock_release_memcg(struct sock *sk);
-#else
-static inline void sock_update_memcg(struct sock *sk)
-{
-}
-static inline void sock_release_memcg(struct sock *sk)
-{
-}
#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
#ifdef CONFIG_MEMCG_KMEM
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a5d2586..57f4539 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -293,46 +293,40 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
void sock_update_memcg(struct sock *sk)
{
- if (mem_cgroup_sockets_enabled) {
- struct mem_cgroup *memcg;
- struct cg_proto *cg_proto;
+ struct mem_cgroup *memcg;
+ struct cg_proto *cg_proto;
- BUG_ON(!sk->sk_prot->proto_cgroup);
+ BUG_ON(!sk->sk_prot->proto_cgroup);
- /* Socket cloning can throw us here with sk_cgrp already
- * filled. It won't however, necessarily happen from
- * process context. So the test for root memcg given
- * the current task's memcg won't help us in this case.
- *
- * Respecting the original socket's memcg is a better
- * decision in this case.
- */
- if (sk->sk_cgrp) {
- BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
- css_get(&sk->sk_cgrp->memcg->css);
- return;
- }
+ /* Socket cloning can throw us here with sk_cgrp already
+ * filled. It won't however, necessarily happen from
+ * process context. So the test for root memcg given
+ * the current task's memcg won't help us in this case.
+ *
+ * Respecting the original socket's memcg is a better
+ * decision in this case.
+ */
+ if (sk->sk_cgrp) {
+ BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
+ css_get(&sk->sk_cgrp->memcg->css);
+ return;
+ }
- rcu_read_lock();
- memcg = mem_cgroup_from_task(current);
- cg_proto = sk->sk_prot->proto_cgroup(memcg);
- if (cg_proto && test_bit(MEMCG_SOCK_ACTIVE, &cg_proto->flags) &&
- css_tryget_online(&memcg->css)) {
- sk->sk_cgrp = cg_proto;
- }
- rcu_read_unlock();
+ rcu_read_lock();
+ memcg = mem_cgroup_from_task(current);
+ cg_proto = sk->sk_prot->proto_cgroup(memcg);
+ if (cg_proto && test_bit(MEMCG_SOCK_ACTIVE, &cg_proto->flags) &&
+ css_tryget_online(&memcg->css)) {
+ sk->sk_cgrp = cg_proto;
}
+ rcu_read_unlock();
}
EXPORT_SYMBOL(sock_update_memcg);
void sock_release_memcg(struct sock *sk)
{
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
- struct mem_cgroup *memcg;
- WARN_ON(!sk->sk_cgrp->memcg);
- memcg = sk->sk_cgrp->memcg;
- css_put(&sk->sk_cgrp->memcg->css);
- }
+ WARN_ON(!sk->sk_cgrp->memcg);
+ css_put(&sk->sk_cgrp->memcg->css);
}
struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
diff --git a/net/core/sock.c b/net/core/sock.c
index 1e4dd54..04e54bc 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1488,12 +1488,6 @@ void sk_free(struct sock *sk)
}
EXPORT_SYMBOL(sk_free);
-static void sk_update_clone(const struct sock *sk, struct sock *newsk)
-{
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- sock_update_memcg(newsk);
-}
-
/**
* sk_clone_lock - clone a socket, and lock its clone
* @sk: the socket to clone
@@ -1589,7 +1583,8 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
sk_set_socket(newsk, NULL);
newsk->sk_wq = NULL;
- sk_update_clone(sk, newsk);
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ sock_update_memcg(newsk);
if (newsk->sk_prot->sockets_allocated)
sk_sockets_allocated_inc(newsk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0cfa7c0..1b324606 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -422,7 +422,8 @@ void tcp_init_sock(struct sock *sk)
sk->sk_rcvbuf = sysctl_tcp_rmem[1];
local_bh_disable();
- sock_update_memcg(sk);
+ if (mem_cgroup_sockets_enabled)
+ sock_update_memcg(sk);
sk_sockets_allocated_inc(sk);
local_bh_enable();
}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 950e28c..317a246 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1812,7 +1812,9 @@ void tcp_v4_destroy_sock(struct sock *sk)
tcp_saved_syn_free(tp);
sk_sockets_allocated_dec(sk);
- sock_release_memcg(sk);
+
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ sock_release_memcg(sk);
}
EXPORT_SYMBOL(tcp_v4_destroy_sock);
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread[parent not found: <1447371693-25143-6-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH 05/14] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label
[not found] ` <1447371693-25143-6-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
@ 2015-11-13 16:01 ` David Miller
0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2015-11-13 16:01 UTC (permalink / raw)
To: hannes-druUgvl0LCNAfugRpC6u6w
Cc: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
vdavydov-5HdwGun5lf+gSpxsJD1C4w, tj-DgEjT+Ai2ygdnm+yROfE0A,
mhocko-AlSwsSmVLrQ, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Date: Thu, 12 Nov 2015 18:41:24 -0500
> Move the jump-label from sock_update_memcg() and sock_release_memcg()
> to the callsite, and so eliminate those function calls when socket
> accounting is not enabled.
>
> This also eliminates the need for dummy functions because the calls
> will be optimized away if the Kconfig options are not enabled.
>
> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Acked-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 05/14] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label
2015-11-12 23:41 ` [PATCH 05/14] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label Johannes Weiner
[not found] ` <1447371693-25143-6-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
@ 2015-11-14 16:33 ` Vladimir Davydov
2015-11-16 17:52 ` Johannes Weiner
1 sibling, 1 reply; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-14 16:33 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:24PM -0500, Johannes Weiner wrote:
> Move the jump-label from sock_update_memcg() and sock_release_memcg()
> to the callsite, and so eliminate those function calls when socket
> accounting is not enabled.
I don't believe this patch's necessary, because these functions aren't
hot paths. Neither do I think it makes the code look better. Anyway,
it's rather a matter of personal preference, and the patch looks correct
to me, so
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
>
> This also eliminates the need for dummy functions because the calls
> will be optimized away if the Kconfig options are not enabled.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [PATCH 05/14] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label
2015-11-14 16:33 ` Vladimir Davydov
@ 2015-11-16 17:52 ` Johannes Weiner
0 siblings, 0 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-16 17:52 UTC (permalink / raw)
To: Vladimir Davydov
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko,
netdev-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Sat, Nov 14, 2015 at 07:33:10PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:24PM -0500, Johannes Weiner wrote:
> > Move the jump-label from sock_update_memcg() and sock_release_memcg()
> > to the callsite, and so eliminate those function calls when socket
> > accounting is not enabled.
>
> I don't believe this patch's necessary, because these functions aren't
> hot paths. Neither do I think it makes the code look better. Anyway,
> it's rather a matter of personal preference, and the patch looks correct
> to me, so
Yeah, it's not a hotpath. What I like primarily about this patch I
guess is that makes it more consistent how memcg entry is gated. You
don't have to remember which functions have the checks in the caller
and which have it in the function themselves. And I really hate the
static inline void foo(void)
{
if (foo_enabled())
__foo()
}
in the headerfile pattern.
> Reviewed-by: Vladimir Davydov <vdavydov-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
Thanks, I appreciate you acking it despite your personal preference!
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 06/14] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (4 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 05/14] net: tcp_memcontrol: protect all tcp_memcontrol calls by jump-label Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-13 16:01 ` David Miller
2015-11-20 9:48 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 07/14] net: tcp_memcontrol: simplify the per-memcg limit access Johannes Weiner
` (7 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
The number of allocated sockets is used for calculations in the soft
limit phase, where packets are accepted but the socket is under memory
pressure. Since there is no soft limit phase in tcp_memcontrol, and
memory pressure is only entered when packets are already dropped, this
is actually dead code. Remove it.
As this is the last user of parent_cg_proto(), remove that too.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 1 -
include/net/sock.h | 39 +++------------------------------------
net/ipv4/tcp_memcontrol.c | 3 ---
3 files changed, 3 insertions(+), 40 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e930803..185df8c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -97,7 +97,6 @@ enum cg_proto_flags {
struct cg_proto {
struct page_counter memory_allocated; /* Current allocated memory. */
- struct percpu_counter sockets_allocated; /* Current number of sockets. */
int memory_pressure;
long sysctl_mem[3];
unsigned long flags;
diff --git a/include/net/sock.h b/include/net/sock.h
index 6fc9147..ed141b3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1095,19 +1095,9 @@ static inline void sk_refcnt_debug_release(const struct sock *sk)
#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_NET)
extern struct static_key memcg_socket_limit_enabled;
-static inline struct cg_proto *parent_cg_proto(struct proto *proto,
- struct cg_proto *cg_proto)
-{
- return proto->proto_cgroup(parent_mem_cgroup(cg_proto->memcg));
-}
#define mem_cgroup_sockets_enabled static_key_false(&memcg_socket_limit_enabled)
#else
#define mem_cgroup_sockets_enabled 0
-static inline struct cg_proto *parent_cg_proto(struct proto *proto,
- struct cg_proto *cg_proto)
-{
- return NULL;
-}
#endif
static inline bool sk_stream_memory_free(const struct sock *sk)
@@ -1233,41 +1223,18 @@ sk_memory_allocated_sub(struct sock *sk, int amt)
static inline void sk_sockets_allocated_dec(struct sock *sk)
{
- struct proto *prot = sk->sk_prot;
-
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
- struct cg_proto *cg_proto = sk->sk_cgrp;
-
- for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
- percpu_counter_dec(&cg_proto->sockets_allocated);
- }
-
- percpu_counter_dec(prot->sockets_allocated);
+ percpu_counter_dec(sk->sk_prot->sockets_allocated);
}
static inline void sk_sockets_allocated_inc(struct sock *sk)
{
- struct proto *prot = sk->sk_prot;
-
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
- struct cg_proto *cg_proto = sk->sk_cgrp;
-
- for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
- percpu_counter_inc(&cg_proto->sockets_allocated);
- }
-
- percpu_counter_inc(prot->sockets_allocated);
+ percpu_counter_inc(sk->sk_prot->sockets_allocated);
}
static inline int
sk_sockets_allocated_read_positive(struct sock *sk)
{
- struct proto *prot = sk->sk_prot;
-
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- return percpu_counter_read_positive(&sk->sk_cgrp->sockets_allocated);
-
- return percpu_counter_read_positive(prot->sockets_allocated);
+ return percpu_counter_read_positive(sk->sk_prot->sockets_allocated);
}
static inline int
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 2379c1b..8965638 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -32,7 +32,6 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
counter_parent = &parent_cg->memory_allocated;
page_counter_init(&cg_proto->memory_allocated, counter_parent);
- percpu_counter_init(&cg_proto->sockets_allocated, 0, GFP_KERNEL);
return 0;
}
@@ -46,8 +45,6 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
if (!cg_proto)
return;
- percpu_counter_destroy(&cg_proto->sockets_allocated);
-
if (test_bit(MEMCG_SOCK_ACTIVATED, &cg_proto->flags))
static_key_slow_dec(&memcg_socket_limit_enabled);
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 06/14] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets
2015-11-12 23:41 ` [PATCH 06/14] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets Johannes Weiner
@ 2015-11-13 16:01 ` David Miller
2015-11-20 9:48 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2015-11-13 16:01 UTC (permalink / raw)
To: hannes
Cc: akpm, vdavydov, tj, mhocko, netdev, linux-mm, cgroups,
linux-kernel, kernel-team
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Thu, 12 Nov 2015 18:41:25 -0500
> The number of allocated sockets is used for calculations in the soft
> limit phase, where packets are accepted but the socket is under memory
> pressure. Since there is no soft limit phase in tcp_memcontrol, and
> memory pressure is only entered when packets are already dropped, this
> is actually dead code. Remove it.
>
> As this is the last user of parent_cg_proto(), remove that too.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David S. Miller <davem@davemloft.net>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 06/14] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets
2015-11-12 23:41 ` [PATCH 06/14] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets Johannes Weiner
2015-11-13 16:01 ` David Miller
@ 2015-11-20 9:48 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-20 9:48 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:25PM -0500, Johannes Weiner wrote:
> The number of allocated sockets is used for calculations in the soft
> limit phase, where packets are accepted but the socket is under memory
> pressure. Since there is no soft limit phase in tcp_memcontrol, and
> memory pressure is only entered when packets are already dropped, this
> is actually dead code. Remove it.
Actually, we can get into the soft limit phase due to the global limit
(tcp_memory_pressure is set), but then using per-memcg sockets_allocated
counter is just wrong.
>
> As this is the last user of parent_cg_proto(), remove that too.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 07/14] net: tcp_memcontrol: simplify the per-memcg limit access
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (5 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 06/14] net: tcp_memcontrol: remove dead per-memcg count of allocated sockets Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-20 9:51 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks Johannes Weiner
` (6 subsequent siblings)
13 siblings, 1 reply; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
tcp_memcontrol replicates the global sysctl_mem limit array per
cgroup, but it only ever sets these entries to the value of the
memory_allocated page_counter limit. Use the latter directly.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 1 -
include/net/sock.h | 8 +++++---
net/ipv4/tcp_memcontrol.c | 8 --------
3 files changed, 5 insertions(+), 12 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 185df8c..96ca3d3 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -98,7 +98,6 @@ enum cg_proto_flags {
struct cg_proto {
struct page_counter memory_allocated; /* Current allocated memory. */
int memory_pressure;
- long sysctl_mem[3];
unsigned long flags;
/*
* memcg field is used to find which memcg we belong directly
diff --git a/include/net/sock.h b/include/net/sock.h
index ed141b3..2eefc99 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1159,10 +1159,12 @@ static inline void sk_enter_memory_pressure(struct sock *sk)
static inline long sk_prot_mem_limits(const struct sock *sk, int index)
{
- long *prot = sk->sk_prot->sysctl_mem;
+ long limit = sk->sk_prot->sysctl_mem[index];
+
if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- prot = sk->sk_cgrp->sysctl_mem;
- return prot[index];
+ limit = min_t(long, limit, sk->sk_cgrp->memory_allocated.limit);
+
+ return limit;
}
static inline void memcg_memory_allocated_add(struct cg_proto *prot,
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 8965638..c383e68 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -21,9 +21,6 @@ int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
if (!cg_proto)
return 0;
- cg_proto->sysctl_mem[0] = sysctl_tcp_mem[0];
- cg_proto->sysctl_mem[1] = sysctl_tcp_mem[1];
- cg_proto->sysctl_mem[2] = sysctl_tcp_mem[2];
cg_proto->memory_pressure = 0;
cg_proto->memcg = memcg;
@@ -54,7 +51,6 @@ EXPORT_SYMBOL(tcp_destroy_cgroup);
static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
{
struct cg_proto *cg_proto;
- int i;
int ret;
cg_proto = tcp_prot.proto_cgroup(memcg);
@@ -65,10 +61,6 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
if (ret)
return ret;
- for (i = 0; i < 3; i++)
- cg_proto->sysctl_mem[i] = min_t(long, nr_pages,
- sysctl_tcp_mem[i]);
-
if (nr_pages == PAGE_COUNTER_MAX)
clear_bit(MEMCG_SOCK_ACTIVE, &cg_proto->flags);
else {
--
2.6.2
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 07/14] net: tcp_memcontrol: simplify the per-memcg limit access
2015-11-12 23:41 ` [PATCH 07/14] net: tcp_memcontrol: simplify the per-memcg limit access Johannes Weiner
@ 2015-11-20 9:51 ` Vladimir Davydov
0 siblings, 0 replies; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-20 9:51 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:26PM -0500, Johannes Weiner wrote:
> tcp_memcontrol replicates the global sysctl_mem limit array per
> cgroup, but it only ever sets these entries to the value of the
> memory_allocated page_counter limit. Use the latter directly.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (6 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 07/14] net: tcp_memcontrol: simplify the per-memcg limit access Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-13 4:53 ` Eric Dumazet
2015-11-20 10:58 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter Johannes Weiner
` (5 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
There won't be a tcp control soft limit, so integrating the memcg code
into the global skmem limiting scheme complicates things
unnecessarily. Replace this with simple and clear charge and uncharge
calls--hidden behind a jump label--to account skb memory.
Note that this is not purely aesthetic: as a result of shoehorning the
per-memcg code into the same memory accounting functions that handle
the global level, the old code would compare the per-memcg consumption
against the smaller of the per-memcg limit and the global limit. This
allowed the total consumption of multiple sockets to exceed the global
limit, as long as the individual sockets stayed within bounds. After
this change, the code will always compare the per-memcg consumption to
the per-memcg limit, and the global consumption to the global limit,
and thus close this loophole.
Without a soft limit, the per-memcg memory pressure state in sockets
is generally questionable. However, we did it until now, so we
continue to enter it when the hard limit is hit, and packets are
dropped, to let other sockets in the cgroup know that they shouldn't
grow their transmit windows, either. However, keep it simple in the
new callback model and leave memory pressure lazily when the next
packet is accepted (as opposed to doing it synchroneously when packets
are processed). When packets are dropped, network performance will
already be in the toilet, so that should be a reasonable trade-off.
As described above, consumption is now checked on the per-memcg level
and the global level separately. Likewise, memory pressure states are
maintained on both the per-memcg level and the global level, and a
socket is considered under pressure when either level asserts as much.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 12 ++++-----
include/net/sock.h | 63 ++++++----------------------------------------
include/net/tcp.h | 5 ++--
mm/memcontrol.c | 32 +++++++++++++++++++++++
net/core/sock.c | 26 +++++++++++--------
net/ipv4/tcp_output.c | 7 ++++--
6 files changed, 69 insertions(+), 76 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 96ca3d3..906dfff 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -676,12 +676,6 @@ void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
}
#endif /* CONFIG_MEMCG */
-enum {
- UNDER_LIMIT,
- SOFT_LIMIT,
- OVER_LIMIT,
-};
-
#ifdef CONFIG_CGROUP_WRITEBACK
struct list_head *mem_cgroup_cgwb_list(struct mem_cgroup *memcg);
@@ -711,6 +705,12 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
struct sock;
void sock_update_memcg(struct sock *sk);
void sock_release_memcg(struct sock *sk);
+bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages);
+void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages);
+static inline bool mem_cgroup_under_socket_pressure(struct cg_proto *proto)
+{
+ return proto->memory_pressure;
+}
#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
#ifdef CONFIG_MEMCG_KMEM
diff --git a/include/net/sock.h b/include/net/sock.h
index 2eefc99..8cc7613 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1126,8 +1126,8 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
if (!sk->sk_prot->memory_pressure)
return false;
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- return !!sk->sk_cgrp->memory_pressure;
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
+ mem_cgroup_under_socket_pressure(sk->sk_cgrp))
return !!*sk->sk_prot->memory_pressure;
}
@@ -1141,9 +1141,6 @@ static inline void sk_leave_memory_pressure(struct sock *sk)
if (*memory_pressure)
*memory_pressure = 0;
-
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- sk->sk_cgrp->memory_pressure = 0;
}
static inline void sk_enter_memory_pressure(struct sock *sk)
@@ -1151,76 +1148,30 @@ static inline void sk_enter_memory_pressure(struct sock *sk)
if (!sk->sk_prot->enter_memory_pressure)
return;
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- sk->sk_cgrp->memory_pressure = 1;
-
sk->sk_prot->enter_memory_pressure(sk);
}
static inline long sk_prot_mem_limits(const struct sock *sk, int index)
{
- long limit = sk->sk_prot->sysctl_mem[index];
-
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- limit = min_t(long, limit, sk->sk_cgrp->memory_allocated.limit);
-
- return limit;
-}
-
-static inline void memcg_memory_allocated_add(struct cg_proto *prot,
- unsigned long amt,
- int *parent_status)
-{
- struct page_counter *counter;
-
- if (page_counter_try_charge(&prot->memory_allocated, amt, &counter))
- return;
-
- page_counter_charge(&prot->memory_allocated, amt);
- *parent_status = OVER_LIMIT;
-}
-
-static inline void memcg_memory_allocated_sub(struct cg_proto *prot,
- unsigned long amt)
-{
- page_counter_uncharge(&prot->memory_allocated, amt);
+ return sk->sk_prot->sysctl_mem[index];
}
static inline long
sk_memory_allocated(const struct sock *sk)
{
- struct proto *prot = sk->sk_prot;
-
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- return page_counter_read(&sk->sk_cgrp->memory_allocated);
-
- return atomic_long_read(prot->memory_allocated);
+ return atomic_long_read(sk->sk_prot->memory_allocated);
}
static inline long
-sk_memory_allocated_add(struct sock *sk, int amt, int *parent_status)
+sk_memory_allocated_add(struct sock *sk, int amt)
{
- struct proto *prot = sk->sk_prot;
-
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
- memcg_memory_allocated_add(sk->sk_cgrp, amt, parent_status);
- /* update the root cgroup regardless */
- atomic_long_add_return(amt, prot->memory_allocated);
- return page_counter_read(&sk->sk_cgrp->memory_allocated);
- }
-
- return atomic_long_add_return(amt, prot->memory_allocated);
+ return atomic_long_add_return(amt, sk->sk_prot->memory_allocated);
}
static inline void
sk_memory_allocated_sub(struct sock *sk, int amt)
{
- struct proto *prot = sk->sk_prot;
-
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- memcg_memory_allocated_sub(sk->sk_cgrp, amt);
-
- atomic_long_sub(amt, prot->memory_allocated);
+ atomic_long_sub(amt, sk->sk_prot->memory_allocated);
}
static inline void sk_sockets_allocated_dec(struct sock *sk)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f80e74c..04517d6 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -292,8 +292,9 @@ extern int tcp_memory_pressure;
/* optimized version of sk_under_memory_pressure() for TCP sockets */
static inline bool tcp_under_memory_pressure(const struct sock *sk)
{
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- return !!sk->sk_cgrp->memory_pressure;
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
+ mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+ return true;
return tcp_memory_pressure;
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 57f4539..3462a52 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -338,6 +338,38 @@ struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
}
EXPORT_SYMBOL(tcp_proto_cgroup);
+/**
+ * mem_cgroup_charge_skmem - charge socket memory
+ * @proto: proto to charge
+ * @nr_pages: number of pages to charge
+ *
+ * Charges @nr_pages to @proto. Returns %true if the charge fit within
+ * @proto's configured limit, %false if the charge had to be forced.
+ */
+bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+{
+ struct page_counter *counter;
+
+ if (page_counter_try_charge(&proto->memory_allocated,
+ nr_pages, &counter)) {
+ proto->memory_pressure = 0;
+ return true;
+ }
+ page_counter_charge(&proto->memory_allocated, nr_pages);
+ proto->memory_pressure = 1;
+ return false;
+}
+
+/**
+ * mem_cgroup_uncharge_skmem - uncharge socket memory
+ * @proto - proto to uncharge
+ * @nr_pages - number of pages to uncharge
+ */
+void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+{
+ page_counter_uncharge(&proto->memory_allocated, nr_pages);
+}
+
#endif
#ifdef CONFIG_MEMCG_KMEM
diff --git a/net/core/sock.c b/net/core/sock.c
index 04e54bc..5b1b96f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2066,27 +2066,27 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
struct proto *prot = sk->sk_prot;
int amt = sk_mem_pages(size);
long allocated;
- int parent_status = UNDER_LIMIT;
sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
- allocated = sk_memory_allocated_add(sk, amt, &parent_status);
+ allocated = sk_memory_allocated_add(sk, amt);
+
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
+ !mem_cgroup_charge_skmem(sk->sk_cgrp, amt))
+ goto suppress_allocation;
/* Under limit. */
- if (parent_status == UNDER_LIMIT &&
- allocated <= sk_prot_mem_limits(sk, 0)) {
+ if (allocated <= sk_prot_mem_limits(sk, 0)) {
sk_leave_memory_pressure(sk);
return 1;
}
- /* Under pressure. (we or our parents) */
- if ((parent_status > SOFT_LIMIT) ||
- allocated > sk_prot_mem_limits(sk, 1))
+ /* Under pressure. */
+ if (allocated > sk_prot_mem_limits(sk, 1))
sk_enter_memory_pressure(sk);
- /* Over hard limit (we or our parents) */
- if ((parent_status == OVER_LIMIT) ||
- (allocated > sk_prot_mem_limits(sk, 2)))
+ /* Over hard limit. */
+ if (allocated > sk_prot_mem_limits(sk, 2))
goto suppress_allocation;
/* guarantee minimum buffer size under pressure */
@@ -2135,6 +2135,9 @@ suppress_allocation:
sk_memory_allocated_sub(sk, amt);
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ mem_cgroup_uncharge_skmem(sk->sk_cgrp, amt);
+
return 0;
}
EXPORT_SYMBOL(__sk_mem_schedule);
@@ -2150,6 +2153,9 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
sk_memory_allocated_sub(sk, amount);
sk->sk_forward_alloc -= amount << SK_MEM_QUANTUM_SHIFT;
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ mem_cgroup_uncharge_skmem(sk->sk_cgrp, amount);
+
if (sk_under_memory_pressure(sk) &&
(sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
sk_leave_memory_pressure(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cb7ca56..7aa168a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2813,13 +2813,16 @@ begin_fwd:
*/
void sk_forced_mem_schedule(struct sock *sk, int size)
{
- int amt, status;
+ int amt;
if (size <= sk->sk_forward_alloc)
return;
amt = sk_mem_pages(size);
sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
- sk_memory_allocated_add(sk, amt, &status);
+ sk_memory_allocated_add(sk, amt);
+
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ mem_cgroup_charge_skmem(sk->sk_cgrp, amt);
}
/* Send a FIN. The caller locks the socket for us.
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks
2015-11-12 23:41 ` [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks Johannes Weiner
@ 2015-11-13 4:53 ` Eric Dumazet
[not found] ` <1447390418.22599.34.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2015-11-20 10:58 ` Vladimir Davydov
1 sibling, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2015-11-13 4:53 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Vladimir Davydov, Tejun Heo,
Michal Hocko, netdev, linux-mm, cgroups, linux-kernel,
kernel-team
On Thu, 2015-11-12 at 18:41 -0500, Johannes Weiner wrote:
> @@ -711,6 +705,12 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
> struct sock;
> void sock_update_memcg(struct sock *sk);
> void sock_release_memcg(struct sock *sk);
> +bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages);
> +void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages);
> +static inline bool mem_cgroup_under_socket_pressure(struct cg_proto *proto)
> +{
> + return proto->memory_pressure;
> +}
> #endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
>
> #ifdef CONFIG_MEMCG_KMEM
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 2eefc99..8cc7613 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1126,8 +1126,8 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
> if (!sk->sk_prot->memory_pressure)
> return false;
>
> - if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
> - return !!sk->sk_cgrp->memory_pressure;
> + if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
> + mem_cgroup_under_socket_pressure(sk->sk_cgrp))
>
> return !!*sk->sk_prot->memory_pressure;
> }
This looks wrong ?
if (A && B && C)
return !!*sk->sk_prot->memory_pressure;
<compiler should eventually barf,
as this function should not return void>
}
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks
2015-11-12 23:41 ` [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks Johannes Weiner
2015-11-13 4:53 ` Eric Dumazet
@ 2015-11-20 10:58 ` Vladimir Davydov
2015-11-20 18:42 ` Johannes Weiner
1 sibling, 1 reply; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-20 10:58 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:27PM -0500, Johannes Weiner wrote:
> There won't be a tcp control soft limit, so integrating the memcg code
> into the global skmem limiting scheme complicates things
> unnecessarily. Replace this with simple and clear charge and uncharge
> calls--hidden behind a jump label--to account skb memory.
>
> Note that this is not purely aesthetic: as a result of shoehorning the
> per-memcg code into the same memory accounting functions that handle
> the global level, the old code would compare the per-memcg consumption
> against the smaller of the per-memcg limit and the global limit. This
> allowed the total consumption of multiple sockets to exceed the global
> limit, as long as the individual sockets stayed within bounds. After
> this change, the code will always compare the per-memcg consumption to
> the per-memcg limit, and the global consumption to the global limit,
> and thus close this loophole.
>
> Without a soft limit, the per-memcg memory pressure state in sockets
> is generally questionable. However, we did it until now, so we
> continue to enter it when the hard limit is hit, and packets are
> dropped, to let other sockets in the cgroup know that they shouldn't
> grow their transmit windows, either. However, keep it simple in the
> new callback model and leave memory pressure lazily when the next
> packet is accepted (as opposed to doing it synchroneously when packets
> are processed). When packets are dropped, network performance will
> already be in the toilet, so that should be a reasonable trade-off.
>
> As described above, consumption is now checked on the per-memcg level
> and the global level separately. Likewise, memory pressure states are
> maintained on both the per-memcg level and the global level, and a
> socket is considered under pressure when either level asserts as much.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
It leaves the legacy functionality intact, while making the code look
much better.
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks
2015-11-20 10:58 ` Vladimir Davydov
@ 2015-11-20 18:42 ` Johannes Weiner
0 siblings, 0 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-20 18:42 UTC (permalink / raw)
To: Vladimir Davydov
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko,
netdev-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Fri, Nov 20, 2015 at 01:58:57PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:27PM -0500, Johannes Weiner wrote:
> > There won't be a tcp control soft limit, so integrating the memcg code
> > into the global skmem limiting scheme complicates things
> > unnecessarily. Replace this with simple and clear charge and uncharge
> > calls--hidden behind a jump label--to account skb memory.
> >
> > Note that this is not purely aesthetic: as a result of shoehorning the
> > per-memcg code into the same memory accounting functions that handle
> > the global level, the old code would compare the per-memcg consumption
> > against the smaller of the per-memcg limit and the global limit. This
> > allowed the total consumption of multiple sockets to exceed the global
> > limit, as long as the individual sockets stayed within bounds. After
> > this change, the code will always compare the per-memcg consumption to
> > the per-memcg limit, and the global consumption to the global limit,
> > and thus close this loophole.
> >
> > Without a soft limit, the per-memcg memory pressure state in sockets
> > is generally questionable. However, we did it until now, so we
> > continue to enter it when the hard limit is hit, and packets are
> > dropped, to let other sockets in the cgroup know that they shouldn't
> > grow their transmit windows, either. However, keep it simple in the
> > new callback model and leave memory pressure lazily when the next
> > packet is accepted (as opposed to doing it synchroneously when packets
> > are processed). When packets are dropped, network performance will
> > already be in the toilet, so that should be a reasonable trade-off.
> >
> > As described above, consumption is now checked on the per-memcg level
> > and the global level separately. Likewise, memory pressure states are
> > maintained on both the per-memcg level and the global level, and a
> > socket is considered under pressure when either level asserts as much.
> >
> > Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
>
> It leaves the legacy functionality intact, while making the code look
> much better.
>
> Reviewed-by: Vladimir Davydov <vdavydov-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
Thank you very much!
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (7 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 08/14] net: tcp_memcontrol: sanitize tcp memory accounting callbacks Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-20 12:42 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 10/14] mm: memcontrol: generalize the socket accounting jump label Johannes Weiner
` (4 subsequent siblings)
13 siblings, 1 reply; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
There won't be any separate counters for socket memory consumed by
protocols other than TCP in the future. Remove the indirection and
link sockets directly to their owning memory cgroup.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 18 +++---------
include/net/sock.h | 37 ++++-------------------
include/net/tcp.h | 4 +--
include/net/tcp_memcontrol.h | 1 -
mm/memcontrol.c | 57 ++++++++++++++----------------------
net/core/sock.c | 52 +++++---------------------------
net/ipv4/tcp_ipv4.c | 7 +----
net/ipv4/tcp_memcontrol.c | 70 ++++++++++++++++++--------------------------
net/ipv4/tcp_output.c | 4 +--
net/ipv6/tcp_ipv6.c | 3 --
10 files changed, 71 insertions(+), 182 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 906dfff..1c71f27 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -99,16 +99,6 @@ struct cg_proto {
struct page_counter memory_allocated; /* Current allocated memory. */
int memory_pressure;
unsigned long flags;
- /*
- * memcg field is used to find which memcg we belong directly
- * Each memcg struct can hold more than one cg_proto, so container_of
- * won't really cut.
- *
- * The elegant solution would be having an inverse function to
- * proto_cgroup in struct proto, but that means polluting the structure
- * for everybody, instead of just for memcg users.
- */
- struct mem_cgroup *memcg;
};
#ifdef CONFIG_MEMCG
@@ -705,11 +695,11 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
struct sock;
void sock_update_memcg(struct sock *sk);
void sock_release_memcg(struct sock *sk);
-bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages);
-void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages);
-static inline bool mem_cgroup_under_socket_pressure(struct cg_proto *proto)
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
+static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
{
- return proto->memory_pressure;
+ return memcg->tcp_mem.memory_pressure;
}
#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
diff --git a/include/net/sock.h b/include/net/sock.h
index 8cc7613..b439dcc 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -69,22 +69,6 @@
#include <net/tcp_states.h>
#include <linux/net_tstamp.h>
-struct cgroup;
-struct cgroup_subsys;
-#ifdef CONFIG_NET
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss);
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg);
-#else
-static inline
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
-{
- return 0;
-}
-static inline
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg)
-{
-}
-#endif
/*
* This structure really needs to be cleaned up.
* Most of it is for TCP, and not used by any of
@@ -310,7 +294,7 @@ struct cg_proto;
* @sk_security: used by security modules
* @sk_mark: generic packet mark
* @sk_classid: this socket's cgroup classid
- * @sk_cgrp: this socket's cgroup-specific proto data
+ * @sk_memcg: this socket's memory cgroup association
* @sk_write_pending: a write to stream socket waits to start
* @sk_state_change: callback to indicate change in the state of the sock
* @sk_data_ready: callback to indicate there is data to be processed
@@ -447,7 +431,7 @@ struct sock {
#ifdef CONFIG_CGROUP_NET_CLASSID
u32 sk_classid;
#endif
- struct cg_proto *sk_cgrp;
+ struct mem_cgroup *sk_memcg;
void (*sk_state_change)(struct sock *sk);
void (*sk_data_ready)(struct sock *sk);
void (*sk_write_space)(struct sock *sk);
@@ -1051,18 +1035,6 @@ struct proto {
#ifdef SOCK_REFCNT_DEBUG
atomic_t socks;
#endif
-#ifdef CONFIG_MEMCG_KMEM
- /*
- * cgroup specific init/deinit functions. Called once for all
- * protocols that implement it, from cgroups populate function.
- * This function has to setup any files the protocol want to
- * appear in the kmem cgroup filesystem.
- */
- int (*init_cgroup)(struct mem_cgroup *memcg,
- struct cgroup_subsys *ss);
- void (*destroy_cgroup)(struct mem_cgroup *memcg);
- struct cg_proto *(*proto_cgroup)(struct mem_cgroup *memcg);
-#endif
};
int proto_register(struct proto *prot, int alloc_slab);
@@ -1126,8 +1098,9 @@ static inline bool sk_under_memory_pressure(const struct sock *sk)
if (!sk->sk_prot->memory_pressure)
return false;
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
- mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+ if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+ mem_cgroup_under_socket_pressure(sk->sk_memcg))
+ return true;
return !!*sk->sk_prot->memory_pressure;
}
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 04517d6..c008535 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -292,8 +292,8 @@ extern int tcp_memory_pressure;
/* optimized version of sk_under_memory_pressure() for TCP sockets */
static inline bool tcp_under_memory_pressure(const struct sock *sk)
{
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
- mem_cgroup_under_socket_pressure(sk->sk_cgrp))
+ if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+ mem_cgroup_under_socket_pressure(sk->sk_memcg))
return true;
return tcp_memory_pressure;
diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h
index 05b94d9..3a17b16 100644
--- a/include/net/tcp_memcontrol.h
+++ b/include/net/tcp_memcontrol.h
@@ -1,7 +1,6 @@
#ifndef _TCP_MEMCG_H
#define _TCP_MEMCG_H
-struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg);
int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss);
void tcp_destroy_cgroup(struct mem_cgroup *memcg);
#endif /* _TCP_MEMCG_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3462a52..89b1d9e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -294,9 +294,6 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
void sock_update_memcg(struct sock *sk)
{
struct mem_cgroup *memcg;
- struct cg_proto *cg_proto;
-
- BUG_ON(!sk->sk_prot->proto_cgroup);
/* Socket cloning can throw us here with sk_cgrp already
* filled. It won't however, necessarily happen from
@@ -306,68 +303,58 @@ void sock_update_memcg(struct sock *sk)
* Respecting the original socket's memcg is a better
* decision in this case.
*/
- if (sk->sk_cgrp) {
- BUG_ON(mem_cgroup_is_root(sk->sk_cgrp->memcg));
- css_get(&sk->sk_cgrp->memcg->css);
+ if (sk->sk_memcg) {
+ BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
+ css_get(&sk->sk_memcg->css);
return;
}
rcu_read_lock();
memcg = mem_cgroup_from_task(current);
- cg_proto = sk->sk_prot->proto_cgroup(memcg);
- if (cg_proto && test_bit(MEMCG_SOCK_ACTIVE, &cg_proto->flags) &&
- css_tryget_online(&memcg->css)) {
- sk->sk_cgrp = cg_proto;
- }
+ if (memcg != root_mem_cgroup &&
+ test_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags) &&
+ css_tryget_online(&memcg->css))
+ sk->sk_memcg = memcg;
rcu_read_unlock();
}
EXPORT_SYMBOL(sock_update_memcg);
void sock_release_memcg(struct sock *sk)
{
- WARN_ON(!sk->sk_cgrp->memcg);
- css_put(&sk->sk_cgrp->memcg->css);
-}
-
-struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg)
-{
- if (!memcg || mem_cgroup_is_root(memcg))
- return NULL;
-
- return &memcg->tcp_mem;
+ WARN_ON(!sk->sk_memcg);
+ css_put(&sk->sk_memcg->css);
}
-EXPORT_SYMBOL(tcp_proto_cgroup);
/**
* mem_cgroup_charge_skmem - charge socket memory
- * @proto: proto to charge
+ * @memcg: memcg to charge
* @nr_pages: number of pages to charge
*
- * Charges @nr_pages to @proto. Returns %true if the charge fit within
- * @proto's configured limit, %false if the charge had to be forced.
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * @memcg's configured limit, %false if the charge had to be forced.
*/
-bool mem_cgroup_charge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
{
struct page_counter *counter;
- if (page_counter_try_charge(&proto->memory_allocated,
+ if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
nr_pages, &counter)) {
- proto->memory_pressure = 0;
+ memcg->tcp_mem.memory_pressure = 0;
return true;
}
- page_counter_charge(&proto->memory_allocated, nr_pages);
- proto->memory_pressure = 1;
+ page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
+ memcg->tcp_mem.memory_pressure = 1;
return false;
}
/**
* mem_cgroup_uncharge_skmem - uncharge socket memory
- * @proto - proto to uncharge
+ * @memcg - memcg to uncharge
* @nr_pages - number of pages to uncharge
*/
-void mem_cgroup_uncharge_skmem(struct cg_proto *proto, unsigned int nr_pages)
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
{
- page_counter_uncharge(&proto->memory_allocated, nr_pages);
+ page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
}
#endif
@@ -3623,7 +3610,7 @@ static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
if (ret)
return ret;
- return mem_cgroup_sockets_init(memcg, ss);
+ return tcp_init_cgroup(memcg, ss);
}
static void memcg_deactivate_kmem(struct mem_cgroup *memcg)
@@ -3679,7 +3666,7 @@ static void memcg_destroy_kmem(struct mem_cgroup *memcg)
static_key_slow_dec(&memcg_kmem_enabled_key);
WARN_ON(page_counter_read(&memcg->kmem));
}
- mem_cgroup_sockets_destroy(memcg);
+ tcp_destroy_cgroup(memcg);
}
#else
static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b1b96f..6486b0d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -194,44 +194,6 @@ bool sk_net_capable(const struct sock *sk, int cap)
}
EXPORT_SYMBOL(sk_net_capable);
-
-#ifdef CONFIG_MEMCG_KMEM
-int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
-{
- struct proto *proto;
- int ret = 0;
-
- mutex_lock(&proto_list_mutex);
- list_for_each_entry(proto, &proto_list, node) {
- if (proto->init_cgroup) {
- ret = proto->init_cgroup(memcg, ss);
- if (ret)
- goto out;
- }
- }
-
- mutex_unlock(&proto_list_mutex);
- return ret;
-out:
- list_for_each_entry_continue_reverse(proto, &proto_list, node)
- if (proto->destroy_cgroup)
- proto->destroy_cgroup(memcg);
- mutex_unlock(&proto_list_mutex);
- return ret;
-}
-
-void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg)
-{
- struct proto *proto;
-
- mutex_lock(&proto_list_mutex);
- list_for_each_entry_reverse(proto, &proto_list, node)
- if (proto->destroy_cgroup)
- proto->destroy_cgroup(memcg);
- mutex_unlock(&proto_list_mutex);
-}
-#endif
-
/*
* Each address family might have different locking rules, so we have
* one slock key per address family:
@@ -1583,7 +1545,7 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
sk_set_socket(newsk, NULL);
newsk->sk_wq = NULL;
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ if (mem_cgroup_sockets_enabled && sk->sk_memcg)
sock_update_memcg(newsk);
if (newsk->sk_prot->sockets_allocated)
@@ -2071,8 +2033,8 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
allocated = sk_memory_allocated_add(sk, amt);
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
- !mem_cgroup_charge_skmem(sk->sk_cgrp, amt))
+ if (mem_cgroup_sockets_enabled && sk->sk_memcg &&
+ !mem_cgroup_charge_skmem(sk->sk_memcg, amt))
goto suppress_allocation;
/* Under limit. */
@@ -2135,8 +2097,8 @@ suppress_allocation:
sk_memory_allocated_sub(sk, amt);
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- mem_cgroup_uncharge_skmem(sk->sk_cgrp, amt);
+ if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+ mem_cgroup_uncharge_skmem(sk->sk_memcg, amt);
return 0;
}
@@ -2153,8 +2115,8 @@ void __sk_mem_reclaim(struct sock *sk, int amount)
sk_memory_allocated_sub(sk, amount);
sk->sk_forward_alloc -= amount << SK_MEM_QUANTUM_SHIFT;
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- mem_cgroup_uncharge_skmem(sk->sk_cgrp, amount);
+ if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+ mem_cgroup_uncharge_skmem(sk->sk_memcg, amount);
if (sk_under_memory_pressure(sk) &&
(sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)))
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 317a246..045d3af 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1813,7 +1813,7 @@ void tcp_v4_destroy_sock(struct sock *sk)
sk_sockets_allocated_dec(sk);
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+ if (mem_cgroup_sockets_enabled && sk->sk_memcg)
sock_release_memcg(sk);
}
EXPORT_SYMBOL(tcp_v4_destroy_sock);
@@ -2336,11 +2336,6 @@ struct proto tcp_prot = {
.compat_setsockopt = compat_tcp_setsockopt,
.compat_getsockopt = compat_tcp_getsockopt,
#endif
-#ifdef CONFIG_MEMCG_KMEM
- .init_cgroup = tcp_init_cgroup,
- .destroy_cgroup = tcp_destroy_cgroup,
- .proto_cgroup = tcp_proto_cgroup,
-#endif
};
EXPORT_SYMBOL(tcp_prot);
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index c383e68..47addc3 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -8,61 +8,48 @@
int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss)
{
+ struct mem_cgroup *parent = parent_mem_cgroup(memcg);
+ struct page_counter *counter_parent = NULL;
/*
* The root cgroup does not use page_counters, but rather,
* rely on the data already collected by the network
* subsystem
*/
- struct mem_cgroup *parent = parent_mem_cgroup(memcg);
- struct page_counter *counter_parent = NULL;
- struct cg_proto *cg_proto, *parent_cg;
-
- cg_proto = tcp_prot.proto_cgroup(memcg);
- if (!cg_proto)
+ if (memcg == root_mem_cgroup)
return 0;
- cg_proto->memory_pressure = 0;
- cg_proto->memcg = memcg;
+ memcg->tcp_mem.memory_pressure = 0;
- parent_cg = tcp_prot.proto_cgroup(parent);
- if (parent_cg)
- counter_parent = &parent_cg->memory_allocated;
+ if (parent)
+ counter_parent = &parent->tcp_mem.memory_allocated;
- page_counter_init(&cg_proto->memory_allocated, counter_parent);
+ page_counter_init(&memcg->tcp_mem.memory_allocated, counter_parent);
return 0;
}
-EXPORT_SYMBOL(tcp_init_cgroup);
void tcp_destroy_cgroup(struct mem_cgroup *memcg)
{
- struct cg_proto *cg_proto;
-
- cg_proto = tcp_prot.proto_cgroup(memcg);
- if (!cg_proto)
+ if (memcg == root_mem_cgroup)
return;
- if (test_bit(MEMCG_SOCK_ACTIVATED, &cg_proto->flags))
+ if (test_bit(MEMCG_SOCK_ACTIVATED, &memcg->tcp_mem.flags))
static_key_slow_dec(&memcg_socket_limit_enabled);
-
}
-EXPORT_SYMBOL(tcp_destroy_cgroup);
static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
{
- struct cg_proto *cg_proto;
int ret;
- cg_proto = tcp_prot.proto_cgroup(memcg);
- if (!cg_proto)
+ if (memcg == root_mem_cgroup)
return -EINVAL;
- ret = page_counter_limit(&cg_proto->memory_allocated, nr_pages);
+ ret = page_counter_limit(&memcg->tcp_mem.memory_allocated, nr_pages);
if (ret)
return ret;
if (nr_pages == PAGE_COUNTER_MAX)
- clear_bit(MEMCG_SOCK_ACTIVE, &cg_proto->flags);
+ clear_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags);
else {
/*
* The active bit needs to be written after the static_key
@@ -84,9 +71,10 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
* will do the update in the same memcg. Without that, we can't
* properly shutdown the static key.
*/
- if (!test_and_set_bit(MEMCG_SOCK_ACTIVATED, &cg_proto->flags))
+ if (!test_and_set_bit(MEMCG_SOCK_ACTIVATED,
+ &memcg->tcp_mem.flags))
static_key_slow_inc(&memcg_socket_limit_enabled);
- set_bit(MEMCG_SOCK_ACTIVE, &cg_proto->flags);
+ set_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags);
}
return 0;
@@ -130,32 +118,32 @@ static ssize_t tcp_cgroup_write(struct kernfs_open_file *of,
static u64 tcp_cgroup_read(struct cgroup_subsys_state *css, struct cftype *cft)
{
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
- struct cg_proto *cg_proto = tcp_prot.proto_cgroup(memcg);
u64 val;
switch (cft->private) {
case RES_LIMIT:
- if (!cg_proto)
- return PAGE_COUNTER_MAX;
- val = cg_proto->memory_allocated.limit;
+ if (memcg == root_mem_cgroup)
+ val = PAGE_COUNTER_MAX;
+ else
+ val = memcg->tcp_mem.memory_allocated.limit;
val *= PAGE_SIZE;
break;
case RES_USAGE:
- if (!cg_proto)
+ if (memcg == root_mem_cgroup)
val = atomic_long_read(&tcp_memory_allocated);
else
- val = page_counter_read(&cg_proto->memory_allocated);
+ val = page_counter_read(&memcg->tcp_mem.memory_allocated);
val *= PAGE_SIZE;
break;
case RES_FAILCNT:
- if (!cg_proto)
+ if (memcg == root_mem_cgroup)
return 0;
- val = cg_proto->memory_allocated.failcnt;
+ val = memcg->tcp_mem.memory_allocated.failcnt;
break;
case RES_MAX_USAGE:
- if (!cg_proto)
+ if (memcg == root_mem_cgroup)
return 0;
- val = cg_proto->memory_allocated.watermark;
+ val = memcg->tcp_mem.memory_allocated.watermark;
val *= PAGE_SIZE;
break;
default:
@@ -168,19 +156,17 @@ static ssize_t tcp_cgroup_reset(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
struct mem_cgroup *memcg;
- struct cg_proto *cg_proto;
memcg = mem_cgroup_from_css(of_css(of));
- cg_proto = tcp_prot.proto_cgroup(memcg);
- if (!cg_proto)
+ if (memcg == root_mem_cgroup)
return nbytes;
switch (of_cft(of)->private) {
case RES_MAX_USAGE:
- page_counter_reset_watermark(&cg_proto->memory_allocated);
+ page_counter_reset_watermark(&memcg->tcp_mem.memory_allocated);
break;
case RES_FAILCNT:
- cg_proto->memory_allocated.failcnt = 0;
+ memcg->tcp_mem.memory_allocated.failcnt = 0;
break;
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7aa168a..7b83a65 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2821,8 +2821,8 @@ void sk_forced_mem_schedule(struct sock *sk, int size)
sk->sk_forward_alloc += amt * SK_MEM_QUANTUM;
sk_memory_allocated_add(sk, amt);
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
- mem_cgroup_charge_skmem(sk->sk_cgrp, amt);
+ if (mem_cgroup_sockets_enabled && sk->sk_memcg)
+ mem_cgroup_charge_skmem(sk->sk_memcg, amt);
}
/* Send a FIN. The caller locks the socket for us.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5baa8e7..6340537 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1869,9 +1869,6 @@ struct proto tcpv6_prot = {
.compat_setsockopt = compat_tcp_setsockopt,
.compat_getsockopt = compat_tcp_getsockopt,
#endif
-#ifdef CONFIG_MEMCG_KMEM
- .proto_cgroup = tcp_proto_cgroup,
-#endif
.clear_sk = tcp_v6_clear_sk,
};
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter
2015-11-12 23:41 ` [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter Johannes Weiner
@ 2015-11-20 12:42 ` Vladimir Davydov
2015-11-20 18:56 ` Johannes Weiner
0 siblings, 1 reply; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-20 12:42 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:28PM -0500, Johannes Weiner wrote:
> There won't be any separate counters for socket memory consumed by
> protocols other than TCP in the future. Remove the indirection and
I really want to believe you're right. And with vmpressure propagation
implemented properly you are likely to be right.
However, we might still want to account other socket protos to
memcg->memory in the unified hierarchy, e.g. UDP, or SCTP, or whatever
else. Adding new consumers should be trivial, but it will break the
legacy usecase, where only TCP sockets are supposed to be accounted.
What about adding a check to sock_update_memcg() so that it would enable
accounting only for TCP sockets in case legacy hierarchy is used?
For the same reason, I think we'd better rename memcg->tcp_mem to
something like memcg->sk_mem or we can even drop the cg_proto struct
altogether embedding its fields directly to mem_cgroup struct.
Also, I don't see any reason to have tcp_memcontrol.c file. It's tiny
and with this patch it does not depend on tcp code any more. Let's move
it to memcontrol.c?
Other than that this patch looks OK to me.
Thanks,
Vladimir
> link sockets directly to their owning memory cgroup.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter
2015-11-20 12:42 ` Vladimir Davydov
@ 2015-11-20 18:56 ` Johannes Weiner
2015-11-23 9:36 ` Vladimir Davydov
0 siblings, 1 reply; 61+ messages in thread
From: Johannes Weiner @ 2015-11-20 18:56 UTC (permalink / raw)
To: Vladimir Davydov
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Fri, Nov 20, 2015 at 03:42:16PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:28PM -0500, Johannes Weiner wrote:
> > There won't be any separate counters for socket memory consumed by
> > protocols other than TCP in the future. Remove the indirection and
>
> I really want to believe you're right. And with vmpressure propagation
> implemented properly you are likely to be right.
>
> However, we might still want to account other socket protos to
> memcg->memory in the unified hierarchy, e.g. UDP, or SCTP, or whatever
> else. Adding new consumers should be trivial, but it will break the
> legacy usecase, where only TCP sockets are supposed to be accounted.
> What about adding a check to sock_update_memcg() so that it would enable
> accounting only for TCP sockets in case legacy hierarchy is used?
Yup, I was thinking the same thing. But we can cross that bridge when
we come to it and are actually adding further packet types.
> For the same reason, I think we'd better rename memcg->tcp_mem to
> something like memcg->sk_mem or we can even drop the cg_proto struct
> altogether embedding its fields directly to mem_cgroup struct.
>
> Also, I don't see any reason to have tcp_memcontrol.c file. It's tiny
> and with this patch it does not depend on tcp code any more. Let's move
> it to memcontrol.c?
I actually had all this at first, but then wondered if it makes more
sense to keep the legacy code in isolation. Don't you think it would
be easier to keep track of what's v1 and what's v2 if we keep the
legacy stuff physically separate as much as possible? In particular I
found that 'tcp_mem.' marker really useful while working on the code.
In the same vein, tcp_memcontrol.c doesn't really hurt anybody and I'd
expect it to remain mostly unopened and unchanged in the future. But
if we merge it into memcontrol.c, that code will likely be in the way
and we'd have to make it explicit somehow that this is not actually
part of the new memory controller anymore.
What do you think?
> Other than that this patch looks OK to me.
Thank you!
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter
2015-11-20 18:56 ` Johannes Weiner
@ 2015-11-23 9:36 ` Vladimir Davydov
2015-11-23 18:20 ` Johannes Weiner
0 siblings, 1 reply; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-23 9:36 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Fri, Nov 20, 2015 at 01:56:48PM -0500, Johannes Weiner wrote:
> On Fri, Nov 20, 2015 at 03:42:16PM +0300, Vladimir Davydov wrote:
> > On Thu, Nov 12, 2015 at 06:41:28PM -0500, Johannes Weiner wrote:
> > > There won't be any separate counters for socket memory consumed by
> > > protocols other than TCP in the future. Remove the indirection and
> >
> > I really want to believe you're right. And with vmpressure propagation
> > implemented properly you are likely to be right.
> >
> > However, we might still want to account other socket protos to
> > memcg->memory in the unified hierarchy, e.g. UDP, or SCTP, or whatever
> > else. Adding new consumers should be trivial, but it will break the
> > legacy usecase, where only TCP sockets are supposed to be accounted.
> > What about adding a check to sock_update_memcg() so that it would enable
> > accounting only for TCP sockets in case legacy hierarchy is used?
>
> Yup, I was thinking the same thing. But we can cross that bridge when
> we come to it and are actually adding further packet types.
Fair enough.
>
> > For the same reason, I think we'd better rename memcg->tcp_mem to
> > something like memcg->sk_mem or we can even drop the cg_proto struct
> > altogether embedding its fields directly to mem_cgroup struct.
> >
> > Also, I don't see any reason to have tcp_memcontrol.c file. It's tiny
> > and with this patch it does not depend on tcp code any more. Let's move
> > it to memcontrol.c?
>
> I actually had all this at first, but then wondered if it makes more
> sense to keep the legacy code in isolation. Don't you think it would
> be easier to keep track of what's v1 and what's v2 if we keep the
> legacy stuff physically separate as much as possible? In particular I
> found that 'tcp_mem.' marker really useful while working on the code.
>
> In the same vein, tcp_memcontrol.c doesn't really hurt anybody and I'd
> expect it to remain mostly unopened and unchanged in the future. But
> if we merge it into memcontrol.c, that code will likely be in the way
> and we'd have to make it explicit somehow that this is not actually
> part of the new memory controller anymore.
>
> What do you think?
There isn't much code left in tcp_memcontrol.c, and not all of it is
legacy. We still want to call tcp_init_cgroup and tcp_destroy_cgroup
from memcontrol.c - in fact, it's the only call site, so I think we'd
better keep these functions there. Apart from init/destroy, there is
only stuff for handling legacy files, which is relatively small and
isolated. We can just put it along with memsw and kmem legacy files in
the end of memcontrol.c adding a comment that it's legacy. Personally,
I'd find the code easier to follow then, because currently the logic
behind the ACTIVE flag as well as memcg->tcp_mem init/use/destroy turns
out to be scattered between two files in different subsystems for no
apparent reason now, as it does not need tcp_prot any more. Besides,
this would allow us to accurately reuse the ACTIVE flag in init/destroy
for inc/dec static branch and probably in sock_update_memcg instead of
sprinkling cgroup_subsys_on_dfl all over the place, which would make the
code a bit cleaner IMO (in fact, that's why I proposed to drop ACTIVATED
bit and replace cg_proto->flags with ->active bool).
Regarding, tcp_mem marker, well, currently it's OK, because we don't
account anything but TCP sockets, but when it changes (and I'm pretty
sure it will), we'll have to rename it anyway. For now, I'm OK with
leaving it as is though.
Thanks,
Vladimir
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter
2015-11-23 9:36 ` Vladimir Davydov
@ 2015-11-23 18:20 ` Johannes Weiner
[not found] ` <20151123182037.GE13000-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
0 siblings, 1 reply; 61+ messages in thread
From: Johannes Weiner @ 2015-11-23 18:20 UTC (permalink / raw)
To: Vladimir Davydov
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Mon, Nov 23, 2015 at 12:36:46PM +0300, Vladimir Davydov wrote:
> On Fri, Nov 20, 2015 at 01:56:48PM -0500, Johannes Weiner wrote:
> > I actually had all this at first, but then wondered if it makes more
> > sense to keep the legacy code in isolation. Don't you think it would
> > be easier to keep track of what's v1 and what's v2 if we keep the
> > legacy stuff physically separate as much as possible? In particular I
> > found that 'tcp_mem.' marker really useful while working on the code.
> >
> > In the same vein, tcp_memcontrol.c doesn't really hurt anybody and I'd
> > expect it to remain mostly unopened and unchanged in the future. But
> > if we merge it into memcontrol.c, that code will likely be in the way
> > and we'd have to make it explicit somehow that this is not actually
> > part of the new memory controller anymore.
> >
> > What do you think?
>
> There isn't much code left in tcp_memcontrol.c, and not all of it is
> legacy. We still want to call tcp_init_cgroup and tcp_destroy_cgroup
> from memcontrol.c - in fact, it's the only call site, so I think we'd
> better keep these functions there. Apart from init/destroy, there is
> only stuff for handling legacy files, which is relatively small and
> isolated. We can just put it along with memsw and kmem legacy files in
> the end of memcontrol.c adding a comment that it's legacy. Personally,
> I'd find the code easier to follow then, because currently the logic
> behind the ACTIVE flag as well as memcg->tcp_mem init/use/destroy turns
> out to be scattered between two files in different subsystems for no
> apparent reason now, as it does not need tcp_prot any more. Besides,
> this would allow us to accurately reuse the ACTIVE flag in init/destroy
> for inc/dec static branch and probably in sock_update_memcg instead of
> sprinkling cgroup_subsys_on_dfl all over the place, which would make the
> code a bit cleaner IMO (in fact, that's why I proposed to drop ACTIVATED
> bit and replace cg_proto->flags with ->active bool).
As far as I can see, all of tcp_memcontrol.c is legacy, including the
init and destroy functions. We only call them to set up the legacy
tcp_mem state and do legacy jump-label maintenance. Delete it all and
the unified hierarchy controller would still work. So I don't really
see the benefits of consolidating it, and more risk of convoluting.
That being said, if you care strongly about it and see opportunities
to cut down code and make things more readable, please feel free to
turn the flags -> bool patch into a followup series and I'll be happy
to review it.
Thanks!
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 10/14] mm: memcontrol: generalize the socket accounting jump label
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (8 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 09/14] net: tcp_memcontrol: simplify linkage between socket and page counter Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-13 10:43 ` Michal Hocko
2015-11-14 13:29 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 11/14] mm: memcontrol: do not account memory+swap on unified hierarchy Johannes Weiner
` (3 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
The unified hierarchy memory controller is going to use this jump
label as well to control the networking callbacks. Move it to the
memory controller code and give it a more generic name.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 4 ++++
include/net/sock.h | 7 -------
mm/memcontrol.c | 3 +++
net/core/sock.c | 5 -----
net/ipv4/tcp_memcontrol.c | 4 ++--
5 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1c71f27..4cf5afa 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -693,6 +693,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
struct sock;
+extern struct static_key memcg_sockets_enabled_key;
+#define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)
void sock_update_memcg(struct sock *sk);
void sock_release_memcg(struct sock *sk);
bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
@@ -701,6 +703,8 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
{
return memcg->tcp_mem.memory_pressure;
}
+#else
+#define mem_cgroup_sockets_enabled 0
#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
#ifdef CONFIG_MEMCG_KMEM
diff --git a/include/net/sock.h b/include/net/sock.h
index b439dcc..bf1b901 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1065,13 +1065,6 @@ static inline void sk_refcnt_debug_release(const struct sock *sk)
#define sk_refcnt_debug_release(sk) do { } while (0)
#endif /* SOCK_REFCNT_DEBUG */
-#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_NET)
-extern struct static_key memcg_socket_limit_enabled;
-#define mem_cgroup_sockets_enabled static_key_false(&memcg_socket_limit_enabled)
-#else
-#define mem_cgroup_sockets_enabled 0
-#endif
-
static inline bool sk_stream_memory_free(const struct sock *sk)
{
if (sk->sk_wmem_queued >= sk->sk_sndbuf)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 89b1d9e..658bef2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -291,6 +291,9 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
/* Writing them here to avoid exposing memcg's inner layout */
#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+struct static_key memcg_sockets_enabled_key;
+EXPORT_SYMBOL(memcg_sockets_enabled_key);
+
void sock_update_memcg(struct sock *sk)
{
struct mem_cgroup *memcg;
diff --git a/net/core/sock.c b/net/core/sock.c
index 6486b0d..c5435b5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -201,11 +201,6 @@ EXPORT_SYMBOL(sk_net_capable);
static struct lock_class_key af_family_keys[AF_MAX];
static struct lock_class_key af_family_slock_keys[AF_MAX];
-#if defined(CONFIG_MEMCG_KMEM)
-struct static_key memcg_socket_limit_enabled;
-EXPORT_SYMBOL(memcg_socket_limit_enabled);
-#endif
-
/*
* Make lock validator output more readable. (we pre-construct these
* strings build-time, so that runtime initialization of socket
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
index 47addc3..17df9dd 100644
--- a/net/ipv4/tcp_memcontrol.c
+++ b/net/ipv4/tcp_memcontrol.c
@@ -34,7 +34,7 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
return;
if (test_bit(MEMCG_SOCK_ACTIVATED, &memcg->tcp_mem.flags))
- static_key_slow_dec(&memcg_socket_limit_enabled);
+ static_key_slow_dec(&memcg_sockets_enabled_key);
}
static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
@@ -73,7 +73,7 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
*/
if (!test_and_set_bit(MEMCG_SOCK_ACTIVATED,
&memcg->tcp_mem.flags))
- static_key_slow_inc(&memcg_socket_limit_enabled);
+ static_key_slow_inc(&memcg_sockets_enabled_key);
set_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags);
}
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 10/14] mm: memcontrol: generalize the socket accounting jump label
2015-11-12 23:41 ` [PATCH 10/14] mm: memcontrol: generalize the socket accounting jump label Johannes Weiner
@ 2015-11-13 10:43 ` Michal Hocko
2015-11-14 13:29 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2015-11-13 10:43 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Vladimir Davydov, Tejun Heo, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu 12-11-15 18:41:29, Johannes Weiner wrote:
> The unified hierarchy memory controller is going to use this jump
> label as well to control the networking callbacks. Move it to the
> memory controller code and give it a more generic name.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Yes it makes more sense in memcg proper
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> include/linux/memcontrol.h | 4 ++++
> include/net/sock.h | 7 -------
> mm/memcontrol.c | 3 +++
> net/core/sock.c | 5 -----
> net/ipv4/tcp_memcontrol.c | 4 ++--
> 5 files changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 1c71f27..4cf5afa 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -693,6 +693,8 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
>
> #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
> struct sock;
> +extern struct static_key memcg_sockets_enabled_key;
> +#define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)
> void sock_update_memcg(struct sock *sk);
> void sock_release_memcg(struct sock *sk);
> bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
> @@ -701,6 +703,8 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
> {
> return memcg->tcp_mem.memory_pressure;
> }
> +#else
> +#define mem_cgroup_sockets_enabled 0
> #endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
>
> #ifdef CONFIG_MEMCG_KMEM
> diff --git a/include/net/sock.h b/include/net/sock.h
> index b439dcc..bf1b901 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1065,13 +1065,6 @@ static inline void sk_refcnt_debug_release(const struct sock *sk)
> #define sk_refcnt_debug_release(sk) do { } while (0)
> #endif /* SOCK_REFCNT_DEBUG */
>
> -#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_NET)
> -extern struct static_key memcg_socket_limit_enabled;
> -#define mem_cgroup_sockets_enabled static_key_false(&memcg_socket_limit_enabled)
> -#else
> -#define mem_cgroup_sockets_enabled 0
> -#endif
> -
> static inline bool sk_stream_memory_free(const struct sock *sk)
> {
> if (sk->sk_wmem_queued >= sk->sk_sndbuf)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 89b1d9e..658bef2 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -291,6 +291,9 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
> /* Writing them here to avoid exposing memcg's inner layout */
> #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
>
> +struct static_key memcg_sockets_enabled_key;
> +EXPORT_SYMBOL(memcg_sockets_enabled_key);
> +
> void sock_update_memcg(struct sock *sk)
> {
> struct mem_cgroup *memcg;
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 6486b0d..c5435b5 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -201,11 +201,6 @@ EXPORT_SYMBOL(sk_net_capable);
> static struct lock_class_key af_family_keys[AF_MAX];
> static struct lock_class_key af_family_slock_keys[AF_MAX];
>
> -#if defined(CONFIG_MEMCG_KMEM)
> -struct static_key memcg_socket_limit_enabled;
> -EXPORT_SYMBOL(memcg_socket_limit_enabled);
> -#endif
> -
> /*
> * Make lock validator output more readable. (we pre-construct these
> * strings build-time, so that runtime initialization of socket
> diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c
> index 47addc3..17df9dd 100644
> --- a/net/ipv4/tcp_memcontrol.c
> +++ b/net/ipv4/tcp_memcontrol.c
> @@ -34,7 +34,7 @@ void tcp_destroy_cgroup(struct mem_cgroup *memcg)
> return;
>
> if (test_bit(MEMCG_SOCK_ACTIVATED, &memcg->tcp_mem.flags))
> - static_key_slow_dec(&memcg_socket_limit_enabled);
> + static_key_slow_dec(&memcg_sockets_enabled_key);
> }
>
> static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
> @@ -73,7 +73,7 @@ static int tcp_update_limit(struct mem_cgroup *memcg, unsigned long nr_pages)
> */
> if (!test_and_set_bit(MEMCG_SOCK_ACTIVATED,
> &memcg->tcp_mem.flags))
> - static_key_slow_inc(&memcg_socket_limit_enabled);
> + static_key_slow_inc(&memcg_sockets_enabled_key);
> set_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags);
> }
>
> --
> 2.6.2
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [PATCH 10/14] mm: memcontrol: generalize the socket accounting jump label
2015-11-12 23:41 ` [PATCH 10/14] mm: memcontrol: generalize the socket accounting jump label Johannes Weiner
2015-11-13 10:43 ` Michal Hocko
@ 2015-11-14 13:29 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-14 13:29 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:29PM -0500, Johannes Weiner wrote:
> The unified hierarchy memory controller is going to use this jump
> label as well to control the networking callbacks. Move it to the
> memory controller code and give it a more generic name.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 11/14] mm: memcontrol: do not account memory+swap on unified hierarchy
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (9 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 10/14] mm: memcontrol: generalize the socket accounting jump label Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-13 10:37 ` Michal Hocko
2015-11-14 13:23 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 12/14] mm: memcontrol: move socket code for unified hierarchy accounting Johannes Weiner
` (2 subsequent siblings)
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
The unified hierarchy memory controller doesn't expose the memory+swap
counter to userspace, but its accounting is hardcoded in all charge
paths right now, including the per-cpu charge cache ("the stock").
To avoid adding yet more pointless memory+swap accounting with the
socket memory support in unified hierarchy, disable the counter
altogether when in unified hierarchy mode.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/memcontrol.c | 44 +++++++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 658bef2..e7f1a79 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -87,6 +87,12 @@ int do_swap_account __read_mostly;
#define do_swap_account 0
#endif
+/* Whether legacy memory+swap accounting is active */
+static bool do_memsw_account(void)
+{
+ return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && do_swap_account;
+}
+
static const char * const mem_cgroup_stat_names[] = {
"cache",
"rss",
@@ -1177,7 +1183,7 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
if (count < limit)
margin = limit - count;
- if (do_swap_account) {
+ if (do_memsw_account()) {
count = page_counter_read(&memcg->memsw);
limit = READ_ONCE(memcg->memsw.limit);
if (count <= limit)
@@ -1280,7 +1286,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
pr_cont(":");
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
- if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
continue;
pr_cont(" %s:%luKB", mem_cgroup_stat_names[i],
K(mem_cgroup_read_stat(iter, i)));
@@ -1903,7 +1909,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
if (stock->nr_pages) {
page_counter_uncharge(&old->memory, stock->nr_pages);
- if (do_swap_account)
+ if (do_memsw_account())
page_counter_uncharge(&old->memsw, stock->nr_pages);
css_put_many(&old->css, stock->nr_pages);
stock->nr_pages = 0;
@@ -2033,11 +2039,11 @@ retry:
if (consume_stock(memcg, nr_pages))
return 0;
- if (!do_swap_account ||
+ if (!do_memsw_account() ||
page_counter_try_charge(&memcg->memsw, batch, &counter)) {
if (page_counter_try_charge(&memcg->memory, batch, &counter))
goto done_restock;
- if (do_swap_account)
+ if (do_memsw_account())
page_counter_uncharge(&memcg->memsw, batch);
mem_over_limit = mem_cgroup_from_counter(counter, memory);
} else {
@@ -2124,7 +2130,7 @@ force:
* temporarily by force charging it.
*/
page_counter_charge(&memcg->memory, nr_pages);
- if (do_swap_account)
+ if (do_memsw_account())
page_counter_charge(&memcg->memsw, nr_pages);
css_get_many(&memcg->css, nr_pages);
@@ -2161,7 +2167,7 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
return;
page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_swap_account)
+ if (do_memsw_account())
page_counter_uncharge(&memcg->memsw, nr_pages);
css_put_many(&memcg->css, nr_pages);
@@ -2441,7 +2447,7 @@ void __memcg_kmem_uncharge(struct page *page, int order)
page_counter_uncharge(&memcg->kmem, nr_pages);
page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_swap_account)
+ if (do_memsw_account())
page_counter_uncharge(&memcg->memsw, nr_pages);
page->mem_cgroup = NULL;
@@ -3154,7 +3160,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
- if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
continue;
seq_printf(m, "%s %lu\n", mem_cgroup_stat_names[i],
mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
@@ -3176,14 +3182,14 @@ static int memcg_stat_show(struct seq_file *m, void *v)
}
seq_printf(m, "hierarchical_memory_limit %llu\n",
(u64)memory * PAGE_SIZE);
- if (do_swap_account)
+ if (do_memsw_account())
seq_printf(m, "hierarchical_memsw_limit %llu\n",
(u64)memsw * PAGE_SIZE);
for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
unsigned long long val = 0;
- if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
continue;
for_each_mem_cgroup_tree(mi, memcg)
val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
@@ -3314,7 +3320,7 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg)
{
while (memcg) {
__mem_cgroup_threshold(memcg, false);
- if (do_swap_account)
+ if (do_memsw_account())
__mem_cgroup_threshold(memcg, true);
memcg = parent_mem_cgroup(memcg);
@@ -4460,7 +4466,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
* we call find_get_page() with swapper_space directly.
*/
page = find_get_page(swap_address_space(ent), ent.val);
- if (do_swap_account)
+ if (do_memsw_account())
entry->val = ent.val;
return page;
@@ -4495,7 +4501,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
page = find_get_entry(mapping, pgoff);
if (radix_tree_exceptional_entry(page)) {
swp_entry_t swp = radix_to_swp_entry(page);
- if (do_swap_account)
+ if (do_memsw_account())
*entry = swp;
page = find_get_page(swap_address_space(swp), swp.val);
}
@@ -5270,7 +5276,7 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
if (page->mem_cgroup)
goto out;
- if (do_swap_account) {
+ if (do_memsw_account()) {
swp_entry_t ent = { .val = page_private(page), };
unsigned short id = lookup_swap_cgroup_id(ent);
@@ -5334,7 +5340,7 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
memcg_check_events(memcg, page);
local_irq_enable();
- if (do_swap_account && PageSwapCache(page)) {
+ if (do_memsw_account() && PageSwapCache(page)) {
swp_entry_t entry = { .val = page_private(page) };
/*
* The swap entry might not get freed for a long time,
@@ -5379,7 +5385,7 @@ static void uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout,
if (!mem_cgroup_is_root(memcg)) {
page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_swap_account)
+ if (do_memsw_account())
page_counter_uncharge(&memcg->memsw, nr_pages);
memcg_oom_recover(memcg);
}
@@ -5587,7 +5593,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);
- if (!do_swap_account)
+ if (!do_memsw_account())
return;
memcg = page->mem_cgroup;
@@ -5627,7 +5633,7 @@ void mem_cgroup_uncharge_swap(swp_entry_t entry)
struct mem_cgroup *memcg;
unsigned short id;
- if (!do_swap_account)
+ if (!do_memsw_account())
return;
id = swap_cgroup_record(entry, 0);
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 11/14] mm: memcontrol: do not account memory+swap on unified hierarchy
2015-11-12 23:41 ` [PATCH 11/14] mm: memcontrol: do not account memory+swap on unified hierarchy Johannes Weiner
@ 2015-11-13 10:37 ` Michal Hocko
2015-11-14 13:23 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2015-11-13 10:37 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Vladimir Davydov, Tejun Heo, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu 12-11-15 18:41:30, Johannes Weiner wrote:
> The unified hierarchy memory controller doesn't expose the memory+swap
> counter to userspace, but its accounting is hardcoded in all charge
> paths right now, including the per-cpu charge cache ("the stock").
>
> To avoid adding yet more pointless memory+swap accounting with the
> socket memory support in unified hierarchy, disable the counter
> altogether when in unified hierarchy mode.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/memcontrol.c | 44 +++++++++++++++++++++++++-------------------
> 1 file changed, 25 insertions(+), 19 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 658bef2..e7f1a79 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -87,6 +87,12 @@ int do_swap_account __read_mostly;
> #define do_swap_account 0
> #endif
>
> +/* Whether legacy memory+swap accounting is active */
> +static bool do_memsw_account(void)
> +{
> + return !cgroup_subsys_on_dfl(memory_cgrp_subsys) && do_swap_account;
> +}
> +
> static const char * const mem_cgroup_stat_names[] = {
> "cache",
> "rss",
> @@ -1177,7 +1183,7 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
> if (count < limit)
> margin = limit - count;
>
> - if (do_swap_account) {
> + if (do_memsw_account()) {
> count = page_counter_read(&memcg->memsw);
> limit = READ_ONCE(memcg->memsw.limit);
> if (count <= limit)
> @@ -1280,7 +1286,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> pr_cont(":");
>
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> - if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> + if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
> continue;
> pr_cont(" %s:%luKB", mem_cgroup_stat_names[i],
> K(mem_cgroup_read_stat(iter, i)));
> @@ -1903,7 +1909,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
>
> if (stock->nr_pages) {
> page_counter_uncharge(&old->memory, stock->nr_pages);
> - if (do_swap_account)
> + if (do_memsw_account())
> page_counter_uncharge(&old->memsw, stock->nr_pages);
> css_put_many(&old->css, stock->nr_pages);
> stock->nr_pages = 0;
> @@ -2033,11 +2039,11 @@ retry:
> if (consume_stock(memcg, nr_pages))
> return 0;
>
> - if (!do_swap_account ||
> + if (!do_memsw_account() ||
> page_counter_try_charge(&memcg->memsw, batch, &counter)) {
> if (page_counter_try_charge(&memcg->memory, batch, &counter))
> goto done_restock;
> - if (do_swap_account)
> + if (do_memsw_account())
> page_counter_uncharge(&memcg->memsw, batch);
> mem_over_limit = mem_cgroup_from_counter(counter, memory);
> } else {
> @@ -2124,7 +2130,7 @@ force:
> * temporarily by force charging it.
> */
> page_counter_charge(&memcg->memory, nr_pages);
> - if (do_swap_account)
> + if (do_memsw_account())
> page_counter_charge(&memcg->memsw, nr_pages);
> css_get_many(&memcg->css, nr_pages);
>
> @@ -2161,7 +2167,7 @@ static void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
> return;
>
> page_counter_uncharge(&memcg->memory, nr_pages);
> - if (do_swap_account)
> + if (do_memsw_account())
> page_counter_uncharge(&memcg->memsw, nr_pages);
>
> css_put_many(&memcg->css, nr_pages);
> @@ -2441,7 +2447,7 @@ void __memcg_kmem_uncharge(struct page *page, int order)
>
> page_counter_uncharge(&memcg->kmem, nr_pages);
> page_counter_uncharge(&memcg->memory, nr_pages);
> - if (do_swap_account)
> + if (do_memsw_account())
> page_counter_uncharge(&memcg->memsw, nr_pages);
>
> page->mem_cgroup = NULL;
> @@ -3154,7 +3160,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
> BUILD_BUG_ON(ARRAY_SIZE(mem_cgroup_lru_names) != NR_LRU_LISTS);
>
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> - if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> + if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
> continue;
> seq_printf(m, "%s %lu\n", mem_cgroup_stat_names[i],
> mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
> @@ -3176,14 +3182,14 @@ static int memcg_stat_show(struct seq_file *m, void *v)
> }
> seq_printf(m, "hierarchical_memory_limit %llu\n",
> (u64)memory * PAGE_SIZE);
> - if (do_swap_account)
> + if (do_memsw_account())
> seq_printf(m, "hierarchical_memsw_limit %llu\n",
> (u64)memsw * PAGE_SIZE);
>
> for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
> unsigned long long val = 0;
>
> - if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
> + if (i == MEM_CGROUP_STAT_SWAP && !do_memsw_account())
> continue;
> for_each_mem_cgroup_tree(mi, memcg)
> val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
> @@ -3314,7 +3320,7 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg)
> {
> while (memcg) {
> __mem_cgroup_threshold(memcg, false);
> - if (do_swap_account)
> + if (do_memsw_account())
> __mem_cgroup_threshold(memcg, true);
>
> memcg = parent_mem_cgroup(memcg);
> @@ -4460,7 +4466,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
> * we call find_get_page() with swapper_space directly.
> */
> page = find_get_page(swap_address_space(ent), ent.val);
> - if (do_swap_account)
> + if (do_memsw_account())
> entry->val = ent.val;
>
> return page;
> @@ -4495,7 +4501,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
> page = find_get_entry(mapping, pgoff);
> if (radix_tree_exceptional_entry(page)) {
> swp_entry_t swp = radix_to_swp_entry(page);
> - if (do_swap_account)
> + if (do_memsw_account())
> *entry = swp;
> page = find_get_page(swap_address_space(swp), swp.val);
> }
> @@ -5270,7 +5276,7 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
> if (page->mem_cgroup)
> goto out;
>
> - if (do_swap_account) {
> + if (do_memsw_account()) {
> swp_entry_t ent = { .val = page_private(page), };
> unsigned short id = lookup_swap_cgroup_id(ent);
>
> @@ -5334,7 +5340,7 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
> memcg_check_events(memcg, page);
> local_irq_enable();
>
> - if (do_swap_account && PageSwapCache(page)) {
> + if (do_memsw_account() && PageSwapCache(page)) {
> swp_entry_t entry = { .val = page_private(page) };
> /*
> * The swap entry might not get freed for a long time,
> @@ -5379,7 +5385,7 @@ static void uncharge_batch(struct mem_cgroup *memcg, unsigned long pgpgout,
>
> if (!mem_cgroup_is_root(memcg)) {
> page_counter_uncharge(&memcg->memory, nr_pages);
> - if (do_swap_account)
> + if (do_memsw_account())
> page_counter_uncharge(&memcg->memsw, nr_pages);
> memcg_oom_recover(memcg);
> }
> @@ -5587,7 +5593,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> VM_BUG_ON_PAGE(PageLRU(page), page);
> VM_BUG_ON_PAGE(page_count(page), page);
>
> - if (!do_swap_account)
> + if (!do_memsw_account())
> return;
>
> memcg = page->mem_cgroup;
> @@ -5627,7 +5633,7 @@ void mem_cgroup_uncharge_swap(swp_entry_t entry)
> struct mem_cgroup *memcg;
> unsigned short id;
>
> - if (!do_swap_account)
> + if (!do_memsw_account())
> return;
>
> id = swap_cgroup_record(entry, 0);
> --
> 2.6.2
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [PATCH 11/14] mm: memcontrol: do not account memory+swap on unified hierarchy
2015-11-12 23:41 ` [PATCH 11/14] mm: memcontrol: do not account memory+swap on unified hierarchy Johannes Weiner
2015-11-13 10:37 ` Michal Hocko
@ 2015-11-14 13:23 ` Vladimir Davydov
1 sibling, 0 replies; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-14 13:23 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:30PM -0500, Johannes Weiner wrote:
> The unified hierarchy memory controller doesn't expose the memory+swap
> counter to userspace, but its accounting is hardcoded in all charge
> paths right now, including the per-cpu charge cache ("the stock").
>
> To avoid adding yet more pointless memory+swap accounting with the
> socket memory support in unified hierarchy, disable the counter
> altogether when in unified hierarchy mode.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 12/14] mm: memcontrol: move socket code for unified hierarchy accounting
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (10 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 11/14] mm: memcontrol: do not account memory+swap on unified hierarchy Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-20 12:44 ` Vladimir Davydov
2015-11-12 23:41 ` [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller Johannes Weiner
2015-11-12 23:41 ` [PATCH 14/14] mm: memcontrol: hook up vmpressure to socket pressure Johannes Weiner
13 siblings, 1 reply; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
The unified hierarchy memory controller will account socket
memory. Move the infrastructure functions accordingly.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
---
mm/memcontrol.c | 148 ++++++++++++++++++++++++++++----------------------------
1 file changed, 74 insertions(+), 74 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e7f1a79..408fb04 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -294,80 +294,6 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
return mem_cgroup_from_css(css);
}
-/* Writing them here to avoid exposing memcg's inner layout */
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
-
-struct static_key memcg_sockets_enabled_key;
-EXPORT_SYMBOL(memcg_sockets_enabled_key);
-
-void sock_update_memcg(struct sock *sk)
-{
- struct mem_cgroup *memcg;
-
- /* Socket cloning can throw us here with sk_cgrp already
- * filled. It won't however, necessarily happen from
- * process context. So the test for root memcg given
- * the current task's memcg won't help us in this case.
- *
- * Respecting the original socket's memcg is a better
- * decision in this case.
- */
- if (sk->sk_memcg) {
- BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
- css_get(&sk->sk_memcg->css);
- return;
- }
-
- rcu_read_lock();
- memcg = mem_cgroup_from_task(current);
- if (memcg != root_mem_cgroup &&
- test_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags) &&
- css_tryget_online(&memcg->css))
- sk->sk_memcg = memcg;
- rcu_read_unlock();
-}
-EXPORT_SYMBOL(sock_update_memcg);
-
-void sock_release_memcg(struct sock *sk)
-{
- WARN_ON(!sk->sk_memcg);
- css_put(&sk->sk_memcg->css);
-}
-
-/**
- * mem_cgroup_charge_skmem - charge socket memory
- * @memcg: memcg to charge
- * @nr_pages: number of pages to charge
- *
- * Charges @nr_pages to @memcg. Returns %true if the charge fit within
- * @memcg's configured limit, %false if the charge had to be forced.
- */
-bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
- struct page_counter *counter;
-
- if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
- nr_pages, &counter)) {
- memcg->tcp_mem.memory_pressure = 0;
- return true;
- }
- page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
- memcg->tcp_mem.memory_pressure = 1;
- return false;
-}
-
-/**
- * mem_cgroup_uncharge_skmem - uncharge socket memory
- * @memcg - memcg to uncharge
- * @nr_pages - number of pages to uncharge
- */
-void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
- page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
-}
-
-#endif
-
#ifdef CONFIG_MEMCG_KMEM
/*
* This will be the memcg's index in each cache's ->memcg_params.memcg_caches.
@@ -5538,6 +5464,80 @@ void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
commit_charge(newpage, memcg, true);
}
+/* Writing them here to avoid exposing memcg's inner layout */
+#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+
+struct static_key memcg_sockets_enabled_key;
+EXPORT_SYMBOL(memcg_sockets_enabled_key);
+
+void sock_update_memcg(struct sock *sk)
+{
+ struct mem_cgroup *memcg;
+
+ /* Socket cloning can throw us here with sk_cgrp already
+ * filled. It won't however, necessarily happen from
+ * process context. So the test for root memcg given
+ * the current task's memcg won't help us in this case.
+ *
+ * Respecting the original socket's memcg is a better
+ * decision in this case.
+ */
+ if (sk->sk_memcg) {
+ BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
+ css_get(&sk->sk_memcg->css);
+ return;
+ }
+
+ rcu_read_lock();
+ memcg = mem_cgroup_from_task(current);
+ if (memcg != root_mem_cgroup &&
+ test_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags) &&
+ css_tryget_online(&memcg->css))
+ sk->sk_memcg = memcg;
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL(sock_update_memcg);
+
+void sock_release_memcg(struct sock *sk)
+{
+ WARN_ON(!sk->sk_memcg);
+ css_put(&sk->sk_memcg->css);
+}
+
+/**
+ * mem_cgroup_charge_skmem - charge socket memory
+ * @memcg: memcg to charge
+ * @nr_pages: number of pages to charge
+ *
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * @memcg's configured limit, %false if the charge had to be forced.
+ */
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+ struct page_counter *counter;
+
+ if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
+ nr_pages, &counter)) {
+ memcg->tcp_mem.memory_pressure = 0;
+ return true;
+ }
+ page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
+ memcg->tcp_mem.memory_pressure = 1;
+ return false;
+}
+
+/**
+ * mem_cgroup_uncharge_skmem - uncharge socket memory
+ * @memcg - memcg to uncharge
+ * @nr_pages - number of pages to uncharge
+ */
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+ page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
+}
+
+#endif
+
/*
* subsys_initcall() for memory controller.
*
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 12/14] mm: memcontrol: move socket code for unified hierarchy accounting
2015-11-12 23:41 ` [PATCH 12/14] mm: memcontrol: move socket code for unified hierarchy accounting Johannes Weiner
@ 2015-11-20 12:44 ` Vladimir Davydov
0 siblings, 0 replies; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-20 12:44 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:31PM -0500, Johannes Weiner wrote:
> The unified hierarchy memory controller will account socket
> memory. Move the infrastructure functions accordingly.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (11 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 12/14] mm: memcontrol: move socket code for unified hierarchy accounting Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-16 15:59 ` Michal Hocko
[not found] ` <1447371693-25143-14-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2015-11-12 23:41 ` [PATCH 14/14] mm: memcontrol: hook up vmpressure to socket pressure Johannes Weiner
13 siblings, 2 replies; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
Socket memory can be a significant share of overall memory consumed by
common workloads. In order to provide reasonable resource isolation in
the unified hierarchy, this type of memory needs to be included in the
tracking/accounting of a cgroup under active memory resource control.
Overhead is only incurred when a non-root control group is created AND
the memory controller is instructed to track and account the memory
footprint of that group. cgroup.memory=nosocket can be specified on
the boot commandline to override any runtime configuration and
forcibly exclude socket memory from active memory resource control.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 12 ++++-
mm/memcontrol.c | 131 +++++++++++++++++++++++++++++++++++++--------
2 files changed, 118 insertions(+), 25 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 4cf5afa..809d6de 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -256,6 +256,10 @@ struct mem_cgroup {
struct wb_domain cgwb_domain;
#endif
+#ifdef CONFIG_INET
+ struct work_struct socket_work;
+#endif
+
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
@@ -691,7 +695,7 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
#endif /* CONFIG_CGROUP_WRITEBACK */
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+#ifdef CONFIG_INET
struct sock;
extern struct static_key memcg_sockets_enabled_key;
#define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)
@@ -701,11 +705,15 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
{
+#ifdef CONFIG_MEMCG_KMEM
return memcg->tcp_mem.memory_pressure;
+#else
+ return false;
+#endif
}
#else
#define mem_cgroup_sockets_enabled 0
-#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
+#endif /* CONFIG_INET */
#ifdef CONFIG_MEMCG_KMEM
extern struct static_key memcg_kmem_enabled_key;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 408fb04..cad9525 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -80,6 +80,9 @@ struct mem_cgroup *root_mem_cgroup __read_mostly;
#define MEM_CGROUP_RECLAIM_RETRIES 5
+/* Socket memory accounting disabled? */
+static bool cgroup_memory_nosocket;
+
/* Whether the swap controller is active */
#ifdef CONFIG_MEMCG_SWAP
int do_swap_account __read_mostly;
@@ -1923,6 +1926,18 @@ static int memcg_cpu_hotplug_callback(struct notifier_block *nb,
return NOTIFY_OK;
}
+static void reclaim_high(struct mem_cgroup *memcg,
+ unsigned int nr_pages,
+ gfp_t gfp_mask)
+{
+ do {
+ if (page_counter_read(&memcg->memory) <= memcg->high)
+ continue;
+ mem_cgroup_events(memcg, MEMCG_HIGH, 1);
+ try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
+ } while ((memcg = parent_mem_cgroup(memcg)));
+}
+
/*
* Scheduled by try_charge() to be executed from the userland return path
* and reclaims memory over the high limit.
@@ -1930,20 +1945,13 @@ static int memcg_cpu_hotplug_callback(struct notifier_block *nb,
void mem_cgroup_handle_over_high(void)
{
unsigned int nr_pages = current->memcg_nr_pages_over_high;
- struct mem_cgroup *memcg, *pos;
+ struct mem_cgroup *memcg;
if (likely(!nr_pages))
return;
- pos = memcg = get_mem_cgroup_from_mm(current->mm);
-
- do {
- if (page_counter_read(&pos->memory) <= pos->high)
- continue;
- mem_cgroup_events(pos, MEMCG_HIGH, 1);
- try_to_free_mem_cgroup_pages(pos, nr_pages, GFP_KERNEL, true);
- } while ((pos = parent_mem_cgroup(pos)));
-
+ memcg = get_mem_cgroup_from_mm(current->mm);
+ reclaim_high(memcg, nr_pages, GFP_KERNEL);
css_put(&memcg->css);
current->memcg_nr_pages_over_high = 0;
}
@@ -4141,6 +4149,8 @@ struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
}
EXPORT_SYMBOL(parent_mem_cgroup);
+static void socket_work_func(struct work_struct *work);
+
static struct cgroup_subsys_state * __ref
mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
{
@@ -4180,6 +4190,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
#ifdef CONFIG_CGROUP_WRITEBACK
INIT_LIST_HEAD(&memcg->cgwb_list);
#endif
+#ifdef CONFIG_INET
+ INIT_WORK(&memcg->socket_work, socket_work_func);
+#endif
return &memcg->css;
free_out:
@@ -4237,6 +4250,11 @@ mem_cgroup_css_online(struct cgroup_subsys_state *css)
if (ret)
return ret;
+#ifdef CONFIG_INET
+ if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
+ static_key_slow_inc(&memcg_sockets_enabled_key);
+#endif
+
/*
* Make sure the memcg is initialized: mem_cgroup_iter()
* orders reading memcg->initialized against its callers
@@ -4276,6 +4294,11 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
memcg_destroy_kmem(memcg);
+#ifdef CONFIG_INET
+ if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
+ static_key_slow_dec(&memcg_sockets_enabled_key);
+ cancel_work_sync(&memcg->socket_work);
+#endif
__mem_cgroup_free(memcg);
}
@@ -5464,8 +5487,7 @@ void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
commit_charge(newpage, memcg, true);
}
-/* Writing them here to avoid exposing memcg's inner layout */
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+#ifdef CONFIG_INET
struct static_key memcg_sockets_enabled_key;
EXPORT_SYMBOL(memcg_sockets_enabled_key);
@@ -5490,10 +5512,16 @@ void sock_update_memcg(struct sock *sk)
rcu_read_lock();
memcg = mem_cgroup_from_task(current);
- if (memcg != root_mem_cgroup &&
- test_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags) &&
- css_tryget_online(&memcg->css))
+ if (memcg == root_mem_cgroup)
+ goto out;
+#ifdef CONFIG_MEMCG_KMEM
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
+ !test_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags))
+ goto out;
+#endif
+ if (css_tryget_online(&memcg->css))
sk->sk_memcg = memcg;
+out:
rcu_read_unlock();
}
EXPORT_SYMBOL(sock_update_memcg);
@@ -5504,6 +5532,14 @@ void sock_release_memcg(struct sock *sk)
css_put(&sk->sk_memcg->css);
}
+static void socket_work_func(struct work_struct *work)
+{
+ struct mem_cgroup *memcg;
+
+ memcg = container_of(work, struct mem_cgroup, socket_work);
+ reclaim_high(memcg, CHARGE_BATCH, GFP_KERNEL);
+}
+
/**
* mem_cgroup_charge_skmem - charge socket memory
* @memcg: memcg to charge
@@ -5514,16 +5550,43 @@ void sock_release_memcg(struct sock *sk)
*/
bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
{
+ unsigned int batch = max(CHARGE_BATCH, nr_pages);
struct page_counter *counter;
+ bool force = false;
- if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
- nr_pages, &counter)) {
- memcg->tcp_mem.memory_pressure = 0;
+#ifdef CONFIG_MEMCG_KMEM
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+ if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
+ nr_pages, &counter)) {
+ memcg->tcp_mem.memory_pressure = 0;
+ return true;
+ }
+ page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
+ memcg->tcp_mem.memory_pressure = 1;
+ return false;
+ }
+#endif
+ if (consume_stock(memcg, nr_pages))
return true;
+retry:
+ if (page_counter_try_charge(&memcg->memory, batch, &counter))
+ goto done;
+
+ if (batch > nr_pages) {
+ batch = nr_pages;
+ goto retry;
}
- page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
- memcg->tcp_mem.memory_pressure = 1;
- return false;
+
+ page_counter_charge(&memcg->memory, batch);
+ force = true;
+done:
+ css_get_many(&memcg->css, batch);
+ if (batch > nr_pages)
+ refill_stock(memcg, batch - nr_pages);
+
+ schedule_work(&memcg->socket_work);
+
+ return !force;
}
/**
@@ -5533,10 +5596,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
*/
void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
{
- page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
+#ifdef CONFIG_MEMCG_KMEM
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
+ page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
+ nr_pages);
+ return;
+ }
+#endif
+ page_counter_uncharge(&memcg->memory, nr_pages);
+ css_put_many(&memcg->css, nr_pages);
}
-#endif
+#endif /* CONFIG_INET */
+
+static int __init cgroup_memory(char *s)
+{
+ char *token;
+
+ while ((token = strsep(&s, ",")) != NULL) {
+ if (!*token)
+ continue;
+ if (!strcmp(token, "nosocket"))
+ cgroup_memory_nosocket = true;
+ }
+ return 0;
+}
+__setup("cgroup.memory=", cgroup_memory);
/*
* subsys_initcall() for memory controller.
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller
2015-11-12 23:41 ` [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller Johannes Weiner
@ 2015-11-16 15:59 ` Michal Hocko
[not found] ` <20151116155923.GH14116-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
[not found] ` <1447371693-25143-14-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
1 sibling, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2015-11-16 15:59 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Vladimir Davydov, Tejun Heo, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu 12-11-15 18:41:32, Johannes Weiner wrote:
> Socket memory can be a significant share of overall memory consumed by
> common workloads. In order to provide reasonable resource isolation in
> the unified hierarchy, this type of memory needs to be included in the
> tracking/accounting of a cgroup under active memory resource control.
>
> Overhead is only incurred when a non-root control group is created AND
> the memory controller is instructed to track and account the memory
> footprint of that group. cgroup.memory=nosocket can be specified on
> the boot commandline to override any runtime configuration and
> forcibly exclude socket memory from active memory resource control.
Do you have any numbers about the overhead?
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
With a way to disable this feature I am OK with it.
cgroup.memory=nosocket should be documented (at least in
Documentation/kernel-parameters.txt)
Other than that
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> include/linux/memcontrol.h | 12 ++++-
> mm/memcontrol.c | 131 +++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 118 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 4cf5afa..809d6de 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -256,6 +256,10 @@ struct mem_cgroup {
> struct wb_domain cgwb_domain;
> #endif
>
> +#ifdef CONFIG_INET
> + struct work_struct socket_work;
> +#endif
> +
> /* List of events which userspace want to receive */
> struct list_head event_list;
> spinlock_t event_list_lock;
> @@ -691,7 +695,7 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback *wb,
>
> #endif /* CONFIG_CGROUP_WRITEBACK */
>
> -#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
> +#ifdef CONFIG_INET
> struct sock;
> extern struct static_key memcg_sockets_enabled_key;
> #define mem_cgroup_sockets_enabled static_key_false(&memcg_sockets_enabled_key)
> @@ -701,11 +705,15 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
> void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
> static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
> {
> +#ifdef CONFIG_MEMCG_KMEM
> return memcg->tcp_mem.memory_pressure;
> +#else
> + return false;
> +#endif
> }
> #else
> #define mem_cgroup_sockets_enabled 0
> -#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
> +#endif /* CONFIG_INET */
>
> #ifdef CONFIG_MEMCG_KMEM
> extern struct static_key memcg_kmem_enabled_key;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 408fb04..cad9525 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -80,6 +80,9 @@ struct mem_cgroup *root_mem_cgroup __read_mostly;
>
> #define MEM_CGROUP_RECLAIM_RETRIES 5
>
> +/* Socket memory accounting disabled? */
> +static bool cgroup_memory_nosocket;
> +
> /* Whether the swap controller is active */
> #ifdef CONFIG_MEMCG_SWAP
> int do_swap_account __read_mostly;
> @@ -1923,6 +1926,18 @@ static int memcg_cpu_hotplug_callback(struct notifier_block *nb,
> return NOTIFY_OK;
> }
>
> +static void reclaim_high(struct mem_cgroup *memcg,
> + unsigned int nr_pages,
> + gfp_t gfp_mask)
> +{
> + do {
> + if (page_counter_read(&memcg->memory) <= memcg->high)
> + continue;
> + mem_cgroup_events(memcg, MEMCG_HIGH, 1);
> + try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
> + } while ((memcg = parent_mem_cgroup(memcg)));
> +}
> +
> /*
> * Scheduled by try_charge() to be executed from the userland return path
> * and reclaims memory over the high limit.
> @@ -1930,20 +1945,13 @@ static int memcg_cpu_hotplug_callback(struct notifier_block *nb,
> void mem_cgroup_handle_over_high(void)
> {
> unsigned int nr_pages = current->memcg_nr_pages_over_high;
> - struct mem_cgroup *memcg, *pos;
> + struct mem_cgroup *memcg;
>
> if (likely(!nr_pages))
> return;
>
> - pos = memcg = get_mem_cgroup_from_mm(current->mm);
> -
> - do {
> - if (page_counter_read(&pos->memory) <= pos->high)
> - continue;
> - mem_cgroup_events(pos, MEMCG_HIGH, 1);
> - try_to_free_mem_cgroup_pages(pos, nr_pages, GFP_KERNEL, true);
> - } while ((pos = parent_mem_cgroup(pos)));
> -
> + memcg = get_mem_cgroup_from_mm(current->mm);
> + reclaim_high(memcg, nr_pages, GFP_KERNEL);
> css_put(&memcg->css);
> current->memcg_nr_pages_over_high = 0;
> }
> @@ -4141,6 +4149,8 @@ struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
> }
> EXPORT_SYMBOL(parent_mem_cgroup);
>
> +static void socket_work_func(struct work_struct *work);
> +
> static struct cgroup_subsys_state * __ref
> mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> {
> @@ -4180,6 +4190,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> #ifdef CONFIG_CGROUP_WRITEBACK
> INIT_LIST_HEAD(&memcg->cgwb_list);
> #endif
> +#ifdef CONFIG_INET
> + INIT_WORK(&memcg->socket_work, socket_work_func);
> +#endif
> return &memcg->css;
>
> free_out:
> @@ -4237,6 +4250,11 @@ mem_cgroup_css_online(struct cgroup_subsys_state *css)
> if (ret)
> return ret;
>
> +#ifdef CONFIG_INET
> + if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
> + static_key_slow_inc(&memcg_sockets_enabled_key);
> +#endif
> +
> /*
> * Make sure the memcg is initialized: mem_cgroup_iter()
> * orders reading memcg->initialized against its callers
> @@ -4276,6 +4294,11 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
> struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>
> memcg_destroy_kmem(memcg);
> +#ifdef CONFIG_INET
> + if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
> + static_key_slow_dec(&memcg_sockets_enabled_key);
> + cancel_work_sync(&memcg->socket_work);
> +#endif
> __mem_cgroup_free(memcg);
> }
>
> @@ -5464,8 +5487,7 @@ void mem_cgroup_replace_page(struct page *oldpage, struct page *newpage)
> commit_charge(newpage, memcg, true);
> }
>
> -/* Writing them here to avoid exposing memcg's inner layout */
> -#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
> +#ifdef CONFIG_INET
>
> struct static_key memcg_sockets_enabled_key;
> EXPORT_SYMBOL(memcg_sockets_enabled_key);
> @@ -5490,10 +5512,16 @@ void sock_update_memcg(struct sock *sk)
>
> rcu_read_lock();
> memcg = mem_cgroup_from_task(current);
> - if (memcg != root_mem_cgroup &&
> - test_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags) &&
> - css_tryget_online(&memcg->css))
> + if (memcg == root_mem_cgroup)
> + goto out;
> +#ifdef CONFIG_MEMCG_KMEM
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) &&
> + !test_bit(MEMCG_SOCK_ACTIVE, &memcg->tcp_mem.flags))
> + goto out;
> +#endif
> + if (css_tryget_online(&memcg->css))
> sk->sk_memcg = memcg;
> +out:
> rcu_read_unlock();
> }
> EXPORT_SYMBOL(sock_update_memcg);
> @@ -5504,6 +5532,14 @@ void sock_release_memcg(struct sock *sk)
> css_put(&sk->sk_memcg->css);
> }
>
> +static void socket_work_func(struct work_struct *work)
> +{
> + struct mem_cgroup *memcg;
> +
> + memcg = container_of(work, struct mem_cgroup, socket_work);
> + reclaim_high(memcg, CHARGE_BATCH, GFP_KERNEL);
> +}
> +
> /**
> * mem_cgroup_charge_skmem - charge socket memory
> * @memcg: memcg to charge
> @@ -5514,16 +5550,43 @@ void sock_release_memcg(struct sock *sk)
> */
> bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> {
> + unsigned int batch = max(CHARGE_BATCH, nr_pages);
> struct page_counter *counter;
> + bool force = false;
>
> - if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> - nr_pages, &counter)) {
> - memcg->tcp_mem.memory_pressure = 0;
> +#ifdef CONFIG_MEMCG_KMEM
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> + if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> + nr_pages, &counter)) {
> + memcg->tcp_mem.memory_pressure = 0;
> + return true;
> + }
> + page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> + memcg->tcp_mem.memory_pressure = 1;
> + return false;
> + }
> +#endif
> + if (consume_stock(memcg, nr_pages))
> return true;
> +retry:
> + if (page_counter_try_charge(&memcg->memory, batch, &counter))
> + goto done;
> +
> + if (batch > nr_pages) {
> + batch = nr_pages;
> + goto retry;
> }
> - page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> - memcg->tcp_mem.memory_pressure = 1;
> - return false;
> +
> + page_counter_charge(&memcg->memory, batch);
> + force = true;
> +done:
> + css_get_many(&memcg->css, batch);
> + if (batch > nr_pages)
> + refill_stock(memcg, batch - nr_pages);
> +
> + schedule_work(&memcg->socket_work);
> +
> + return !force;
> }
>
> /**
> @@ -5533,10 +5596,32 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> */
> void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> {
> - page_counter_uncharge(&memcg->tcp_mem.memory_allocated, nr_pages);
> +#ifdef CONFIG_MEMCG_KMEM
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> + page_counter_uncharge(&memcg->tcp_mem.memory_allocated,
> + nr_pages);
> + return;
> + }
> +#endif
> + page_counter_uncharge(&memcg->memory, nr_pages);
> + css_put_many(&memcg->css, nr_pages);
> }
>
> -#endif
> +#endif /* CONFIG_INET */
> +
> +static int __init cgroup_memory(char *s)
> +{
> + char *token;
> +
> + while ((token = strsep(&s, ",")) != NULL) {
> + if (!*token)
> + continue;
> + if (!strcmp(token, "nosocket"))
> + cgroup_memory_nosocket = true;
> + }
> + return 0;
> +}
> +__setup("cgroup.memory=", cgroup_memory);
>
> /*
> * subsys_initcall() for memory controller.
> --
> 2.6.2
--
Michal Hocko
SUSE Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread[parent not found: <1447371693-25143-14-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller
[not found] ` <1447371693-25143-14-git-send-email-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
@ 2015-11-20 13:10 ` Vladimir Davydov
2015-11-20 19:25 ` Johannes Weiner
0 siblings, 1 reply; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-20 13:10 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko,
netdev-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Thu, Nov 12, 2015 at 06:41:32PM -0500, Johannes Weiner wrote:
...
> @@ -5514,16 +5550,43 @@ void sock_release_memcg(struct sock *sk)
> */
> bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> {
> + unsigned int batch = max(CHARGE_BATCH, nr_pages);
> struct page_counter *counter;
> + bool force = false;
>
> - if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> - nr_pages, &counter)) {
> - memcg->tcp_mem.memory_pressure = 0;
> +#ifdef CONFIG_MEMCG_KMEM
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> + if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> + nr_pages, &counter)) {
> + memcg->tcp_mem.memory_pressure = 0;
> + return true;
> + }
> + page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> + memcg->tcp_mem.memory_pressure = 1;
> + return false;
> + }
> +#endif
> + if (consume_stock(memcg, nr_pages))
> return true;
> +retry:
> + if (page_counter_try_charge(&memcg->memory, batch, &counter))
> + goto done;
> +
> + if (batch > nr_pages) {
> + batch = nr_pages;
> + goto retry;
> }
> - page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> - memcg->tcp_mem.memory_pressure = 1;
> - return false;
> +
> + page_counter_charge(&memcg->memory, batch);
> + force = true;
> +done:
> + css_get_many(&memcg->css, batch);
Is there any point to get css reference per each charged page? For kmem
it is absolutely necessary, because dangling slabs must block
destruction of memcg's kmem caches, which are destroyed on css_free. But
for sockets there's no such problem: memcg will be destroyed only after
all sockets are destroyed and therefore uncharged (since
sock_update_memcg pins css).
> + if (batch > nr_pages)
> + refill_stock(memcg, batch - nr_pages);
> +
> + schedule_work(&memcg->socket_work);
I think it's suboptimal to schedule the work even if we are below the
high threshold.
BTW why do we need this work at all? Why is reclaim_high called from
task_work not enough?
Thanks,
Vladimir
> +
> + return !force;
> }
>
> /**
^ permalink raw reply [flat|nested] 61+ messages in thread* Re: [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller
2015-11-20 13:10 ` Vladimir Davydov
@ 2015-11-20 19:25 ` Johannes Weiner
[not found] ` <20151120192506.GD5623-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
0 siblings, 1 reply; 61+ messages in thread
From: Johannes Weiner @ 2015-11-20 19:25 UTC (permalink / raw)
To: Vladimir Davydov
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko,
netdev-u79uwXL29TY76Z2rM5mHXA, linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Fri, Nov 20, 2015 at 04:10:33PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:32PM -0500, Johannes Weiner wrote:
> ...
> > @@ -5514,16 +5550,43 @@ void sock_release_memcg(struct sock *sk)
> > */
> > bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
> > {
> > + unsigned int batch = max(CHARGE_BATCH, nr_pages);
> > struct page_counter *counter;
> > + bool force = false;
> >
> > - if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > - nr_pages, &counter)) {
> > - memcg->tcp_mem.memory_pressure = 0;
> > +#ifdef CONFIG_MEMCG_KMEM
> > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > + if (page_counter_try_charge(&memcg->tcp_mem.memory_allocated,
> > + nr_pages, &counter)) {
> > + memcg->tcp_mem.memory_pressure = 0;
> > + return true;
> > + }
> > + page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > + memcg->tcp_mem.memory_pressure = 1;
> > + return false;
> > + }
> > +#endif
> > + if (consume_stock(memcg, nr_pages))
> > return true;
> > +retry:
> > + if (page_counter_try_charge(&memcg->memory, batch, &counter))
> > + goto done;
> > +
> > + if (batch > nr_pages) {
> > + batch = nr_pages;
> > + goto retry;
> > }
> > - page_counter_charge(&memcg->tcp_mem.memory_allocated, nr_pages);
> > - memcg->tcp_mem.memory_pressure = 1;
> > - return false;
> > +
> > + page_counter_charge(&memcg->memory, batch);
> > + force = true;
> > +done:
>
> > + css_get_many(&memcg->css, batch);
>
> Is there any point to get css reference per each charged page? For kmem
> it is absolutely necessary, because dangling slabs must block
> destruction of memcg's kmem caches, which are destroyed on css_free. But
> for sockets there's no such problem: memcg will be destroyed only after
> all sockets are destroyed and therefore uncharged (since
> sock_update_memcg pins css).
I'm afraid we have to when we want to share 'stock' with cache and
anon pages, which hold individual references. drain_stock() always
assumes one reference per cached page.
> > + if (batch > nr_pages)
> > + refill_stock(memcg, batch - nr_pages);
> > +
> > + schedule_work(&memcg->socket_work);
>
> I think it's suboptimal to schedule the work even if we are below the
> high threshold.
Hm, it seemed unnecessary to duplicate the hierarchy check since this
is in the batch-exhausted slowpath anyway.
> BTW why do we need this work at all? Why is reclaim_high called from
> task_work not enough?
The problem lies in the memcg association: the random task that gets
interrupted by an arriving packet might not be in the same memcg as
the one owning receiving socket. And multiple interrupts could happen
while we're in the kernel already charging pages. We'd basically have
to maintain a list of memcgs that need to run reclaim_high associated
with current.
^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 14/14] mm: memcontrol: hook up vmpressure to socket pressure
2015-11-12 23:41 [PATCH 00/14] mm: memcontrol: account socket memory in unified hierarchy Johannes Weiner
` (12 preceding siblings ...)
2015-11-12 23:41 ` [PATCH 13/14] mm: memcontrol: account socket memory in unified hierarchy memory controller Johannes Weiner
@ 2015-11-12 23:41 ` Johannes Weiner
2015-11-15 13:54 ` Vladimir Davydov
13 siblings, 1 reply; 61+ messages in thread
From: Johannes Weiner @ 2015-11-12 23:41 UTC (permalink / raw)
To: David Miller, Andrew Morton
Cc: Vladimir Davydov, Tejun Heo, Michal Hocko, netdev, linux-mm,
cgroups, linux-kernel, kernel-team
Let the networking stack know when a memcg is under reclaim pressure
so that it can clamp its transmit windows accordingly.
Whenever the reclaim efficiency of a cgroup's LRU lists drops low
enough for a MEDIUM or HIGH vmpressure event to occur, assert a
pressure state in the socket and tcp memory code that tells it to curb
consumption growth from sockets associated with said control group.
vmpressure events are naturally edge triggered, so for hysteresis
assert socket pressure for a second to allow for subsequent vmpressure
events to occur before letting the socket code return to normal.
This will likely need finetuning for a wider variety of workloads, but
for now stick to the vmpressure presets and keep hysteresis simple.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 29 +++++++++++++++++++++++++----
mm/memcontrol.c | 15 +--------------
mm/vmpressure.c | 25 ++++++++++++++++++++-----
3 files changed, 46 insertions(+), 23 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 809d6de..dba43cb 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -258,6 +258,7 @@ struct mem_cgroup {
#ifdef CONFIG_INET
struct work_struct socket_work;
+ unsigned long socket_pressure;
#endif
/* List of events which userspace want to receive */
@@ -303,18 +304,34 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
-struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg);
static inline
struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
return css ? container_of(css, struct mem_cgroup, css) : NULL;
}
+#define mem_cgroup_from_counter(counter, member) \
+ container_of(counter, struct mem_cgroup, member)
+
struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
struct mem_cgroup *,
struct mem_cgroup_reclaim_cookie *);
void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
+/**
+ * parent_mem_cgroup - find the accounting parent of a memcg
+ * @memcg: memcg whose parent to find
+ *
+ * Returns the parent memcg, or NULL if this is the root or the memory
+ * controller is in legacy no-hierarchy mode.
+ */
+static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
+{
+ if (!memcg->memory.parent)
+ return NULL;
+ return mem_cgroup_from_counter(memcg->memory.parent, memory);
+}
+
static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
struct mem_cgroup *root)
{
@@ -706,10 +723,14 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
{
#ifdef CONFIG_MEMCG_KMEM
- return memcg->tcp_mem.memory_pressure;
-#else
- return false;
+ if (memcg->tcp_mem.memory_pressure)
+ return true;
#endif
+ do {
+ if (time_before(jiffies, memcg->socket_pressure))
+ return true;
+ } while ((memcg = parent_mem_cgroup(memcg)));
+ return false;
}
#else
#define mem_cgroup_sockets_enabled 0
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cad9525..4068662 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1091,9 +1091,6 @@ bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg)
return ret;
}
-#define mem_cgroup_from_counter(counter, member) \
- container_of(counter, struct mem_cgroup, member)
-
/**
* mem_cgroup_margin - calculate chargeable space of a memory cgroup
* @memcg: the memory cgroup
@@ -4138,17 +4135,6 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
kfree(memcg);
}
-/*
- * Returns the parent mem_cgroup in memcgroup hierarchy with hierarchy enabled.
- */
-struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
-{
- if (!memcg->memory.parent)
- return NULL;
- return mem_cgroup_from_counter(memcg->memory.parent, memory);
-}
-EXPORT_SYMBOL(parent_mem_cgroup);
-
static void socket_work_func(struct work_struct *work);
static struct cgroup_subsys_state * __ref
@@ -4192,6 +4178,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
#endif
#ifdef CONFIG_INET
INIT_WORK(&memcg->socket_work, socket_work_func);
+ memcg->socket_pressure = jiffies;
#endif
return &memcg->css;
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 4c25e62..07e8440 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -137,14 +137,11 @@ struct vmpressure_event {
};
static bool vmpressure_event(struct vmpressure *vmpr,
- unsigned long scanned, unsigned long reclaimed)
+ enum vmpressure_levels level)
{
struct vmpressure_event *ev;
- enum vmpressure_levels level;
bool signalled = false;
- level = vmpressure_calc_level(scanned, reclaimed);
-
mutex_lock(&vmpr->events_lock);
list_for_each_entry(ev, &vmpr->events, node) {
@@ -162,6 +159,7 @@ static bool vmpressure_event(struct vmpressure *vmpr,
static void vmpressure_work_fn(struct work_struct *work)
{
struct vmpressure *vmpr = work_to_vmpressure(work);
+ enum vmpressure_levels level;
unsigned long scanned;
unsigned long reclaimed;
@@ -185,8 +183,25 @@ static void vmpressure_work_fn(struct work_struct *work)
vmpr->reclaimed = 0;
spin_unlock(&vmpr->sr_lock);
+ level = vmpressure_calc_level(scanned, reclaimed);
+
+ if (level > VMPRESSURE_LOW) {
+ struct mem_cgroup *memcg;
+ /*
+ * Let the socket buffer allocator know that we are
+ * having trouble reclaiming LRU pages.
+ *
+ * For hysteresis, keep the pressure state asserted
+ * for a second in which subsequent pressure events
+ * can occur.
+ */
+ memcg = container_of(vmpr, struct mem_cgroup, vmpressure);
+ if (memcg != root_mem_cgroup)
+ memcg->socket_pressure = jiffies + HZ;
+ }
+
do {
- if (vmpressure_event(vmpr, scanned, reclaimed))
+ if (vmpressure_event(vmpr, level))
break;
/*
* If not handled, propagate the event upward into the
--
2.6.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 61+ messages in thread* Re: [PATCH 14/14] mm: memcontrol: hook up vmpressure to socket pressure
2015-11-12 23:41 ` [PATCH 14/14] mm: memcontrol: hook up vmpressure to socket pressure Johannes Weiner
@ 2015-11-15 13:54 ` Vladimir Davydov
2015-11-16 18:53 ` Johannes Weiner
0 siblings, 1 reply; 61+ messages in thread
From: Vladimir Davydov @ 2015-11-15 13:54 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Thu, Nov 12, 2015 at 06:41:33PM -0500, Johannes Weiner wrote:
> Let the networking stack know when a memcg is under reclaim pressure
> so that it can clamp its transmit windows accordingly.
>
> Whenever the reclaim efficiency of a cgroup's LRU lists drops low
> enough for a MEDIUM or HIGH vmpressure event to occur, assert a
> pressure state in the socket and tcp memory code that tells it to curb
> consumption growth from sockets associated with said control group.
>
> vmpressure events are naturally edge triggered, so for hysteresis
> assert socket pressure for a second to allow for subsequent vmpressure
> events to occur before letting the socket code return to normal.
AFAICS, in contrast to v1, now you don't modify vmpressure behavior,
which means socket_pressure will only be set when cgroup hits its
high/hard limit. On tightly packed system, this is unlikely IMO -
cgroups will mostly experience pressure due to memory shortage at the
global level and/or their low limit configuration, in which case no
vmpressure events will be triggered and therefore tcp window won't be
clamped accordingly.
May be, we could use a per memcg slab shrinker to detect memory
pressure? This looks like abusing shrinkers API though.
Thanks,
Vladimir
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 14/14] mm: memcontrol: hook up vmpressure to socket pressure
2015-11-15 13:54 ` Vladimir Davydov
@ 2015-11-16 18:53 ` Johannes Weiner
[not found] ` <20151116185316.GC32544-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
0 siblings, 1 reply; 61+ messages in thread
From: Johannes Weiner @ 2015-11-16 18:53 UTC (permalink / raw)
To: Vladimir Davydov
Cc: David Miller, Andrew Morton, Tejun Heo, Michal Hocko, netdev,
linux-mm, cgroups, linux-kernel, kernel-team
On Sun, Nov 15, 2015 at 04:54:57PM +0300, Vladimir Davydov wrote:
> On Thu, Nov 12, 2015 at 06:41:33PM -0500, Johannes Weiner wrote:
> > Let the networking stack know when a memcg is under reclaim pressure
> > so that it can clamp its transmit windows accordingly.
> >
> > Whenever the reclaim efficiency of a cgroup's LRU lists drops low
> > enough for a MEDIUM or HIGH vmpressure event to occur, assert a
> > pressure state in the socket and tcp memory code that tells it to curb
> > consumption growth from sockets associated with said control group.
> >
> > vmpressure events are naturally edge triggered, so for hysteresis
> > assert socket pressure for a second to allow for subsequent vmpressure
> > events to occur before letting the socket code return to normal.
>
> AFAICS, in contrast to v1, now you don't modify vmpressure behavior,
> which means socket_pressure will only be set when cgroup hits its
> high/hard limit. On tightly packed system, this is unlikely IMO -
> cgroups will mostly experience pressure due to memory shortage at the
> global level and/or their low limit configuration, in which case no
> vmpressure events will be triggered and therefore tcp window won't be
> clamped accordingly.
Yeah, this is an inherent problem in the vmpressure design and it
makes the feature significantly less useful than it could be IMO.
But you guys were wary about the patch that changed it, and this
series has kicked up enough dust already, so I backed it out.
But this will still be useful. Yes, it won't help in rebalancing an
regularly working system, which would be cool, but it'll still help
contain a worklad that is growing beyond expectations, which is the
scenario that kickstarted this work.
> May be, we could use a per memcg slab shrinker to detect memory
> pressure? This looks like abusing shrinkers API though.
Actually, I thought about doing this long-term.
Shrinkers are a nice way to export VM pressure to auxiliary allocators
and caches. But currently, the only metric we export is LRU scan rate,
whose application is limited to ageable caches: it doesn't make sense
to cause auxiliary workingsets to shrink when the VM is merely picking
up the drop-behind pages of a one-off page cache stream. I think it
would make sense for shrinkers to include reclaim efficiency so that
they can be used by caches that don't have 'accessed' bits and object
rotation, but are able to shrink based on the cost they're imposing.
But a change like this is beyond the scope of this series, IMO.
^ permalink raw reply [flat|nested] 61+ messages in thread