From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 06/11] mm: Handle top tier memory in cgroup soft limit memory tree utilities
Date: Mon, 5 Apr 2021 10:08:30 -0700
Message-Id: <86f4bad592a5232226c1779e6acce117a32b41ee.1617642417.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.20.1

Update the utility functions __mem_cgroup_insert_exceeded() and
__mem_cgroup_remove_exceeded(), to allow addition and removal of cgroups
from the new red black tree that tracks the cgroups that exceed their
toptier memory limits.

Also update mem_cgroup_largest_soft_limit_node() so that it can return
the cgroup that has the largest excess usage of toptier memory.

Signed-off-by: Tim Chen
---
 include/linux/memcontrol.h |   9 +++
 mm/memcontrol.c            | 152 +++++++++++++++++++++++++++----------
 2 files changed, 122 insertions(+), 39 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 609d8590950c..0ed8ddfd5436 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -124,6 +124,15 @@ struct mem_cgroup_per_node {
 	unsigned long		usage_in_excess;/* Set to the value by which */
 						/* the soft limit is exceeded*/
 	bool			on_tree;
+
+	struct rb_node		toptier_tree_node;	/* RB tree node */
+	unsigned long		toptier_usage_in_excess; /* Set to the value by which */
+						/* the soft limit is exceeded*/
+	bool			on_toptier_tree;
+
+	bool			congested;	/* memcg has many dirty pages */
+						/* backed by a congested BDI */
+
 	struct mem_cgroup	*memcg;		/* Back pointer, we cannot */
 						/* use container_of	   */
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 90a78ff3fca8..8a7648b79635 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -616,24 +616,44 @@ soft_limit_tree_from_page(struct page *page, enum node_states type)
 
 static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 					 struct mem_cgroup_tree_per_node *mctz,
-					 unsigned long new_usage_in_excess)
+					 unsigned long new_usage_in_excess,
+					 enum node_states type)
 {
 	struct rb_node **p = &mctz->rb_root.rb_node;
-	struct rb_node *parent = NULL;
+	struct rb_node *parent = NULL, *mz_tree_node;
 	struct mem_cgroup_per_node *mz_node;
-	bool rightmost = true;
+	bool rightmost = true, *mz_on_tree;
+	unsigned long usage_in_excess, *mz_usage_in_excess;
 
-	if (mz->on_tree)
+	if (type == N_TOPTIER) {
+		mz_usage_in_excess = &mz->toptier_usage_in_excess;
+		mz_tree_node = &mz->toptier_tree_node;
+		mz_on_tree = &mz->on_toptier_tree;
+	} else {
+		mz_usage_in_excess = &mz->usage_in_excess;
+		mz_tree_node = &mz->tree_node;
+		mz_on_tree = &mz->on_tree;
+	}
+
+	if (*mz_on_tree)
 		return;
 
-	mz->usage_in_excess = new_usage_in_excess;
-	if (!mz->usage_in_excess)
+	if (!new_usage_in_excess)
 		return;
+
 	while (*p) {
 		parent = *p;
-		mz_node = rb_entry(parent, struct mem_cgroup_per_node,
+		if (type == N_TOPTIER) {
+			mz_node = rb_entry(parent, struct mem_cgroup_per_node,
+					toptier_tree_node);
+			usage_in_excess = mz_node->toptier_usage_in_excess;
+		} else {
+			mz_node = rb_entry(parent, struct mem_cgroup_per_node,
 					tree_node);
-		if (mz->usage_in_excess < mz_node->usage_in_excess) {
+			usage_in_excess = mz_node->usage_in_excess;
+		}
+
+		if (new_usage_in_excess < usage_in_excess) {
 			p = &(*p)->rb_left;
 			rightmost = false;
 		} else {
@@ -642,33 +662,47 @@ static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 	}
 
 	if (rightmost)
-		mctz->rb_rightmost = &mz->tree_node;
+		mctz->rb_rightmost = mz_tree_node;
 
-	rb_link_node(&mz->tree_node, parent, p);
-	rb_insert_color(&mz->tree_node, &mctz->rb_root);
-	mz->on_tree = true;
+	rb_link_node(mz_tree_node, parent, p);
+	rb_insert_color(mz_tree_node, &mctz->rb_root);
+	*mz_usage_in_excess = new_usage_in_excess;
+	*mz_on_tree = true;
 }
 
 static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
-					 struct mem_cgroup_tree_per_node *mctz)
+					 struct mem_cgroup_tree_per_node *mctz,
+					 enum node_states type)
 {
-	if (!mz->on_tree)
+	bool *mz_on_tree;
+	struct rb_node *mz_tree_node;
+
+	if (type == N_TOPTIER) {
+		mz_tree_node = &mz->toptier_tree_node;
+		mz_on_tree = &mz->on_toptier_tree;
+	} else {
+		mz_tree_node = &mz->tree_node;
+		mz_on_tree = &mz->on_tree;
+	}
+
+	if (!(*mz_on_tree))
 		return;
 
-	if (&mz->tree_node == mctz->rb_rightmost)
-		mctz->rb_rightmost = rb_prev(&mz->tree_node);
+	if (mz_tree_node == mctz->rb_rightmost)
+		mctz->rb_rightmost = rb_prev(mz_tree_node);
 
-	rb_erase(&mz->tree_node, &mctz->rb_root);
-	mz->on_tree = false;
+	rb_erase(mz_tree_node, &mctz->rb_root);
+	*mz_on_tree = false;
 }
 
 static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
-				       struct mem_cgroup_tree_per_node *mctz)
+				       struct mem_cgroup_tree_per_node *mctz,
+				       enum node_states type)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&mctz->lock, flags);
-	__mem_cgroup_remove_exceeded(mz, mctz);
+	__mem_cgroup_remove_exceeded(mz, mctz, type);
 	spin_unlock_irqrestore(&mctz->lock, flags);
 }
 
@@ -696,13 +730,18 @@ static unsigned long soft_limit_excess(struct mem_cgroup *memcg, enum node_state
 	return excess;
 }
 
-static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
+static void mem_cgroup_update_tree(struct mem_cgroup *bottom_memcg, struct page *page)
 {
 	unsigned long excess;
 	struct mem_cgroup_per_node *mz;
 	struct mem_cgroup_tree_per_node *mctz;
+	enum node_states type = N_MEMORY;
+	struct mem_cgroup *memcg;
+
+repeat_toptier:
+	memcg = bottom_memcg;
+	mctz = soft_limit_tree_from_page(page, type);
 
-	mctz = soft_limit_tree_from_page(page, N_MEMORY);
 	if (!mctz)
 		return;
 	/*
@@ -710,27 +749,37 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
 	 * because their event counter is not touched.
 	 */
 	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
+		bool on_tree;
+
 		mz = mem_cgroup_page_nodeinfo(memcg, page);
-		excess = soft_limit_excess(memcg, N_MEMORY);
+		excess = soft_limit_excess(memcg, type);
+
+		on_tree = (type == N_MEMORY) ? mz->on_tree: mz->on_toptier_tree;
 		/*
 		 * We have to update the tree if mz is on RB-tree or
 		 * mem is over its softlimit.
 		 */
-		if (excess || mz->on_tree) {
+		if (excess || on_tree) {
 			unsigned long flags;
 
 			spin_lock_irqsave(&mctz->lock, flags);
 			/* if on-tree, remove it */
-			if (mz->on_tree)
-				__mem_cgroup_remove_exceeded(mz, mctz);
+			if (on_tree)
+				__mem_cgroup_remove_exceeded(mz, mctz, type);
+
 			/*
 			 * Insert again. mz->usage_in_excess will be updated.
 			 * If excess is 0, no tree ops.
 			 */
-			__mem_cgroup_insert_exceeded(mz, mctz, excess);
+			__mem_cgroup_insert_exceeded(mz, mctz, excess, type);
+
 			spin_unlock_irqrestore(&mctz->lock, flags);
 		}
 	}
+	if (type == N_MEMORY) {
+		type = N_TOPTIER;
+		goto repeat_toptier;
+	}
 }
 
 static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
@@ -743,12 +792,16 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
 		mz = mem_cgroup_nodeinfo(memcg, nid);
 		mctz = soft_limit_tree_node(nid, N_MEMORY);
 		if (mctz)
-			mem_cgroup_remove_exceeded(mz, mctz);
+			mem_cgroup_remove_exceeded(mz, mctz, N_MEMORY);
+		mctz = soft_limit_tree_node(nid, N_TOPTIER);
+		if (mctz)
+			mem_cgroup_remove_exceeded(mz, mctz, N_TOPTIER);
 	}
 }
 
 static struct mem_cgroup_per_node *
-__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
+__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz,
+				     enum node_states type)
 {
 	struct mem_cgroup_per_node *mz;
 
@@ -757,15 +810,19 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 	if (!mctz->rb_rightmost)
 		goto done;		/* Nothing to reclaim from */
 
-	mz = rb_entry(mctz->rb_rightmost,
+	if (type == N_TOPTIER)
+		mz = rb_entry(mctz->rb_rightmost,
+		      struct mem_cgroup_per_node, toptier_tree_node);
+	else
+		mz = rb_entry(mctz->rb_rightmost,
 		      struct mem_cgroup_per_node, tree_node);
 	/*
 	 * Remove the node now but someone else can add it back,
 	 * we will to add it back at the end of reclaim to its correct
 	 * position in the tree.
 	 */
-	__mem_cgroup_remove_exceeded(mz, mctz);
-	if (!soft_limit_excess(mz->memcg, N_MEMORY) ||
+	__mem_cgroup_remove_exceeded(mz, mctz, type);
+	if (!soft_limit_excess(mz->memcg, type) ||
 	    !css_tryget(&mz->memcg->css))
 		goto retry;
 done:
@@ -773,12 +830,13 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 }
 
 static struct mem_cgroup_per_node *
-mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
+mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz,
+				   enum node_states type)
 {
 	struct mem_cgroup_per_node *mz;
 
 	spin_lock_irq(&mctz->lock);
-	mz = __mem_cgroup_largest_soft_limit_node(mctz);
+	mz = __mem_cgroup_largest_soft_limit_node(mctz, type);
 	spin_unlock_irq(&mctz->lock);
 	return mz;
 }
@@ -3472,7 +3530,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 	struct mem_cgroup_per_node *mz, *next_mz = NULL;
 	unsigned long reclaimed;
 	int loop = 0;
-	struct mem_cgroup_tree_per_node *mctz;
+	struct mem_cgroup_tree_per_node *mctz, *mctz_sibling;
 	unsigned long excess;
 	unsigned long nr_scanned;
 	int migration_nid;
@@ -3481,6 +3539,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		return 0;
 
 	mctz = soft_limit_tree_node(pgdat->node_id, N_MEMORY);
+	mctz_sibling = soft_limit_tree_node(pgdat->node_id, N_TOPTIER);
 
 	/*
 	 * Do not even bother to check the largest node if the root
@@ -3516,7 +3575,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		if (next_mz)
 			mz = next_mz;
 		else
-			mz = mem_cgroup_largest_soft_limit_node(mctz);
+			mz = mem_cgroup_largest_soft_limit_node(mctz, N_MEMORY);
 		if (!mz)
 			break;
 
@@ -3526,7 +3585,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		nr_reclaimed += reclaimed;
 		*total_scanned += nr_scanned;
 		spin_lock_irq(&mctz->lock);
-		__mem_cgroup_remove_exceeded(mz, mctz);
+		__mem_cgroup_remove_exceeded(mz, mctz, N_MEMORY);
 
 		/*
 		 * If we failed to reclaim anything from this memory cgroup
@@ -3534,7 +3593,8 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		 */
 		next_mz = NULL;
 		if (!reclaimed)
-			next_mz = __mem_cgroup_largest_soft_limit_node(mctz);
+			next_mz =
+			   __mem_cgroup_largest_soft_limit_node(mctz, N_MEMORY);
 
 		excess = soft_limit_excess(mz->memcg, N_MEMORY);
 		/*
@@ -3546,8 +3606,20 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		 * term TODO.
 		 */
 		/* If excess == 0, no tree ops */
-		__mem_cgroup_insert_exceeded(mz, mctz, excess);
+		__mem_cgroup_insert_exceeded(mz, mctz, excess, N_MEMORY);
 		spin_unlock_irq(&mctz->lock);
+
+		/* update both affected N_MEMORY and N_TOPTIER trees */
+		if (mctz_sibling) {
+			spin_lock_irq(&mctz_sibling->lock);
+			__mem_cgroup_remove_exceeded(mz, mctz_sibling,
+						     N_TOPTIER);
+			excess = soft_limit_excess(mz->memcg, N_TOPTIER);
+			__mem_cgroup_insert_exceeded(mz, mctz_sibling, excess,
+						     N_TOPTIER);
+			spin_unlock_irq(&mctz_sibling->lock);
+		}
+
 		css_put(&mz->memcg->css);
 		loop++;
 		/*
@@ -5312,6 +5384,8 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	lruvec_init(&pn->lruvec);
 	pn->usage_in_excess = 0;
 	pn->on_tree = false;
+	pn->toptier_usage_in_excess = 0;
+	pn->on_toptier_tree = false;
 	pn->memcg = memcg;
 
 	memcg->nodeinfo[node] = pn;
-- 
2.20.1
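
For readers who want to experiment with the idea outside the kernel, here
is a rough userspace sketch of the pattern the helpers in this patch rely
on: one per-node object carries two independent sets of linkage fields,
and a type argument selects which set a helper touches. This is an
illustration under stated assumptions, not code from the series. A sorted
singly linked list stands in for the red-black tree, there is no locking,
and every name in it (enum tier, struct link, struct node_info,
struct excess_list, insert_exceeded(), remove_exceeded(),
largest_exceeded()) is hypothetical. It also indexes an array by tier
where the patch uses separate fields chosen with if/else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for N_MEMORY and N_TOPTIER. */
enum tier { TIER_ALL, TIER_TOP, TIER_COUNT };

struct node_info;

/* One set of linkage fields per tier (cf. tree_node / toptier_tree_node). */
struct link {
	struct node_info *next;	/* next entry in ascending order of excess */
	unsigned long excess;	/* amount by which the soft limit is exceeded */
	bool on_list;
};

struct node_info {
	struct link links[TIER_COUNT];
};

/* One ordered list per tier; a simplified stand-in for the per-node tree. */
struct excess_list {
	struct node_info *head;
};

/* Link n into the list for tier t; no-op if already linked or excess is 0. */
static void insert_exceeded(struct excess_list *l, struct node_info *n,
			    unsigned long new_excess, enum tier t)
{
	struct link *ln = &n->links[t];
	struct node_info **p = &l->head;

	if (ln->on_list || !new_excess)
		return;

	/* Entries with an equal excess stay ahead of the new one. */
	while (*p && (*p)->links[t].excess <= new_excess)
		p = &(*p)->links[t].next;

	ln->excess = new_excess;
	ln->next = *p;
	*p = n;
	ln->on_list = true;
}

/* Unlink n from the list for tier t; no-op if it is not linked. */
static void remove_exceeded(struct excess_list *l, struct node_info *n,
			    enum tier t)
{
	struct node_info **p = &l->head;

	if (!n->links[t].on_list)
		return;

	while (*p && *p != n)
		p = &(*p)->links[t].next;
	assert(*p == n);	/* caller must pass the list n is linked on */

	*p = n->links[t].next;
	n->links[t].next = NULL;
	n->links[t].on_list = false;
}

/* Return the entry with the largest excess for tier t, or NULL if empty. */
static struct node_info *largest_exceeded(struct excess_list *l, enum tier t)
{
	struct node_info *n = l->head;

	while (n && n->links[t].next)
		n = n->links[t].next;
	return n;
}
```

A caller would keep one struct excess_list per tier and always pass the
matching enum tier value with it. Unlike
__mem_cgroup_largest_soft_limit_node(), the largest_exceeded() sketch only
looks at the last entry and does not unlink it.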