From: Johannes Weiner <hannes@cmpxchg.org>
To: Nhat Pham <nphamcs@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
akpm@linux-foundation.org, riel@surriel.com, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeelb@google.com,
muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
shuah@kernel.org, yosryahmed@google.com, fvdl@google.com,
linux-mm@kvack.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller
Date: Tue, 3 Oct 2023 14:39:28 -0400 [thread overview]
Message-ID: <20231003183928.GC20979@cmpxchg.org> (raw)
In-Reply-To: <CAKEwX=POd1DZc2K5ym14R2DpU74DqV30_A6QGfsCAaOTMK2WJA@mail.gmail.com>
On Tue, Oct 03, 2023 at 11:01:24AM -0700, Nhat Pham wrote:
> On Tue, Oct 3, 2023 at 10:13 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 10/02/23 17:18, Nhat Pham wrote:
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index de220e3ff8be..74472e911b0a 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -1902,6 +1902,7 @@ void free_huge_folio(struct folio *folio)
> > > pages_per_huge_page(h), folio);
> > > hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> > > pages_per_huge_page(h), folio);
> > > + mem_cgroup_uncharge(folio);
> > > if (restore_reserve)
> > > h->resv_huge_pages++;
> > >
> > > @@ -3009,11 +3010,20 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > struct hugepage_subpool *spool = subpool_vma(vma);
> > > struct hstate *h = hstate_vma(vma);
> > > struct folio *folio;
> > > - long map_chg, map_commit;
> > > + long map_chg, map_commit, nr_pages = pages_per_huge_page(h);
> > > long gbl_chg;
> > > - int ret, idx;
> > > + int memcg_charge_ret, ret, idx;
> > > struct hugetlb_cgroup *h_cg = NULL;
> > > + struct mem_cgroup *memcg;
> > > bool deferred_reserve;
> > > + gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
> > > +
> > > + memcg = get_mem_cgroup_from_current();
> > > + memcg_charge_ret = mem_cgroup_hugetlb_try_charge(memcg, gfp, nr_pages);
> > > + if (memcg_charge_ret == -ENOMEM) {
> > > + mem_cgroup_put(memcg);
> > > + return ERR_PTR(-ENOMEM);
> > > + }
> > >
> > > idx = hstate_index(h);
> > > /*
> > > @@ -3022,8 +3032,12 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > * code of zero indicates a reservation exists (no change).
> > > */
> > > map_chg = gbl_chg = vma_needs_reservation(h, vma, addr);
> > > - if (map_chg < 0)
> > > + if (map_chg < 0) {
> > > + if (!memcg_charge_ret)
> > > + mem_cgroup_cancel_charge(memcg, nr_pages);
> > > + mem_cgroup_put(memcg);
> > > return ERR_PTR(-ENOMEM);
> > > + }
> > >
> > > /*
> > > * Processes that did not create the mapping will have no
> > > @@ -3034,10 +3048,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > */
> > > if (map_chg || avoid_reserve) {
> > > gbl_chg = hugepage_subpool_get_pages(spool, 1);
> > > - if (gbl_chg < 0) {
> > > - vma_end_reservation(h, vma, addr);
> > > - return ERR_PTR(-ENOSPC);
> > > - }
> > > + if (gbl_chg < 0)
> > > + goto out_end_reservation;
> > >
> > > /*
> > > * Even though there was no reservation in the region/reserve
> > > @@ -3119,6 +3131,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > hugetlb_cgroup_uncharge_folio_rsvd(hstate_index(h),
> > > pages_per_huge_page(h), folio);
> > > }
> > > +
> > > + if (!memcg_charge_ret)
> > > + mem_cgroup_commit_charge(folio, memcg);
> > > + mem_cgroup_put(memcg);
> > > +
> > > return folio;
> > >
> > > out_uncharge_cgroup:
> > > @@ -3130,7 +3147,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> > > out_subpool_put:
> > > if (map_chg || avoid_reserve)
> > > hugepage_subpool_put_pages(spool, 1);
> > > +out_end_reservation:
> > > vma_end_reservation(h, vma, addr);
> > > + if (!memcg_charge_ret)
> > > + mem_cgroup_cancel_charge(memcg, nr_pages);
> > > + mem_cgroup_put(memcg);
> > > return ERR_PTR(-ENOSPC);
> > > }
> > >
> >
> > IIUC, huge page usage is charged in alloc_hugetlb_folio and uncharged in
> > free_huge_folio. During migration, huge pages are allocated via
> > alloc_migrate_hugetlb_folio, not alloc_hugetlb_folio. So, there is no
> > charging for the migration target page and we uncharge the source page.
> > It looks like there will be no charge for the huge page after migration?
> >
>
> Ah I see! This is a bit subtle indeed.
>
> For the hugetlb controller, it looks like they update the cgroup info
> inside move_hugetlb_state(), which calls hugetlb_cgroup_migrate()
> to transfer the hugetlb cgroup info to the destination folio.
>
> Perhaps we can do something analogous here.
>
> > If my analysis above is correct, then we may need to be careful about
> > this accounting. We may not want both source and target pages to be
> > charged at the same time.
>
> We can create a variant of mem_cgroup_migrate that does not double
> charge, but instead just copy the mem_cgroup information to the new
> folio, and then clear that info from the old folio. That way the memory
> usage counters are untouched.
>
> Somebody with more expertise on migration should fact check me
> of course :)
The only reason mem_cgroup_migrate() double charges right now is
because it's used by replace_page_cache_folio(). In that context, the
isolation of the old page isn't quite as thorough as with migration,
so it cannot transfer and uncharge directly. This goes back a long
time: 0a31bc97c80c3fa87b32c091d9a930ac19cd0c40
If you rename the current implementation to mem_cgroup_replace_page()
for that one caller, you can add a mem_cgroup_migrate() variant which
is charge neutral and clears old->memcg_data. This can be used for
regular and hugetlb page migration. Something like this (totally
untested):
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a4d3282493b6..17ec45bf3653 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7226,29 +7226,14 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
if (mem_cgroup_disabled())
return;
- /* Page cache replacement: new folio already charged? */
- if (folio_memcg(new))
- return;
-
memcg = folio_memcg(old);
VM_WARN_ON_ONCE_FOLIO(!memcg, old);
if (!memcg)
return;
- /* Force-charge the new page. The old one will be freed soon */
- if (!mem_cgroup_is_root(memcg)) {
- page_counter_charge(&memcg->memory, nr_pages);
- if (do_memsw_account())
- page_counter_charge(&memcg->memsw, nr_pages);
- }
-
- css_get(&memcg->css);
+ /* Transfer the charge and the css ref */
commit_charge(new, memcg);
-
- local_irq_save(flags);
- mem_cgroup_charge_statistics(memcg, nr_pages);
- memcg_check_events(memcg, folio_nid(new));
- local_irq_restore(flags);
+ old->memcg_data = 0;
}
DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
next prev parent reply other threads:[~2023-10-03 18:39 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-03 0:18 [PATCH v3 0/3] hugetlb memcg accounting Nhat Pham
2023-10-03 0:18 ` [PATCH v3 1/3] memcontrol: add helpers for " Nhat Pham
2023-10-03 11:50 ` Michal Hocko
2023-10-03 12:47 ` Johannes Weiner
2023-10-03 0:18 ` [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller Nhat Pham
2023-10-03 0:26 ` Nhat Pham
2023-10-03 12:54 ` Johannes Weiner
2023-10-03 12:58 ` Michal Hocko
2023-10-03 15:59 ` Johannes Weiner
2023-10-03 17:13 ` Mike Kravetz
2023-10-03 18:01 ` Nhat Pham
2023-10-03 18:39 ` Johannes Weiner [this message]
2023-10-03 22:09 ` Nhat Pham
2023-10-03 22:42 ` Mike Kravetz
2023-10-03 23:26 ` Nhat Pham
2023-10-03 23:14 ` [PATCH] memcontrol: only transfer the memcg data for migration Nhat Pham
2023-10-03 23:22 ` Yosry Ahmed
2023-10-03 23:31 ` Nhat Pham
2023-10-03 23:54 ` Yosry Ahmed
2023-10-04 0:02 ` Nhat Pham
2023-10-04 0:02 ` Nhat Pham
2023-10-04 14:17 ` Johannes Weiner
2023-10-04 19:45 ` [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller (fix) Nhat Pham
2023-10-06 17:25 ` Andrew Morton
2023-10-06 18:23 ` Nhat Pham
2023-10-03 0:18 ` [PATCH v3 3/3] selftests: add a selftest to verify hugetlb usage in memcg Nhat Pham
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231003183928.GC20979@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=fvdl@google.com \
--cc=kernel-team@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizefan.x@bytedance.com \
--cc=mhocko@kernel.org \
--cc=mike.kravetz@oracle.com \
--cc=muchun.song@linux.dev \
--cc=nphamcs@gmail.com \
--cc=riel@surriel.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
--cc=shuah@kernel.org \
--cc=tj@kernel.org \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.