* [PATCH 01/19] mm: fix NUMA node file count error in replace_page_cache()
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:30 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
When replacing one page with another one in the cache, we have to
decrease the file count of the old page's NUMA node and increase the
one of the new NUMA node, otherwise the old node leaks the count and
the new node eventually underflows its counter.
Fixes: 74d609585d8b ("page cache: Add and replace pages using the XArray")
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
Reviewed-by: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
---
mm/filemap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index af1c6adad5bd..2b057b0aa882 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -808,11 +808,11 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
old->mapping = NULL;
/* hugetlb pages do not participate in page cache accounting. */
if (!PageHuge(old))
- __dec_node_page_state(new, NR_FILE_PAGES);
+ __dec_node_page_state(old, NR_FILE_PAGES);
if (!PageHuge(new))
__inc_node_page_state(new, NR_FILE_PAGES);
if (PageSwapBacked(old))
- __dec_node_page_state(new, NR_SHMEM);
+ __dec_node_page_state(old, NR_SHMEM);
if (PageSwapBacked(new))
__inc_node_page_state(new, NR_SHMEM);
xas_unlock_irqrestore(&xas, flags);
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 01/19] mm: fix NUMA node file count error in replace_page_cache()
@ 2020-05-08 18:30 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
When replacing one page with another one in the cache, we have to
decrease the file count of the old page's NUMA node and increase the
one of the new NUMA node, otherwise the old node leaks the count and
the new node eventually underflows its counter.
Fixes: 74d609585d8b ("page cache: Add and replace pages using the XArray")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/filemap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index af1c6adad5bd..2b057b0aa882 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -808,11 +808,11 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
old->mapping = NULL;
/* hugetlb pages do not participate in page cache accounting. */
if (!PageHuge(old))
- __dec_node_page_state(new, NR_FILE_PAGES);
+ __dec_node_page_state(old, NR_FILE_PAGES);
if (!PageHuge(new))
__inc_node_page_state(new, NR_FILE_PAGES);
if (PageSwapBacked(old))
- __dec_node_page_state(new, NR_SHMEM);
+ __dec_node_page_state(old, NR_SHMEM);
if (PageSwapBacked(new))
__inc_node_page_state(new, NR_SHMEM);
xas_unlock_irqrestore(&xas, flags);
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH 01/19] mm: fix NUMA node file count error in replace_page_cache()
2020-05-08 18:30 ` Johannes Weiner
(?)
@ 2020-05-18 11:18 ` Balbir Singh
-1 siblings, 0 replies; 58+ messages in thread
From: Balbir Singh @ 2020-05-18 11:18 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Michal Hocko, Kirill A. Shutemov, Roman Gushchin, linux-mm,
cgroups, linux-kernel, kernel-team
On Fri, May 08, 2020 at 02:30:48PM -0400, Johannes Weiner wrote:
> When replacing one page with another one in the cache, we have to
> decrease the file count of the old page's NUMA node and increase the
> one of the new NUMA node, otherwise the old node leaks the count and
> the new node eventually underflows its counter.
>
> Fixes: 74d609585d8b ("page cache: Add and replace pages using the XArray")
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> mm/filemap.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index af1c6adad5bd..2b057b0aa882 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -808,11 +808,11 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
> old->mapping = NULL;
> /* hugetlb pages do not participate in page cache accounting. */
> if (!PageHuge(old))
> - __dec_node_page_state(new, NR_FILE_PAGES);
> + __dec_node_page_state(old, NR_FILE_PAGES);
> if (!PageHuge(new))
> __inc_node_page_state(new, NR_FILE_PAGES);
> if (PageSwapBacked(old))
> - __dec_node_page_state(new, NR_SHMEM);
> + __dec_node_page_state(old, NR_SHMEM);
> if (PageSwapBacked(new))
> __inc_node_page_state(new, NR_SHMEM);
> xas_unlock_irqrestore(&xas, flags);
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 03/19] mm: memcontrol: drop @compound parameter from memcg charging API
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:30 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
The memcg charging API carries a boolean @compound parameter that
tells whether the page we're dealing with is a hugepage.
mem_cgroup_commit_charge() has another boolean @lrucare that indicates
whether the page needs LRU locking or not while charging. The majority
of callsites know those parameters at compile time, which results in a
lot of naked "false, false" argument lists. This makes for cryptic
code and is a breeding ground for subtle mistakes.
Thankfully, the huge page state can be inferred from the page itself
and doesn't need to be passed along. This is safe because charging
completes before the page is published and somebody may split it.
Simplify the callsites by removing @compound, and let memcg infer the
state by using hpage_nr_pages() unconditionally. That function does
PageTransHuge() to identify huge pages, which also helpfully asserts
that nobody passes in tail pages by accident.
The following patches will introduce a new charging API, best not to
carry over unnecessary weight.
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
Reviewed-by: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
include/linux/memcontrol.h | 22 ++++++++--------------
kernel/events/uprobes.c | 6 +++---
mm/filemap.c | 6 +++---
mm/huge_memory.c | 8 ++++----
mm/khugepaged.c | 20 ++++++++++----------
mm/memcontrol.c | 38 +++++++++++++++-----------------------
mm/memory.c | 32 +++++++++++++++-----------------
mm/migrate.c | 6 +++---
mm/shmem.c | 22 +++++++++-------------
mm/swapfile.c | 9 ++++-----
mm/userfaultfd.c | 6 +++---
11 files changed, 77 insertions(+), 98 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b67dd43aaa4b..30292d57c8af 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -373,15 +373,12 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
}
int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp,
- bool compound);
+ gfp_t gfp_mask, struct mem_cgroup **memcgp);
int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp,
- bool compound);
+ gfp_t gfp_mask, struct mem_cgroup **memcgp);
void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
- bool lrucare, bool compound);
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
- bool compound);
+ bool lrucare);
+void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
void mem_cgroup_uncharge(struct page *page);
void mem_cgroup_uncharge_list(struct list_head *page_list);
@@ -870,8 +867,7 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask,
- struct mem_cgroup **memcgp,
- bool compound)
+ struct mem_cgroup **memcgp)
{
*memcgp = NULL;
return 0;
@@ -880,8 +876,7 @@ static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
static inline int mem_cgroup_try_charge_delay(struct page *page,
struct mm_struct *mm,
gfp_t gfp_mask,
- struct mem_cgroup **memcgp,
- bool compound)
+ struct mem_cgroup **memcgp)
{
*memcgp = NULL;
return 0;
@@ -889,13 +884,12 @@ static inline int mem_cgroup_try_charge_delay(struct page *page,
static inline void mem_cgroup_commit_charge(struct page *page,
struct mem_cgroup *memcg,
- bool lrucare, bool compound)
+ bool lrucare)
{
}
static inline void mem_cgroup_cancel_charge(struct page *page,
- struct mem_cgroup *memcg,
- bool compound)
+ struct mem_cgroup *memcg)
{
}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ece7e13f6e4a..40e7488ce467 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -169,7 +169,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (new_page) {
err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
- &memcg, false);
+ &memcg);
if (err)
return err;
}
@@ -181,7 +181,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
err = -EAGAIN;
if (!page_vma_mapped_walk(&pvmw)) {
if (new_page)
- mem_cgroup_cancel_charge(new_page, memcg, false);
+ mem_cgroup_cancel_charge(new_page, memcg);
goto unlock;
}
VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
@@ -189,7 +189,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (new_page) {
get_page(new_page);
page_add_new_anon_rmap(new_page, vma, addr, false);
- mem_cgroup_commit_charge(new_page, memcg, false, false);
+ mem_cgroup_commit_charge(new_page, memcg, false);
lru_cache_add_active_or_unevictable(new_page, vma);
} else
/* no new page, just dec_mm_counter for old_page */
diff --git a/mm/filemap.c b/mm/filemap.c
index 2b057b0aa882..ce200386736c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -842,7 +842,7 @@ static int __add_to_page_cache_locked(struct page *page,
if (!huge) {
error = mem_cgroup_try_charge(page, current->mm,
- gfp_mask, &memcg, false);
+ gfp_mask, &memcg);
if (error)
return error;
}
@@ -878,14 +878,14 @@ static int __add_to_page_cache_locked(struct page *page,
goto error;
if (!huge)
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
trace_mm_filemap_add_to_page_cache(page);
return 0;
error:
page->mapping = NULL;
/* Leave page->index set: truncation relies upon it */
if (!huge)
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return xas_error(&xas);
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d7384eb2e017..46c2bc20b7cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -594,7 +594,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
VM_BUG_ON_PAGE(!PageCompound(page), page);
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg, true)) {
+ if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg)) {
put_page(page);
count_vm_event(THP_FAULT_FALLBACK);
count_vm_event(THP_FAULT_FALLBACK_CHARGE);
@@ -630,7 +630,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
vm_fault_t ret2;
spin_unlock(vmf->ptl);
- mem_cgroup_cancel_charge(page, memcg, true);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
pte_free(vma->vm_mm, pgtable);
ret2 = handle_userfault(vmf, VM_UFFD_MISSING);
@@ -641,7 +641,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
page_add_new_anon_rmap(page, vma, haddr, true);
- mem_cgroup_commit_charge(page, memcg, false, true);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
@@ -658,7 +658,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
release:
if (pgtable)
pte_free(vma->vm_mm, pgtable);
- mem_cgroup_cancel_charge(page, memcg, true);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return ret;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a02a4c5f2fe4..b73d2af6d11a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1067,7 +1067,7 @@ static void collapse_huge_page(struct mm_struct *mm,
goto out_nolock;
}
- if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg, true))) {
+ if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
result = SCAN_CGROUP_CHARGE_FAIL;
goto out_nolock;
}
@@ -1075,7 +1075,7 @@ static void collapse_huge_page(struct mm_struct *mm,
down_read(&mm->mmap_sem);
result = hugepage_vma_revalidate(mm, address, &vma);
if (result) {
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1083,7 +1083,7 @@ static void collapse_huge_page(struct mm_struct *mm,
pmd = mm_find_pmd(mm, address);
if (!pmd) {
result = SCAN_PMD_NULL;
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1095,7 +1095,7 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
pmd, referenced)) {
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1183,7 +1183,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address, true);
- mem_cgroup_commit_charge(new_page, memcg, false, true);
+ mem_cgroup_commit_charge(new_page, memcg, false);
count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
lru_cache_add_active_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
@@ -1201,7 +1201,7 @@ static void collapse_huge_page(struct mm_struct *mm,
trace_mm_collapse_huge_page(mm, isolated, result);
return;
out:
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
goto out_up_write;
}
@@ -1628,7 +1628,7 @@ static void collapse_file(struct mm_struct *mm,
goto out;
}
- if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg, true))) {
+ if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
result = SCAN_CGROUP_CHARGE_FAIL;
goto out;
}
@@ -1641,7 +1641,7 @@ static void collapse_file(struct mm_struct *mm,
break;
xas_unlock_irq(&xas);
if (!xas_nomem(&xas, GFP_KERNEL)) {
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
result = SCAN_FAIL;
goto out;
}
@@ -1877,7 +1877,7 @@ static void collapse_file(struct mm_struct *mm,
SetPageUptodate(new_page);
page_ref_add(new_page, HPAGE_PMD_NR - 1);
- mem_cgroup_commit_charge(new_page, memcg, false, true);
+ mem_cgroup_commit_charge(new_page, memcg, false);
if (is_shmem) {
set_page_dirty(new_page);
@@ -1932,7 +1932,7 @@ static void collapse_file(struct mm_struct *mm,
VM_BUG_ON(nr_none);
xas_unlock_irq(&xas);
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
new_page->mapping = NULL;
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cdd29b59929b..13da46a5d8ae 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -834,7 +834,7 @@ static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
struct page *page,
- bool compound, int nr_pages)
+ int nr_pages)
{
/*
* Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
@@ -848,7 +848,7 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
__mod_memcg_state(memcg, NR_SHMEM, nr_pages);
}
- if (compound) {
+ if (abs(nr_pages) > 1) {
VM_BUG_ON_PAGE(!PageTransHuge(page), page);
__mod_memcg_state(memcg, MEMCG_RSS_HUGE, nr_pages);
}
@@ -5445,9 +5445,9 @@ static int mem_cgroup_move_account(struct page *page,
ret = 0;
local_irq_disable();
- mem_cgroup_charge_statistics(to, page, compound, nr_pages);
+ mem_cgroup_charge_statistics(to, page, nr_pages);
memcg_check_events(to, page);
- mem_cgroup_charge_statistics(from, page, compound, -nr_pages);
+ mem_cgroup_charge_statistics(from, page, -nr_pages);
memcg_check_events(from, page);
local_irq_enable();
out_unlock:
@@ -6435,7 +6435,6 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
* @mm: mm context of the victim
* @gfp_mask: reclaim mode
* @memcgp: charged memcg return
- * @compound: charge the page as compound or small page
*
* Try to charge @page to the memcg that @mm belongs to, reclaiming
* pages according to @gfp_mask if necessary.
@@ -6448,11 +6447,10 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
* with mem_cgroup_cancel_charge() in case page instantiation fails.
*/
int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp,
- bool compound)
+ gfp_t gfp_mask, struct mem_cgroup **memcgp)
{
+ unsigned int nr_pages = hpage_nr_pages(page);
struct mem_cgroup *memcg = NULL;
- unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
int ret = 0;
if (mem_cgroup_disabled())
@@ -6494,13 +6492,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
}
int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp,
- bool compound)
+ gfp_t gfp_mask, struct mem_cgroup **memcgp)
{
struct mem_cgroup *memcg;
int ret;
- ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp, compound);
+ ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp);
memcg = *memcgp;
mem_cgroup_throttle_swaprate(memcg, page_to_nid(page), gfp_mask);
return ret;
@@ -6511,7 +6508,6 @@ int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
* @page: page to charge
* @memcg: memcg to charge the page to
* @lrucare: page might be on LRU already
- * @compound: charge the page as compound or small page
*
* Finalize a charge transaction started by mem_cgroup_try_charge(),
* after page->mapping has been set up. This must happen atomically
@@ -6524,9 +6520,9 @@ int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
* Use mem_cgroup_cancel_charge() to cancel the transaction instead.
*/
void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
- bool lrucare, bool compound)
+ bool lrucare)
{
- unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
+ unsigned int nr_pages = hpage_nr_pages(page);
VM_BUG_ON_PAGE(!page->mapping, page);
VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
@@ -6544,7 +6540,7 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
commit_charge(page, memcg, lrucare);
local_irq_disable();
- mem_cgroup_charge_statistics(memcg, page, compound, nr_pages);
+ mem_cgroup_charge_statistics(memcg, page, nr_pages);
memcg_check_events(memcg, page);
local_irq_enable();
@@ -6563,14 +6559,12 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
* mem_cgroup_cancel_charge - cancel a page charge
* @page: page to charge
* @memcg: memcg to charge the page to
- * @compound: charge the page as compound or small page
*
* Cancel a charge transaction started by mem_cgroup_try_charge().
*/
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
- bool compound)
+void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg)
{
- unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
+ unsigned int nr_pages = hpage_nr_pages(page);
if (mem_cgroup_disabled())
return;
@@ -6785,8 +6779,7 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
commit_charge(newpage, memcg, false);
local_irq_save(flags);
- mem_cgroup_charge_statistics(memcg, newpage, PageTransHuge(newpage),
- nr_pages);
+ mem_cgroup_charge_statistics(memcg, newpage, nr_pages);
memcg_check_events(memcg, newpage);
local_irq_restore(flags);
}
@@ -7016,8 +7009,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
* only synchronisation we have for updating the per-CPU variables.
*/
VM_BUG_ON(!irqs_disabled());
- mem_cgroup_charge_statistics(memcg, page, PageTransHuge(page),
- -nr_entries);
+ mem_cgroup_charge_statistics(memcg, page, -nr_entries);
memcg_check_events(memcg, page);
if (!mem_cgroup_is_root(memcg))
diff --git a/mm/memory.c b/mm/memory.c
index 0ad29c7274de..a08cbaa81607 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2676,7 +2676,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
}
}
- if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
+ if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg))
goto oom_free_new;
__SetPageUptodate(new_page);
@@ -2711,7 +2711,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
*/
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
page_add_new_anon_rmap(new_page, vma, vmf->address, false);
- mem_cgroup_commit_charge(new_page, memcg, false, false);
+ mem_cgroup_commit_charge(new_page, memcg, false);
lru_cache_add_active_or_unevictable(new_page, vma);
/*
* We call the notify macro here because, when using secondary
@@ -2750,7 +2750,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
new_page = old_page;
page_copied = 1;
} else {
- mem_cgroup_cancel_charge(new_page, memcg, false);
+ mem_cgroup_cancel_charge(new_page, memcg);
}
if (new_page)
@@ -3193,8 +3193,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out_page;
}
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL,
- &memcg, false)) {
+ if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
ret = VM_FAULT_OOM;
goto out_page;
}
@@ -3245,11 +3244,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* ksm created a completely new copy */
if (unlikely(page != swapcache && swapcache)) {
page_add_new_anon_rmap(page, vma, vmf->address, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
- mem_cgroup_commit_charge(page, memcg, true, false);
+ mem_cgroup_commit_charge(page, memcg, true);
activate_page(page);
}
@@ -3285,7 +3284,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
out:
return ret;
out_nomap:
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
pte_unmap_unlock(vmf->pte, vmf->ptl);
out_page:
unlock_page(page);
@@ -3359,8 +3358,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
if (!page)
goto oom;
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg,
- false))
+ if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg))
goto oom_free_page;
/*
@@ -3386,14 +3384,14 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
/* Deliver the page fault to userland, check inside PT lock */
if (userfaultfd_missing(vma)) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return handle_userfault(vmf, VM_UFFD_MISSING);
}
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
setpte:
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
@@ -3404,7 +3402,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
return ret;
release:
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
goto unlock;
oom_free_page:
@@ -3655,7 +3653,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
@@ -3864,8 +3862,8 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
if (!vmf->cow_page)
return VM_FAULT_OOM;
- if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm, GFP_KERNEL,
- &vmf->memcg, false)) {
+ if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm,
+ GFP_KERNEL, &vmf->memcg)) {
put_page(vmf->cow_page);
return VM_FAULT_OOM;
}
@@ -3886,7 +3884,7 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
goto uncharge_out;
return ret;
uncharge_out:
- mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg, false);
+ mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg);
put_page(vmf->cow_page);
return ret;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index f66f93f9a5e2..50c7a08f8f31 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2786,7 +2786,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
if (unlikely(anon_vma_prepare(vma)))
goto abort;
- if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg, false))
+ if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg))
goto abort;
/*
@@ -2832,7 +2832,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
inc_mm_counter(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, addr, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
if (!is_zone_device_page(page))
lru_cache_add_active_or_unevictable(page, vma);
get_page(page);
@@ -2854,7 +2854,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
unlock_abort:
pte_unmap_unlock(ptep, ptl);
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
abort:
*src &= ~MIGRATE_PFN_MIGRATE;
}
diff --git a/mm/shmem.c b/mm/shmem.c
index bd8840082c94..d505b6cce4ab 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1664,8 +1664,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
goto failed;
}
- error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
- false);
+ error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
if (!error) {
error = shmem_add_to_page_cache(page, mapping, index,
swp_to_radix_entry(swap), gfp);
@@ -1680,14 +1679,14 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
* the rest.
*/
if (error) {
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
delete_from_swap_cache(page);
}
}
if (error)
goto failed;
- mem_cgroup_commit_charge(page, memcg, true, false);
+ mem_cgroup_commit_charge(page, memcg, true);
spin_lock_irq(&info->lock);
info->swapped--;
@@ -1859,8 +1858,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
if (sgp == SGP_WRITE)
__SetPageReferenced(page);
- error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
- PageTransHuge(page));
+ error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
if (error) {
if (PageTransHuge(page)) {
count_vm_event(THP_FILE_FALLBACK);
@@ -1871,12 +1869,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
error = shmem_add_to_page_cache(page, mapping, hindex,
NULL, gfp & GFP_RECLAIM_MASK);
if (error) {
- mem_cgroup_cancel_charge(page, memcg,
- PageTransHuge(page));
+ mem_cgroup_cancel_charge(page, memcg);
goto unacct;
}
- mem_cgroup_commit_charge(page, memcg, false,
- PageTransHuge(page));
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_anon(page);
spin_lock_irq(&info->lock);
@@ -2364,7 +2360,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
if (unlikely(offset >= max_off))
goto out_release;
- ret = mem_cgroup_try_charge_delay(page, dst_mm, gfp, &memcg, false);
+ ret = mem_cgroup_try_charge_delay(page, dst_mm, gfp, &memcg);
if (ret)
goto out_release;
@@ -2373,7 +2369,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
if (ret)
goto out_release_uncharge;
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
if (dst_vma->vm_flags & VM_WRITE)
@@ -2424,7 +2420,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
ClearPageDirty(page);
delete_from_page_cache(page);
out_release_uncharge:
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
out_release:
unlock_page(page);
put_page(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0aa9a9dd5d3d..15e5f8f290cc 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1868,15 +1868,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (unlikely(!page))
return -ENOMEM;
- if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
- &memcg, false)) {
+ if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
ret = -ENOMEM;
goto out_nolock;
}
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
ret = 0;
goto out;
}
@@ -1888,10 +1887,10 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
pte_mkold(mk_pte(page, vma->vm_page_prot)));
if (page == swapcache) {
page_add_anon_rmap(page, vma, addr, false);
- mem_cgroup_commit_charge(page, memcg, true, false);
+ mem_cgroup_commit_charge(page, memcg, true);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, addr, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
}
swap_free(entry);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 512576e171ce..bb57d0a3fca7 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -97,7 +97,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
__SetPageUptodate(page);
ret = -ENOMEM;
- if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg, false))
+ if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg))
goto out_release;
_dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
@@ -124,7 +124,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
inc_mm_counter(dst_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, dst_vma);
set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
@@ -138,7 +138,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
return ret;
out_release_uncharge_unlock:
pte_unmap_unlock(dst_pte, ptl);
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
out_release:
put_page(page);
goto out;
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 03/19] mm: memcontrol: drop @compound parameter from memcg charging API
@ 2020-05-08 18:30 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
The memcg charging API carries a boolean @compound parameter that
tells whether the page we're dealing with is a hugepage.
mem_cgroup_commit_charge() has another boolean @lrucare that indicates
whether the page needs LRU locking or not while charging. The majority
of callsites know those parameters at compile time, which results in a
lot of naked "false, false" argument lists. This makes for cryptic
code and is a breeding ground for subtle mistakes.
Thankfully, the huge page state can be inferred from the page itself
and doesn't need to be passed along. This is safe because charging
completes before the page is published and somebody may split it.
Simplify the callsites by removing @compound, and let memcg infer the
state by using hpage_nr_pages() unconditionally. That function does
PageTransHuge() to identify huge pages, which also helpfully asserts
that nobody passes in tail pages by accident.
The following patches will introduce a new charging API, best not to
carry over unnecessary weight.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
include/linux/memcontrol.h | 22 ++++++++--------------
kernel/events/uprobes.c | 6 +++---
mm/filemap.c | 6 +++---
mm/huge_memory.c | 8 ++++----
mm/khugepaged.c | 20 ++++++++++----------
mm/memcontrol.c | 38 +++++++++++++++-----------------------
mm/memory.c | 32 +++++++++++++++-----------------
mm/migrate.c | 6 +++---
mm/shmem.c | 22 +++++++++-------------
mm/swapfile.c | 9 ++++-----
mm/userfaultfd.c | 6 +++---
11 files changed, 77 insertions(+), 98 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b67dd43aaa4b..30292d57c8af 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -373,15 +373,12 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
}
int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp,
- bool compound);
+ gfp_t gfp_mask, struct mem_cgroup **memcgp);
int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp,
- bool compound);
+ gfp_t gfp_mask, struct mem_cgroup **memcgp);
void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
- bool lrucare, bool compound);
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
- bool compound);
+ bool lrucare);
+void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
void mem_cgroup_uncharge(struct page *page);
void mem_cgroup_uncharge_list(struct list_head *page_list);
@@ -870,8 +867,7 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask,
- struct mem_cgroup **memcgp,
- bool compound)
+ struct mem_cgroup **memcgp)
{
*memcgp = NULL;
return 0;
@@ -880,8 +876,7 @@ static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
static inline int mem_cgroup_try_charge_delay(struct page *page,
struct mm_struct *mm,
gfp_t gfp_mask,
- struct mem_cgroup **memcgp,
- bool compound)
+ struct mem_cgroup **memcgp)
{
*memcgp = NULL;
return 0;
@@ -889,13 +884,12 @@ static inline int mem_cgroup_try_charge_delay(struct page *page,
static inline void mem_cgroup_commit_charge(struct page *page,
struct mem_cgroup *memcg,
- bool lrucare, bool compound)
+ bool lrucare)
{
}
static inline void mem_cgroup_cancel_charge(struct page *page,
- struct mem_cgroup *memcg,
- bool compound)
+ struct mem_cgroup *memcg)
{
}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ece7e13f6e4a..40e7488ce467 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -169,7 +169,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (new_page) {
err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
- &memcg, false);
+ &memcg);
if (err)
return err;
}
@@ -181,7 +181,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
err = -EAGAIN;
if (!page_vma_mapped_walk(&pvmw)) {
if (new_page)
- mem_cgroup_cancel_charge(new_page, memcg, false);
+ mem_cgroup_cancel_charge(new_page, memcg);
goto unlock;
}
VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
@@ -189,7 +189,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (new_page) {
get_page(new_page);
page_add_new_anon_rmap(new_page, vma, addr, false);
- mem_cgroup_commit_charge(new_page, memcg, false, false);
+ mem_cgroup_commit_charge(new_page, memcg, false);
lru_cache_add_active_or_unevictable(new_page, vma);
} else
/* no new page, just dec_mm_counter for old_page */
diff --git a/mm/filemap.c b/mm/filemap.c
index 2b057b0aa882..ce200386736c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -842,7 +842,7 @@ static int __add_to_page_cache_locked(struct page *page,
if (!huge) {
error = mem_cgroup_try_charge(page, current->mm,
- gfp_mask, &memcg, false);
+ gfp_mask, &memcg);
if (error)
return error;
}
@@ -878,14 +878,14 @@ static int __add_to_page_cache_locked(struct page *page,
goto error;
if (!huge)
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
trace_mm_filemap_add_to_page_cache(page);
return 0;
error:
page->mapping = NULL;
/* Leave page->index set: truncation relies upon it */
if (!huge)
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return xas_error(&xas);
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d7384eb2e017..46c2bc20b7cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -594,7 +594,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
VM_BUG_ON_PAGE(!PageCompound(page), page);
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg, true)) {
+ if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg)) {
put_page(page);
count_vm_event(THP_FAULT_FALLBACK);
count_vm_event(THP_FAULT_FALLBACK_CHARGE);
@@ -630,7 +630,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
vm_fault_t ret2;
spin_unlock(vmf->ptl);
- mem_cgroup_cancel_charge(page, memcg, true);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
pte_free(vma->vm_mm, pgtable);
ret2 = handle_userfault(vmf, VM_UFFD_MISSING);
@@ -641,7 +641,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
page_add_new_anon_rmap(page, vma, haddr, true);
- mem_cgroup_commit_charge(page, memcg, false, true);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
@@ -658,7 +658,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
release:
if (pgtable)
pte_free(vma->vm_mm, pgtable);
- mem_cgroup_cancel_charge(page, memcg, true);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return ret;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a02a4c5f2fe4..b73d2af6d11a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1067,7 +1067,7 @@ static void collapse_huge_page(struct mm_struct *mm,
goto out_nolock;
}
- if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg, true))) {
+ if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
result = SCAN_CGROUP_CHARGE_FAIL;
goto out_nolock;
}
@@ -1075,7 +1075,7 @@ static void collapse_huge_page(struct mm_struct *mm,
down_read(&mm->mmap_sem);
result = hugepage_vma_revalidate(mm, address, &vma);
if (result) {
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1083,7 +1083,7 @@ static void collapse_huge_page(struct mm_struct *mm,
pmd = mm_find_pmd(mm, address);
if (!pmd) {
result = SCAN_PMD_NULL;
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1095,7 +1095,7 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
pmd, referenced)) {
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1183,7 +1183,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address, true);
- mem_cgroup_commit_charge(new_page, memcg, false, true);
+ mem_cgroup_commit_charge(new_page, memcg, false);
count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
lru_cache_add_active_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
@@ -1201,7 +1201,7 @@ static void collapse_huge_page(struct mm_struct *mm,
trace_mm_collapse_huge_page(mm, isolated, result);
return;
out:
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
goto out_up_write;
}
@@ -1628,7 +1628,7 @@ static void collapse_file(struct mm_struct *mm,
goto out;
}
- if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg, true))) {
+ if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
result = SCAN_CGROUP_CHARGE_FAIL;
goto out;
}
@@ -1641,7 +1641,7 @@ static void collapse_file(struct mm_struct *mm,
break;
xas_unlock_irq(&xas);
if (!xas_nomem(&xas, GFP_KERNEL)) {
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
result = SCAN_FAIL;
goto out;
}
@@ -1877,7 +1877,7 @@ static void collapse_file(struct mm_struct *mm,
SetPageUptodate(new_page);
page_ref_add(new_page, HPAGE_PMD_NR - 1);
- mem_cgroup_commit_charge(new_page, memcg, false, true);
+ mem_cgroup_commit_charge(new_page, memcg, false);
if (is_shmem) {
set_page_dirty(new_page);
@@ -1932,7 +1932,7 @@ static void collapse_file(struct mm_struct *mm,
VM_BUG_ON(nr_none);
xas_unlock_irq(&xas);
- mem_cgroup_cancel_charge(new_page, memcg, true);
+ mem_cgroup_cancel_charge(new_page, memcg);
new_page->mapping = NULL;
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cdd29b59929b..13da46a5d8ae 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -834,7 +834,7 @@ static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
struct page *page,
- bool compound, int nr_pages)
+ int nr_pages)
{
/*
* Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
@@ -848,7 +848,7 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
__mod_memcg_state(memcg, NR_SHMEM, nr_pages);
}
- if (compound) {
+ if (abs(nr_pages) > 1) {
VM_BUG_ON_PAGE(!PageTransHuge(page), page);
__mod_memcg_state(memcg, MEMCG_RSS_HUGE, nr_pages);
}
@@ -5445,9 +5445,9 @@ static int mem_cgroup_move_account(struct page *page,
ret = 0;
local_irq_disable();
- mem_cgroup_charge_statistics(to, page, compound, nr_pages);
+ mem_cgroup_charge_statistics(to, page, nr_pages);
memcg_check_events(to, page);
- mem_cgroup_charge_statistics(from, page, compound, -nr_pages);
+ mem_cgroup_charge_statistics(from, page, -nr_pages);
memcg_check_events(from, page);
local_irq_enable();
out_unlock:
@@ -6435,7 +6435,6 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
* @mm: mm context of the victim
* @gfp_mask: reclaim mode
* @memcgp: charged memcg return
- * @compound: charge the page as compound or small page
*
* Try to charge @page to the memcg that @mm belongs to, reclaiming
* pages according to @gfp_mask if necessary.
@@ -6448,11 +6447,10 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
* with mem_cgroup_cancel_charge() in case page instantiation fails.
*/
int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp,
- bool compound)
+ gfp_t gfp_mask, struct mem_cgroup **memcgp)
{
+ unsigned int nr_pages = hpage_nr_pages(page);
struct mem_cgroup *memcg = NULL;
- unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
int ret = 0;
if (mem_cgroup_disabled())
@@ -6494,13 +6492,12 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
}
int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp,
- bool compound)
+ gfp_t gfp_mask, struct mem_cgroup **memcgp)
{
struct mem_cgroup *memcg;
int ret;
- ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp, compound);
+ ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp);
memcg = *memcgp;
mem_cgroup_throttle_swaprate(memcg, page_to_nid(page), gfp_mask);
return ret;
@@ -6511,7 +6508,6 @@ int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
* @page: page to charge
* @memcg: memcg to charge the page to
* @lrucare: page might be on LRU already
- * @compound: charge the page as compound or small page
*
* Finalize a charge transaction started by mem_cgroup_try_charge(),
* after page->mapping has been set up. This must happen atomically
@@ -6524,9 +6520,9 @@ int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
* Use mem_cgroup_cancel_charge() to cancel the transaction instead.
*/
void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
- bool lrucare, bool compound)
+ bool lrucare)
{
- unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
+ unsigned int nr_pages = hpage_nr_pages(page);
VM_BUG_ON_PAGE(!page->mapping, page);
VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
@@ -6544,7 +6540,7 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
commit_charge(page, memcg, lrucare);
local_irq_disable();
- mem_cgroup_charge_statistics(memcg, page, compound, nr_pages);
+ mem_cgroup_charge_statistics(memcg, page, nr_pages);
memcg_check_events(memcg, page);
local_irq_enable();
@@ -6563,14 +6559,12 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
* mem_cgroup_cancel_charge - cancel a page charge
* @page: page to charge
* @memcg: memcg to charge the page to
- * @compound: charge the page as compound or small page
*
* Cancel a charge transaction started by mem_cgroup_try_charge().
*/
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
- bool compound)
+void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg)
{
- unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
+ unsigned int nr_pages = hpage_nr_pages(page);
if (mem_cgroup_disabled())
return;
@@ -6785,8 +6779,7 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
commit_charge(newpage, memcg, false);
local_irq_save(flags);
- mem_cgroup_charge_statistics(memcg, newpage, PageTransHuge(newpage),
- nr_pages);
+ mem_cgroup_charge_statistics(memcg, newpage, nr_pages);
memcg_check_events(memcg, newpage);
local_irq_restore(flags);
}
@@ -7016,8 +7009,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
* only synchronisation we have for updating the per-CPU variables.
*/
VM_BUG_ON(!irqs_disabled());
- mem_cgroup_charge_statistics(memcg, page, PageTransHuge(page),
- -nr_entries);
+ mem_cgroup_charge_statistics(memcg, page, -nr_entries);
memcg_check_events(memcg, page);
if (!mem_cgroup_is_root(memcg))
diff --git a/mm/memory.c b/mm/memory.c
index 0ad29c7274de..a08cbaa81607 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2676,7 +2676,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
}
}
- if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg, false))
+ if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg))
goto oom_free_new;
__SetPageUptodate(new_page);
@@ -2711,7 +2711,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
*/
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
page_add_new_anon_rmap(new_page, vma, vmf->address, false);
- mem_cgroup_commit_charge(new_page, memcg, false, false);
+ mem_cgroup_commit_charge(new_page, memcg, false);
lru_cache_add_active_or_unevictable(new_page, vma);
/*
* We call the notify macro here because, when using secondary
@@ -2750,7 +2750,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
new_page = old_page;
page_copied = 1;
} else {
- mem_cgroup_cancel_charge(new_page, memcg, false);
+ mem_cgroup_cancel_charge(new_page, memcg);
}
if (new_page)
@@ -3193,8 +3193,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out_page;
}
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL,
- &memcg, false)) {
+ if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
ret = VM_FAULT_OOM;
goto out_page;
}
@@ -3245,11 +3244,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* ksm created a completely new copy */
if (unlikely(page != swapcache && swapcache)) {
page_add_new_anon_rmap(page, vma, vmf->address, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
- mem_cgroup_commit_charge(page, memcg, true, false);
+ mem_cgroup_commit_charge(page, memcg, true);
activate_page(page);
}
@@ -3285,7 +3284,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
out:
return ret;
out_nomap:
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
pte_unmap_unlock(vmf->pte, vmf->ptl);
out_page:
unlock_page(page);
@@ -3359,8 +3358,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
if (!page)
goto oom;
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg,
- false))
+ if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg))
goto oom_free_page;
/*
@@ -3386,14 +3384,14 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
/* Deliver the page fault to userland, check inside PT lock */
if (userfaultfd_missing(vma)) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return handle_userfault(vmf, VM_UFFD_MISSING);
}
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
setpte:
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
@@ -3404,7 +3402,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
return ret;
release:
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
put_page(page);
goto unlock;
oom_free_page:
@@ -3655,7 +3653,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
@@ -3864,8 +3862,8 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
if (!vmf->cow_page)
return VM_FAULT_OOM;
- if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm, GFP_KERNEL,
- &vmf->memcg, false)) {
+ if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm,
+ GFP_KERNEL, &vmf->memcg)) {
put_page(vmf->cow_page);
return VM_FAULT_OOM;
}
@@ -3886,7 +3884,7 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
goto uncharge_out;
return ret;
uncharge_out:
- mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg, false);
+ mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg);
put_page(vmf->cow_page);
return ret;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index f66f93f9a5e2..50c7a08f8f31 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2786,7 +2786,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
if (unlikely(anon_vma_prepare(vma)))
goto abort;
- if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg, false))
+ if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg))
goto abort;
/*
@@ -2832,7 +2832,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
inc_mm_counter(mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, addr, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
if (!is_zone_device_page(page))
lru_cache_add_active_or_unevictable(page, vma);
get_page(page);
@@ -2854,7 +2854,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
unlock_abort:
pte_unmap_unlock(ptep, ptl);
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
abort:
*src &= ~MIGRATE_PFN_MIGRATE;
}
diff --git a/mm/shmem.c b/mm/shmem.c
index bd8840082c94..d505b6cce4ab 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1664,8 +1664,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
goto failed;
}
- error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
- false);
+ error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
if (!error) {
error = shmem_add_to_page_cache(page, mapping, index,
swp_to_radix_entry(swap), gfp);
@@ -1680,14 +1679,14 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
* the rest.
*/
if (error) {
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
delete_from_swap_cache(page);
}
}
if (error)
goto failed;
- mem_cgroup_commit_charge(page, memcg, true, false);
+ mem_cgroup_commit_charge(page, memcg, true);
spin_lock_irq(&info->lock);
info->swapped--;
@@ -1859,8 +1858,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
if (sgp == SGP_WRITE)
__SetPageReferenced(page);
- error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
- PageTransHuge(page));
+ error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
if (error) {
if (PageTransHuge(page)) {
count_vm_event(THP_FILE_FALLBACK);
@@ -1871,12 +1869,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
error = shmem_add_to_page_cache(page, mapping, hindex,
NULL, gfp & GFP_RECLAIM_MASK);
if (error) {
- mem_cgroup_cancel_charge(page, memcg,
- PageTransHuge(page));
+ mem_cgroup_cancel_charge(page, memcg);
goto unacct;
}
- mem_cgroup_commit_charge(page, memcg, false,
- PageTransHuge(page));
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_anon(page);
spin_lock_irq(&info->lock);
@@ -2364,7 +2360,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
if (unlikely(offset >= max_off))
goto out_release;
- ret = mem_cgroup_try_charge_delay(page, dst_mm, gfp, &memcg, false);
+ ret = mem_cgroup_try_charge_delay(page, dst_mm, gfp, &memcg);
if (ret)
goto out_release;
@@ -2373,7 +2369,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
if (ret)
goto out_release_uncharge;
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
if (dst_vma->vm_flags & VM_WRITE)
@@ -2424,7 +2420,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
ClearPageDirty(page);
delete_from_page_cache(page);
out_release_uncharge:
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
out_release:
unlock_page(page);
put_page(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0aa9a9dd5d3d..15e5f8f290cc 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1868,15 +1868,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (unlikely(!page))
return -ENOMEM;
- if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL,
- &memcg, false)) {
+ if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
ret = -ENOMEM;
goto out_nolock;
}
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
ret = 0;
goto out;
}
@@ -1888,10 +1887,10 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
pte_mkold(mk_pte(page, vma->vm_page_prot)));
if (page == swapcache) {
page_add_anon_rmap(page, vma, addr, false);
- mem_cgroup_commit_charge(page, memcg, true, false);
+ mem_cgroup_commit_charge(page, memcg, true);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, addr, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, vma);
}
swap_free(entry);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 512576e171ce..bb57d0a3fca7 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -97,7 +97,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
__SetPageUptodate(page);
ret = -ENOMEM;
- if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg, false))
+ if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg))
goto out_release;
_dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
@@ -124,7 +124,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
inc_mm_counter(dst_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
- mem_cgroup_commit_charge(page, memcg, false, false);
+ mem_cgroup_commit_charge(page, memcg, false);
lru_cache_add_active_or_unevictable(page, dst_vma);
set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
@@ -138,7 +138,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
return ret;
out_release_uncharge_unlock:
pte_unmap_unlock(dst_pte, ptl);
- mem_cgroup_cancel_charge(page, memcg, false);
+ mem_cgroup_cancel_charge(page, memcg);
out_release:
put_page(page);
goto out;
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 07/19] mm: memcontrol: prepare move_account for removal of private page type counters
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:30 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
When memcg uses the generic vmstat counters, it doesn't need to do
anything at charging and uncharging time. It does, however, need to
migrate counts when pages move to a different cgroup in move_account.
Prepare the move_account function for the arrival of NR_FILE_PAGES,
NR_ANON_MAPPED, NR_ANON_THPS etc. by having a branch for files and a
branch for anon, which can then divided into sub-branches.
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
---
mm/memcontrol.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a5efdad77be4..fe4212db8411 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5378,7 +5378,6 @@ static int mem_cgroup_move_account(struct page *page,
struct pglist_data *pgdat;
unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
int ret;
- bool anon;
VM_BUG_ON(from == to);
VM_BUG_ON_PAGE(PageLRU(page), page);
@@ -5396,25 +5395,27 @@ static int mem_cgroup_move_account(struct page *page,
if (page->mem_cgroup != from)
goto out_unlock;
- anon = PageAnon(page);
-
pgdat = page_pgdat(page);
from_vec = mem_cgroup_lruvec(from, pgdat);
to_vec = mem_cgroup_lruvec(to, pgdat);
lock_page_memcg(page);
- if (!anon && page_mapped(page)) {
- __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
- }
+ if (!PageAnon(page)) {
+ if (page_mapped(page)) {
+ __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
+ __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
+ }
- if (!anon && PageDirty(page)) {
- struct address_space *mapping = page_mapping(page);
+ if (PageDirty(page)) {
+ struct address_space *mapping = page_mapping(page);
- if (mapping_cap_account_dirty(mapping)) {
- __mod_lruvec_state(from_vec, NR_FILE_DIRTY, -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_DIRTY, nr_pages);
+ if (mapping_cap_account_dirty(mapping)) {
+ __mod_lruvec_state(from_vec, NR_FILE_DIRTY,
+ -nr_pages);
+ __mod_lruvec_state(to_vec, NR_FILE_DIRTY,
+ nr_pages);
+ }
}
}
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 07/19] mm: memcontrol: prepare move_account for removal of private page type counters
@ 2020-05-08 18:30 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
When memcg uses the generic vmstat counters, it doesn't need to do
anything at charging and uncharging time. It does, however, need to
migrate counts when pages move to a different cgroup in move_account.
Prepare the move_account function for the arrival of NR_FILE_PAGES,
NR_ANON_MAPPED, NR_ANON_THPS etc. by having a branch for files and a
branch for anon, which can then divided into sub-branches.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/memcontrol.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a5efdad77be4..fe4212db8411 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5378,7 +5378,6 @@ static int mem_cgroup_move_account(struct page *page,
struct pglist_data *pgdat;
unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
int ret;
- bool anon;
VM_BUG_ON(from == to);
VM_BUG_ON_PAGE(PageLRU(page), page);
@@ -5396,25 +5395,27 @@ static int mem_cgroup_move_account(struct page *page,
if (page->mem_cgroup != from)
goto out_unlock;
- anon = PageAnon(page);
-
pgdat = page_pgdat(page);
from_vec = mem_cgroup_lruvec(from, pgdat);
to_vec = mem_cgroup_lruvec(to, pgdat);
lock_page_memcg(page);
- if (!anon && page_mapped(page)) {
- __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
- }
+ if (!PageAnon(page)) {
+ if (page_mapped(page)) {
+ __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
+ __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
+ }
- if (!anon && PageDirty(page)) {
- struct address_space *mapping = page_mapping(page);
+ if (PageDirty(page)) {
+ struct address_space *mapping = page_mapping(page);
- if (mapping_cap_account_dirty(mapping)) {
- __mod_lruvec_state(from_vec, NR_FILE_DIRTY, -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_DIRTY, nr_pages);
+ if (mapping_cap_account_dirty(mapping)) {
+ __mod_lruvec_state(from_vec, NR_FILE_DIRTY,
+ -nr_pages);
+ __mod_lruvec_state(to_vec, NR_FILE_DIRTY,
+ nr_pages);
+ }
}
}
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 10/19] mm: memcontrol: switch to native NR_ANON_MAPPED counter
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:30 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
Memcg maintains a private MEMCG_RSS counter. This divergence from the
generic VM accounting means unnecessary code overhead, and creates a
dependency for memcg that page->mapping is set up at the time of
charging, so that page types can be told apart.
Convert the generic accounting sites to mod_lruvec_page_state and
friends to maintain the per-cgroup vmstat counter of
NR_ANON_MAPPED. We use lock_page_memcg() to stabilize page->mem_cgroup
during rmap changes, the same way we do for NR_FILE_MAPPED.
With the previous patch removing MEMCG_CACHE and the private NR_SHMEM
counter, this patch finally eliminates the need to have page->mapping
set up at charge time. However, we need to have page->mem_cgroup set
up by the time rmap runs and does the accounting, so switch the commit
and the rmap callbacks around.
v2: fix temporary accounting bug by switching rmap<->commit (Joonsoo)
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
---
include/linux/memcontrol.h | 3 +--
kernel/events/uprobes.c | 2 +-
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 2 +-
mm/memcontrol.c | 27 ++++++++--------------
mm/memory.c | 10 ++++----
mm/migrate.c | 2 +-
mm/rmap.c | 47 +++++++++++++++++++++++---------------
mm/swapfile.c | 4 ++--
mm/userfaultfd.c | 2 +-
10 files changed, 51 insertions(+), 50 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f932e7e9fad8..2df978a3a253 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -29,8 +29,7 @@ struct kmem_cache;
/* Cgroup-specific page state, on top of universal node page state */
enum memcg_stat_item {
- MEMCG_RSS = NR_VM_NODE_STAT_ITEMS,
- MEMCG_RSS_HUGE,
+ MEMCG_RSS_HUGE = NR_VM_NODE_STAT_ITEMS,
MEMCG_SWAP,
MEMCG_SOCK,
/* XXX: why are these zone and not node counters? */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 40e7488ce467..89ef81b65bcb 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -188,8 +188,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (new_page) {
get_page(new_page);
- page_add_new_anon_rmap(new_page, vma, addr, false);
mem_cgroup_commit_charge(new_page, memcg, false);
+ page_add_new_anon_rmap(new_page, vma, addr, false);
lru_cache_add_active_or_unevictable(new_page, vma);
} else
/* no new page, just dec_mm_counter for old_page */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 46c2bc20b7cb..07c012d89570 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -640,8 +640,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
- page_add_new_anon_rmap(page, vma, haddr, true);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, haddr, true);
lru_cache_add_active_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e2be7f9a92db..be67ebe8a120 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1182,8 +1182,8 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
- page_add_new_anon_rmap(new_page, vma, address, true);
mem_cgroup_commit_charge(new_page, memcg, false);
+ page_add_new_anon_rmap(new_page, vma, address, true);
count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
lru_cache_add_active_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c4c060ce1876..fccb396ed7bd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -836,13 +836,6 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
struct page *page,
int nr_pages)
{
- /*
- * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
- * counted as CACHE even if it's on ANON LRU.
- */
- if (PageAnon(page))
- __mod_memcg_state(memcg, MEMCG_RSS, nr_pages);
-
if (abs(nr_pages) > 1) {
VM_BUG_ON_PAGE(!PageTransHuge(page), page);
__mod_memcg_state(memcg, MEMCG_RSS_HUGE, nr_pages);
@@ -1384,7 +1377,7 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
*/
seq_buf_printf(&s, "anon %llu\n",
- (u64)memcg_page_state(memcg, MEMCG_RSS) *
+ (u64)memcg_page_state(memcg, NR_ANON_MAPPED) *
PAGE_SIZE);
seq_buf_printf(&s, "file %llu\n",
(u64)memcg_page_state(memcg, NR_FILE_PAGES) *
@@ -3298,7 +3291,7 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
if (mem_cgroup_is_root(memcg)) {
val = memcg_page_state(memcg, NR_FILE_PAGES) +
- memcg_page_state(memcg, MEMCG_RSS);
+ memcg_page_state(memcg, NR_ANON_MAPPED);
if (swap)
val += memcg_page_state(memcg, MEMCG_SWAP);
} else {
@@ -3769,7 +3762,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
static const unsigned int memcg1_stats[] = {
NR_FILE_PAGES,
- MEMCG_RSS,
+ NR_ANON_MAPPED,
MEMCG_RSS_HUGE,
NR_SHMEM,
NR_FILE_MAPPED,
@@ -5399,7 +5392,12 @@ static int mem_cgroup_move_account(struct page *page,
lock_page_memcg(page);
- if (!PageAnon(page)) {
+ if (PageAnon(page)) {
+ if (page_mapped(page)) {
+ __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
+ __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
+ }
+ } else {
__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
__mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
@@ -6530,7 +6528,6 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
{
unsigned int nr_pages = hpage_nr_pages(page);
- VM_BUG_ON_PAGE(!page->mapping, page);
VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
if (mem_cgroup_disabled())
@@ -6603,8 +6600,6 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
struct mem_cgroup *memcg;
int ret;
- VM_BUG_ON_PAGE(!page->mapping, page);
-
ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
if (ret)
return ret;
@@ -6616,7 +6611,6 @@ struct uncharge_gather {
struct mem_cgroup *memcg;
unsigned long nr_pages;
unsigned long pgpgout;
- unsigned long nr_anon;
unsigned long nr_kmem;
unsigned long nr_huge;
struct page *dummy_page;
@@ -6641,7 +6635,6 @@ static void uncharge_batch(const struct uncharge_gather *ug)
}
local_irq_save(flags);
- __mod_memcg_state(ug->memcg, MEMCG_RSS, -ug->nr_anon);
__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
@@ -6681,8 +6674,6 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
if (!PageKmemcg(page)) {
if (PageTransHuge(page))
ug->nr_huge += nr_pages;
- if (PageAnon(page))
- ug->nr_anon += nr_pages;
ug->pgpgout++;
} else {
ug->nr_kmem += nr_pages;
diff --git a/mm/memory.c b/mm/memory.c
index a08cbaa81607..46c3e5dc918d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2710,8 +2710,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* thread doing COW.
*/
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
- page_add_new_anon_rmap(new_page, vma, vmf->address, false);
mem_cgroup_commit_charge(new_page, memcg, false);
+ page_add_new_anon_rmap(new_page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(new_page, vma);
/*
* We call the notify macro here because, when using secondary
@@ -3243,12 +3243,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* ksm created a completely new copy */
if (unlikely(page != swapcache && swapcache)) {
- page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
- do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
mem_cgroup_commit_charge(page, memcg, true);
+ do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
activate_page(page);
}
@@ -3390,8 +3390,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
}
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
setpte:
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
@@ -3652,8 +3652,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
/* copy-on-write page */
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
diff --git a/mm/migrate.c b/mm/migrate.c
index 3af5447e7aca..e84fb5b87a85 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2838,8 +2838,8 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
goto unlock_abort;
inc_mm_counter(mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, vma, addr, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, addr, false);
if (!is_zone_device_page(page))
lru_cache_add_active_or_unevictable(page, vma);
get_page(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index 2126fd4a254b..e96f1d099c3f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1114,6 +1114,11 @@ void do_page_add_anon_rmap(struct page *page,
bool compound = flags & RMAP_COMPOUND;
bool first;
+ if (unlikely(PageKsm(page)))
+ lock_page_memcg(page);
+ else
+ VM_BUG_ON_PAGE(!PageLocked(page), page);
+
if (compound) {
atomic_t *mapcount;
VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -1134,12 +1139,13 @@ void do_page_add_anon_rmap(struct page *page,
*/
if (compound)
__inc_node_page_state(page, NR_ANON_THPS);
- __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, nr);
+ __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
}
- if (unlikely(PageKsm(page)))
- return;
- VM_BUG_ON_PAGE(!PageLocked(page), page);
+ if (unlikely(PageKsm(page))) {
+ unlock_page_memcg(page);
+ return;
+ }
/* address might be in next vma when migration races vma_adjust */
if (first)
@@ -1181,7 +1187,7 @@ void page_add_new_anon_rmap(struct page *page,
/* increment count (starts at -1) */
atomic_set(&page->_mapcount, 0);
}
- __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, nr);
+ __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
__page_set_anon_rmap(page, vma, address, 1);
}
@@ -1230,13 +1236,12 @@ static void page_remove_file_rmap(struct page *page, bool compound)
int i, nr = 1;
VM_BUG_ON_PAGE(compound && !PageHead(page), page);
- lock_page_memcg(page);
/* Hugepages are not counted in NR_FILE_MAPPED for now. */
if (unlikely(PageHuge(page))) {
/* hugetlb pages are always mapped with pmds */
atomic_dec(compound_mapcount_ptr(page));
- goto out;
+ return;
}
/* page still mapped by someone else? */
@@ -1246,14 +1251,14 @@ static void page_remove_file_rmap(struct page *page, bool compound)
nr++;
}
if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
- goto out;
+ return;
if (PageSwapBacked(page))
__dec_node_page_state(page, NR_SHMEM_PMDMAPPED);
else
__dec_node_page_state(page, NR_FILE_PMDMAPPED);
} else {
if (!atomic_add_negative(-1, &page->_mapcount))
- goto out;
+ return;
}
/*
@@ -1265,8 +1270,6 @@ static void page_remove_file_rmap(struct page *page, bool compound)
if (unlikely(PageMlocked(page)))
clear_page_mlock(page);
-out:
- unlock_page_memcg(page);
}
static void page_remove_anon_compound_rmap(struct page *page)
@@ -1310,7 +1313,7 @@ static void page_remove_anon_compound_rmap(struct page *page)
clear_page_mlock(page);
if (nr)
- __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -nr);
+ __mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
}
/**
@@ -1322,22 +1325,28 @@ static void page_remove_anon_compound_rmap(struct page *page)
*/
void page_remove_rmap(struct page *page, bool compound)
{
- if (!PageAnon(page))
- return page_remove_file_rmap(page, compound);
+ lock_page_memcg(page);
- if (compound)
- return page_remove_anon_compound_rmap(page);
+ if (!PageAnon(page)) {
+ page_remove_file_rmap(page, compound);
+ goto out;
+ }
+
+ if (compound) {
+ page_remove_anon_compound_rmap(page);
+ goto out;
+ }
/* page still mapped by someone else? */
if (!atomic_add_negative(-1, &page->_mapcount))
- return;
+ goto out;
/*
* We use the irq-unsafe __{inc|mod}_zone_page_stat because
* these counters are not modified in interrupt context, and
* pte lock(a spinlock) is held, which implies preemption disabled.
*/
- __dec_node_page_state(page, NR_ANON_MAPPED);
+ __dec_lruvec_page_state(page, NR_ANON_MAPPED);
if (unlikely(PageMlocked(page)))
clear_page_mlock(page);
@@ -1354,6 +1363,8 @@ void page_remove_rmap(struct page *page, bool compound)
* Leaving it set also helps swapoff to reinstate ptes
* faster for those pages still in swapcache.
*/
+out:
+ unlock_page_memcg(page);
}
/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ad42eac1822d..45b937b924f5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1886,11 +1886,11 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
set_pte_at(vma->vm_mm, addr, pte,
pte_mkold(mk_pte(page, vma->vm_page_prot)));
if (page == swapcache) {
- page_add_anon_rmap(page, vma, addr, false);
mem_cgroup_commit_charge(page, memcg, true);
+ page_add_anon_rmap(page, vma, addr, false);
} else { /* ksm created a completely new copy */
- page_add_new_anon_rmap(page, vma, addr, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, addr, false);
lru_cache_add_active_or_unevictable(page, vma);
}
swap_free(entry);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index bb57d0a3fca7..3dea268d2850 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -123,8 +123,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out_release_uncharge_unlock;
inc_mm_counter(dst_mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
lru_cache_add_active_or_unevictable(page, dst_vma);
set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 10/19] mm: memcontrol: switch to native NR_ANON_MAPPED counter
@ 2020-05-08 18:30 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
Memcg maintains a private MEMCG_RSS counter. This divergence from the
generic VM accounting means unnecessary code overhead, and creates a
dependency for memcg that page->mapping is set up at the time of
charging, so that page types can be told apart.
Convert the generic accounting sites to mod_lruvec_page_state and
friends to maintain the per-cgroup vmstat counter of
NR_ANON_MAPPED. We use lock_page_memcg() to stabilize page->mem_cgroup
during rmap changes, the same way we do for NR_FILE_MAPPED.
With the previous patch removing MEMCG_CACHE and the private NR_SHMEM
counter, this patch finally eliminates the need to have page->mapping
set up at charge time. However, we need to have page->mem_cgroup set
up by the time rmap runs and does the accounting, so switch the commit
and the rmap callbacks around.
v2: fix temporary accounting bug by switching rmap<->commit (Joonsoo)
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
include/linux/memcontrol.h | 3 +--
kernel/events/uprobes.c | 2 +-
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 2 +-
mm/memcontrol.c | 27 ++++++++--------------
mm/memory.c | 10 ++++----
mm/migrate.c | 2 +-
mm/rmap.c | 47 +++++++++++++++++++++++---------------
mm/swapfile.c | 4 ++--
mm/userfaultfd.c | 2 +-
10 files changed, 51 insertions(+), 50 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f932e7e9fad8..2df978a3a253 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -29,8 +29,7 @@ struct kmem_cache;
/* Cgroup-specific page state, on top of universal node page state */
enum memcg_stat_item {
- MEMCG_RSS = NR_VM_NODE_STAT_ITEMS,
- MEMCG_RSS_HUGE,
+ MEMCG_RSS_HUGE = NR_VM_NODE_STAT_ITEMS,
MEMCG_SWAP,
MEMCG_SOCK,
/* XXX: why are these zone and not node counters? */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 40e7488ce467..89ef81b65bcb 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -188,8 +188,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
if (new_page) {
get_page(new_page);
- page_add_new_anon_rmap(new_page, vma, addr, false);
mem_cgroup_commit_charge(new_page, memcg, false);
+ page_add_new_anon_rmap(new_page, vma, addr, false);
lru_cache_add_active_or_unevictable(new_page, vma);
} else
/* no new page, just dec_mm_counter for old_page */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 46c2bc20b7cb..07c012d89570 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -640,8 +640,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
- page_add_new_anon_rmap(page, vma, haddr, true);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, haddr, true);
lru_cache_add_active_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e2be7f9a92db..be67ebe8a120 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1182,8 +1182,8 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
- page_add_new_anon_rmap(new_page, vma, address, true);
mem_cgroup_commit_charge(new_page, memcg, false);
+ page_add_new_anon_rmap(new_page, vma, address, true);
count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
lru_cache_add_active_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c4c060ce1876..fccb396ed7bd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -836,13 +836,6 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
struct page *page,
int nr_pages)
{
- /*
- * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
- * counted as CACHE even if it's on ANON LRU.
- */
- if (PageAnon(page))
- __mod_memcg_state(memcg, MEMCG_RSS, nr_pages);
-
if (abs(nr_pages) > 1) {
VM_BUG_ON_PAGE(!PageTransHuge(page), page);
__mod_memcg_state(memcg, MEMCG_RSS_HUGE, nr_pages);
@@ -1384,7 +1377,7 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
*/
seq_buf_printf(&s, "anon %llu\n",
- (u64)memcg_page_state(memcg, MEMCG_RSS) *
+ (u64)memcg_page_state(memcg, NR_ANON_MAPPED) *
PAGE_SIZE);
seq_buf_printf(&s, "file %llu\n",
(u64)memcg_page_state(memcg, NR_FILE_PAGES) *
@@ -3298,7 +3291,7 @@ static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
if (mem_cgroup_is_root(memcg)) {
val = memcg_page_state(memcg, NR_FILE_PAGES) +
- memcg_page_state(memcg, MEMCG_RSS);
+ memcg_page_state(memcg, NR_ANON_MAPPED);
if (swap)
val += memcg_page_state(memcg, MEMCG_SWAP);
} else {
@@ -3769,7 +3762,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
static const unsigned int memcg1_stats[] = {
NR_FILE_PAGES,
- MEMCG_RSS,
+ NR_ANON_MAPPED,
MEMCG_RSS_HUGE,
NR_SHMEM,
NR_FILE_MAPPED,
@@ -5399,7 +5392,12 @@ static int mem_cgroup_move_account(struct page *page,
lock_page_memcg(page);
- if (!PageAnon(page)) {
+ if (PageAnon(page)) {
+ if (page_mapped(page)) {
+ __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
+ __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
+ }
+ } else {
__mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
__mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
@@ -6530,7 +6528,6 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
{
unsigned int nr_pages = hpage_nr_pages(page);
- VM_BUG_ON_PAGE(!page->mapping, page);
VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
if (mem_cgroup_disabled())
@@ -6603,8 +6600,6 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
struct mem_cgroup *memcg;
int ret;
- VM_BUG_ON_PAGE(!page->mapping, page);
-
ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
if (ret)
return ret;
@@ -6616,7 +6611,6 @@ struct uncharge_gather {
struct mem_cgroup *memcg;
unsigned long nr_pages;
unsigned long pgpgout;
- unsigned long nr_anon;
unsigned long nr_kmem;
unsigned long nr_huge;
struct page *dummy_page;
@@ -6641,7 +6635,6 @@ static void uncharge_batch(const struct uncharge_gather *ug)
}
local_irq_save(flags);
- __mod_memcg_state(ug->memcg, MEMCG_RSS, -ug->nr_anon);
__mod_memcg_state(ug->memcg, MEMCG_RSS_HUGE, -ug->nr_huge);
__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_pages);
@@ -6681,8 +6674,6 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
if (!PageKmemcg(page)) {
if (PageTransHuge(page))
ug->nr_huge += nr_pages;
- if (PageAnon(page))
- ug->nr_anon += nr_pages;
ug->pgpgout++;
} else {
ug->nr_kmem += nr_pages;
diff --git a/mm/memory.c b/mm/memory.c
index a08cbaa81607..46c3e5dc918d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2710,8 +2710,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* thread doing COW.
*/
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
- page_add_new_anon_rmap(new_page, vma, vmf->address, false);
mem_cgroup_commit_charge(new_page, memcg, false);
+ page_add_new_anon_rmap(new_page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(new_page, vma);
/*
* We call the notify macro here because, when using secondary
@@ -3243,12 +3243,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* ksm created a completely new copy */
if (unlikely(page != swapcache && swapcache)) {
- page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
- do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
mem_cgroup_commit_charge(page, memcg, true);
+ do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
activate_page(page);
}
@@ -3390,8 +3390,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
}
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
setpte:
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
@@ -3652,8 +3652,8 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
/* copy-on-write page */
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
diff --git a/mm/migrate.c b/mm/migrate.c
index 3af5447e7aca..e84fb5b87a85 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2838,8 +2838,8 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
goto unlock_abort;
inc_mm_counter(mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, vma, addr, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, addr, false);
if (!is_zone_device_page(page))
lru_cache_add_active_or_unevictable(page, vma);
get_page(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index 2126fd4a254b..e96f1d099c3f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1114,6 +1114,11 @@ void do_page_add_anon_rmap(struct page *page,
bool compound = flags & RMAP_COMPOUND;
bool first;
+ if (unlikely(PageKsm(page)))
+ lock_page_memcg(page);
+ else
+ VM_BUG_ON_PAGE(!PageLocked(page), page);
+
if (compound) {
atomic_t *mapcount;
VM_BUG_ON_PAGE(!PageLocked(page), page);
@@ -1134,12 +1139,13 @@ void do_page_add_anon_rmap(struct page *page,
*/
if (compound)
__inc_node_page_state(page, NR_ANON_THPS);
- __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, nr);
+ __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
}
- if (unlikely(PageKsm(page)))
- return;
- VM_BUG_ON_PAGE(!PageLocked(page), page);
+ if (unlikely(PageKsm(page))) {
+ unlock_page_memcg(page);
+ return;
+ }
/* address might be in next vma when migration races vma_adjust */
if (first)
@@ -1181,7 +1187,7 @@ void page_add_new_anon_rmap(struct page *page,
/* increment count (starts at -1) */
atomic_set(&page->_mapcount, 0);
}
- __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, nr);
+ __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
__page_set_anon_rmap(page, vma, address, 1);
}
@@ -1230,13 +1236,12 @@ static void page_remove_file_rmap(struct page *page, bool compound)
int i, nr = 1;
VM_BUG_ON_PAGE(compound && !PageHead(page), page);
- lock_page_memcg(page);
/* Hugepages are not counted in NR_FILE_MAPPED for now. */
if (unlikely(PageHuge(page))) {
/* hugetlb pages are always mapped with pmds */
atomic_dec(compound_mapcount_ptr(page));
- goto out;
+ return;
}
/* page still mapped by someone else? */
@@ -1246,14 +1251,14 @@ static void page_remove_file_rmap(struct page *page, bool compound)
nr++;
}
if (!atomic_add_negative(-1, compound_mapcount_ptr(page)))
- goto out;
+ return;
if (PageSwapBacked(page))
__dec_node_page_state(page, NR_SHMEM_PMDMAPPED);
else
__dec_node_page_state(page, NR_FILE_PMDMAPPED);
} else {
if (!atomic_add_negative(-1, &page->_mapcount))
- goto out;
+ return;
}
/*
@@ -1265,8 +1270,6 @@ static void page_remove_file_rmap(struct page *page, bool compound)
if (unlikely(PageMlocked(page)))
clear_page_mlock(page);
-out:
- unlock_page_memcg(page);
}
static void page_remove_anon_compound_rmap(struct page *page)
@@ -1310,7 +1313,7 @@ static void page_remove_anon_compound_rmap(struct page *page)
clear_page_mlock(page);
if (nr)
- __mod_node_page_state(page_pgdat(page), NR_ANON_MAPPED, -nr);
+ __mod_lruvec_page_state(page, NR_ANON_MAPPED, -nr);
}
/**
@@ -1322,22 +1325,28 @@ static void page_remove_anon_compound_rmap(struct page *page)
*/
void page_remove_rmap(struct page *page, bool compound)
{
- if (!PageAnon(page))
- return page_remove_file_rmap(page, compound);
+ lock_page_memcg(page);
- if (compound)
- return page_remove_anon_compound_rmap(page);
+ if (!PageAnon(page)) {
+ page_remove_file_rmap(page, compound);
+ goto out;
+ }
+
+ if (compound) {
+ page_remove_anon_compound_rmap(page);
+ goto out;
+ }
/* page still mapped by someone else? */
if (!atomic_add_negative(-1, &page->_mapcount))
- return;
+ goto out;
/*
* We use the irq-unsafe __{inc|mod}_zone_page_stat because
* these counters are not modified in interrupt context, and
* pte lock(a spinlock) is held, which implies preemption disabled.
*/
- __dec_node_page_state(page, NR_ANON_MAPPED);
+ __dec_lruvec_page_state(page, NR_ANON_MAPPED);
if (unlikely(PageMlocked(page)))
clear_page_mlock(page);
@@ -1354,6 +1363,8 @@ void page_remove_rmap(struct page *page, bool compound)
* Leaving it set also helps swapoff to reinstate ptes
* faster for those pages still in swapcache.
*/
+out:
+ unlock_page_memcg(page);
}
/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ad42eac1822d..45b937b924f5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1886,11 +1886,11 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
set_pte_at(vma->vm_mm, addr, pte,
pte_mkold(mk_pte(page, vma->vm_page_prot)));
if (page == swapcache) {
- page_add_anon_rmap(page, vma, addr, false);
mem_cgroup_commit_charge(page, memcg, true);
+ page_add_anon_rmap(page, vma, addr, false);
} else { /* ksm created a completely new copy */
- page_add_new_anon_rmap(page, vma, addr, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, vma, addr, false);
lru_cache_add_active_or_unevictable(page, vma);
}
swap_free(entry);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index bb57d0a3fca7..3dea268d2850 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -123,8 +123,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out_release_uncharge_unlock;
inc_mm_counter(dst_mm, MM_ANONPAGES);
- page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
mem_cgroup_commit_charge(page, memcg, false);
+ page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
lru_cache_add_active_or_unevictable(page, dst_vma);
set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:30 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
With the page->mapping requirement gone from memcg, we can charge anon
and file-thp pages in one single step, right after they're allocated.
This removes two out of three API calls - especially the tricky commit
step that needed to happen at just the right time between when the
page is "set up" and when it's "published" - somewhat vague and fluid
concepts that varied by page type. All we need is a freshly allocated
page and a memcg context to charge.
v2: prevent double charges on pre-allocated hugepages in khugepaged
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
---
include/linux/mm.h | 4 +---
kernel/events/uprobes.c | 11 +++--------
mm/filemap.c | 2 +-
mm/huge_memory.c | 9 +++------
mm/khugepaged.c | 35 ++++++++++-------------------------
mm/memory.c | 36 ++++++++++--------------------------
mm/migrate.c | 5 +----
mm/swapfile.c | 6 +-----
mm/userfaultfd.c | 5 +----
9 files changed, 31 insertions(+), 82 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bb8d3716bfe4..87a2c2b66d05 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -501,7 +501,6 @@ struct vm_fault {
pte_t orig_pte; /* Value of PTE at the time of fault */
struct page *cow_page; /* Page handler may use for COW fault */
- struct mem_cgroup *memcg; /* Cgroup cow_page belongs to */
struct page *page; /* ->fault handlers should return a
* page here, unless VM_FAULT_NOPAGE
* is set (which is also implied by
@@ -935,8 +934,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
return pte;
}
-vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
- struct page *page);
+vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page);
vm_fault_t finish_fault(struct vm_fault *vmf);
vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
#endif
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 89ef81b65bcb..4253c153e985 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -162,14 +162,13 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
};
int err;
struct mmu_notifier_range range;
- struct mem_cgroup *memcg;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
addr + PAGE_SIZE);
if (new_page) {
- err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
- &memcg);
+ err = mem_cgroup_charge(new_page, vma->vm_mm, GFP_KERNEL,
+ false);
if (err)
return err;
}
@@ -179,16 +178,12 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
mmu_notifier_invalidate_range_start(&range);
err = -EAGAIN;
- if (!page_vma_mapped_walk(&pvmw)) {
- if (new_page)
- mem_cgroup_cancel_charge(new_page, memcg);
+ if (!page_vma_mapped_walk(&pvmw))
goto unlock;
- }
VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
if (new_page) {
get_page(new_page);
- mem_cgroup_commit_charge(new_page, memcg, false);
page_add_new_anon_rmap(new_page, vma, addr, false);
lru_cache_add_active_or_unevictable(new_page, vma);
} else
diff --git a/mm/filemap.c b/mm/filemap.c
index d5b6e3d7d402..fa47f160e1cc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2638,7 +2638,7 @@ void filemap_map_pages(struct vm_fault *vmf,
if (vmf->pte)
vmf->pte += xas.xa_index - last_pgoff;
last_pgoff = xas.xa_index;
- if (alloc_set_pte(vmf, NULL, page))
+ if (alloc_set_pte(vmf, page))
goto unlock;
unlock_page(page);
goto next;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 74f8b4013203..d0f1e8cee93c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -587,19 +587,19 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
struct page *page, gfp_t gfp)
{
struct vm_area_struct *vma = vmf->vma;
- struct mem_cgroup *memcg;
pgtable_t pgtable;
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
vm_fault_t ret = 0;
VM_BUG_ON_PAGE(!PageCompound(page), page);
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg)) {
+ if (mem_cgroup_charge(page, vma->vm_mm, gfp, false)) {
put_page(page);
count_vm_event(THP_FAULT_FALLBACK);
count_vm_event(THP_FAULT_FALLBACK_CHARGE);
return VM_FAULT_FALLBACK;
}
+ cgroup_throttle_swaprate(page, gfp);
pgtable = pte_alloc_one(vma->vm_mm);
if (unlikely(!pgtable)) {
@@ -630,7 +630,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
vm_fault_t ret2;
spin_unlock(vmf->ptl);
- mem_cgroup_cancel_charge(page, memcg);
put_page(page);
pte_free(vma->vm_mm, pgtable);
ret2 = handle_userfault(vmf, VM_UFFD_MISSING);
@@ -640,7 +639,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, haddr, true);
lru_cache_add_active_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
@@ -649,7 +647,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
mm_inc_nr_ptes(vma->vm_mm);
spin_unlock(vmf->ptl);
count_vm_event(THP_FAULT_ALLOC);
- count_memcg_events(memcg, THP_FAULT_ALLOC, 1);
+ count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
}
return 0;
@@ -658,7 +656,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
release:
if (pgtable)
pte_free(vma->vm_mm, pgtable);
- mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return ret;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index be67ebe8a120..34731e7c9a67 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1044,7 +1044,6 @@ static void collapse_huge_page(struct mm_struct *mm,
struct page *new_page;
spinlock_t *pmd_ptl, *pte_ptl;
int isolated = 0, result = 0;
- struct mem_cgroup *memcg;
struct vm_area_struct *vma;
struct mmu_notifier_range range;
gfp_t gfp;
@@ -1067,15 +1066,15 @@ static void collapse_huge_page(struct mm_struct *mm,
goto out_nolock;
}
- if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
+ if (unlikely(mem_cgroup_charge(new_page, mm, gfp, false))) {
result = SCAN_CGROUP_CHARGE_FAIL;
goto out_nolock;
}
+ count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
down_read(&mm->mmap_sem);
result = hugepage_vma_revalidate(mm, address, &vma);
if (result) {
- mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1083,7 +1082,6 @@ static void collapse_huge_page(struct mm_struct *mm,
pmd = mm_find_pmd(mm, address);
if (!pmd) {
result = SCAN_PMD_NULL;
- mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1095,7 +1093,6 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
pmd, referenced)) {
- mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1182,9 +1179,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
- mem_cgroup_commit_charge(new_page, memcg, false);
page_add_new_anon_rmap(new_page, vma, address, true);
- count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
lru_cache_add_active_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
@@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
out_up_write:
up_write(&mm->mmap_sem);
out_nolock:
+ if (*hpage)
+ mem_cgroup_uncharge(*hpage);
trace_mm_collapse_huge_page(mm, isolated, result);
return;
out:
- mem_cgroup_cancel_charge(new_page, memcg);
goto out_up_write;
}
@@ -1609,7 +1605,6 @@ static void collapse_file(struct mm_struct *mm,
struct address_space *mapping = file->f_mapping;
gfp_t gfp;
struct page *new_page;
- struct mem_cgroup *memcg;
pgoff_t index, end = start + HPAGE_PMD_NR;
LIST_HEAD(pagelist);
XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
@@ -1628,10 +1623,11 @@ static void collapse_file(struct mm_struct *mm,
goto out;
}
- if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
+ if (unlikely(mem_cgroup_charge(new_page, mm, gfp, false))) {
result = SCAN_CGROUP_CHARGE_FAIL;
goto out;
}
+ count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
/* This will be less messy when we use multi-index entries */
do {
@@ -1641,7 +1637,6 @@ static void collapse_file(struct mm_struct *mm,
break;
xas_unlock_irq(&xas);
if (!xas_nomem(&xas, GFP_KERNEL)) {
- mem_cgroup_cancel_charge(new_page, memcg);
result = SCAN_FAIL;
goto out;
}
@@ -1834,18 +1829,9 @@ static void collapse_file(struct mm_struct *mm,
}
if (nr_none) {
- struct lruvec *lruvec;
- /*
- * XXX: We have started try_charge and pinned the
- * memcg, but the page isn't committed yet so we
- * cannot use mod_lruvec_page_state(). This hackery
- * will be cleaned up when remove the page->mapping
- * dependency from memcg and fully charge above.
- */
- lruvec = mem_cgroup_lruvec(memcg, page_pgdat(new_page));
- __mod_lruvec_state(lruvec, NR_FILE_PAGES, nr_none);
+ __mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
if (is_shmem)
- __mod_lruvec_state(lruvec, NR_SHMEM, nr_none);
+ __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
}
xa_locked:
@@ -1883,7 +1869,6 @@ static void collapse_file(struct mm_struct *mm,
SetPageUptodate(new_page);
page_ref_add(new_page, HPAGE_PMD_NR - 1);
- mem_cgroup_commit_charge(new_page, memcg, false);
if (is_shmem) {
set_page_dirty(new_page);
@@ -1891,7 +1876,6 @@ static void collapse_file(struct mm_struct *mm,
} else {
lru_cache_add_file(new_page);
}
- count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
/*
* Remove pte page tables, so we can re-fault the page as huge.
@@ -1938,13 +1922,14 @@ static void collapse_file(struct mm_struct *mm,
VM_BUG_ON(nr_none);
xas_unlock_irq(&xas);
- mem_cgroup_cancel_charge(new_page, memcg);
new_page->mapping = NULL;
}
unlock_page(new_page);
out:
VM_BUG_ON(!list_empty(&pagelist));
+ if (*hpage)
+ mem_cgroup_uncharge(*hpage);
/* TODO: tracepoints */
}
diff --git a/mm/memory.c b/mm/memory.c
index 46c3e5dc918d..832ee914cbcf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2645,7 +2645,6 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
struct page *new_page = NULL;
pte_t entry;
int page_copied = 0;
- struct mem_cgroup *memcg;
struct mmu_notifier_range range;
if (unlikely(anon_vma_prepare(vma)))
@@ -2676,8 +2675,9 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
}
}
- if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg))
+ if (mem_cgroup_charge(new_page, mm, GFP_KERNEL, false))
goto oom_free_new;
+ cgroup_throttle_swaprate(new_page, GFP_KERNEL);
__SetPageUptodate(new_page);
@@ -2710,7 +2710,6 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* thread doing COW.
*/
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
- mem_cgroup_commit_charge(new_page, memcg, false);
page_add_new_anon_rmap(new_page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(new_page, vma);
/*
@@ -2749,8 +2748,6 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
/* Free the old page.. */
new_page = old_page;
page_copied = 1;
- } else {
- mem_cgroup_cancel_charge(new_page, memcg);
}
if (new_page)
@@ -3088,7 +3085,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct page *page = NULL, *swapcache;
- struct mem_cgroup *memcg;
swp_entry_t entry;
pte_t pte;
int locked;
@@ -3193,10 +3189,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out_page;
}
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
+ if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
ret = VM_FAULT_OOM;
goto out_page;
}
+ cgroup_throttle_swaprate(page, GFP_KERNEL);
/*
* Back out if somebody else already faulted in this pte.
@@ -3243,11 +3240,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* ksm created a completely new copy */
if (unlikely(page != swapcache && swapcache)) {
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
- mem_cgroup_commit_charge(page, memcg, true);
do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
activate_page(page);
}
@@ -3284,7 +3279,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
out:
return ret;
out_nomap:
- mem_cgroup_cancel_charge(page, memcg);
pte_unmap_unlock(vmf->pte, vmf->ptl);
out_page:
unlock_page(page);
@@ -3305,7 +3299,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
- struct mem_cgroup *memcg;
struct page *page;
vm_fault_t ret = 0;
pte_t entry;
@@ -3358,8 +3351,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
if (!page)
goto oom;
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg))
+ if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
goto oom_free_page;
+ cgroup_throttle_swaprate(page, GFP_KERNEL);
/*
* The memory barrier inside __SetPageUptodate makes sure that
@@ -3384,13 +3378,11 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
/* Deliver the page fault to userland, check inside PT lock */
if (userfaultfd_missing(vma)) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return handle_userfault(vmf, VM_UFFD_MISSING);
}
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
setpte:
@@ -3402,7 +3394,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
return ret;
release:
- mem_cgroup_cancel_charge(page, memcg);
put_page(page);
goto unlock;
oom_free_page:
@@ -3607,7 +3598,6 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
* mapping. If needed, the fucntion allocates page table or use pre-allocated.
*
* @vmf: fault environment
- * @memcg: memcg to charge page (only for private mappings)
* @page: page to map
*
* Caller must take care of unlocking vmf->ptl, if vmf->pte is non-NULL on
@@ -3618,8 +3608,7 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
*
* Return: %0 on success, %VM_FAULT_ code in case of error.
*/
-vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
- struct page *page)
+vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
{
struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE;
@@ -3627,9 +3616,6 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
vm_fault_t ret;
if (pmd_none(*vmf->pmd) && PageTransCompound(page)) {
- /* THP on COW? */
- VM_BUG_ON_PAGE(memcg, page);
-
ret = do_set_pmd(vmf, page);
if (ret != VM_FAULT_FALLBACK)
return ret;
@@ -3652,7 +3638,6 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
/* copy-on-write page */
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
@@ -3702,7 +3687,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
if (!(vmf->vma->vm_flags & VM_SHARED))
ret = check_stable_address_space(vmf->vma->vm_mm);
if (!ret)
- ret = alloc_set_pte(vmf, vmf->memcg, page);
+ ret = alloc_set_pte(vmf, page);
if (vmf->pte)
pte_unmap_unlock(vmf->pte, vmf->ptl);
return ret;
@@ -3862,11 +3847,11 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
if (!vmf->cow_page)
return VM_FAULT_OOM;
- if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm,
- GFP_KERNEL, &vmf->memcg)) {
+ if (mem_cgroup_charge(vmf->cow_page, vma->vm_mm, GFP_KERNEL, false)) {
put_page(vmf->cow_page);
return VM_FAULT_OOM;
}
+ cgroup_throttle_swaprate(vmf->cow_page, GFP_KERNEL);
ret = __do_fault(vmf);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
@@ -3884,7 +3869,6 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
goto uncharge_out;
return ret;
uncharge_out:
- mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg);
put_page(vmf->cow_page);
return ret;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index e84fb5b87a85..2028f08e3e8d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2746,7 +2746,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
{
struct vm_area_struct *vma = migrate->vma;
struct mm_struct *mm = vma->vm_mm;
- struct mem_cgroup *memcg;
bool flush = false;
spinlock_t *ptl;
pte_t entry;
@@ -2793,7 +2792,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
if (unlikely(anon_vma_prepare(vma)))
goto abort;
- if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg))
+ if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
goto abort;
/*
@@ -2838,7 +2837,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
goto unlock_abort;
inc_mm_counter(mm, MM_ANONPAGES);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, addr, false);
if (!is_zone_device_page(page))
lru_cache_add_active_or_unevictable(page, vma);
@@ -2861,7 +2859,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
unlock_abort:
pte_unmap_unlock(ptep, ptl);
- mem_cgroup_cancel_charge(page, memcg);
abort:
*src &= ~MIGRATE_PFN_MIGRATE;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 45b937b924f5..8c9b6767013b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1858,7 +1858,6 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, swp_entry_t entry, struct page *page)
{
struct page *swapcache;
- struct mem_cgroup *memcg;
spinlock_t *ptl;
pte_t *pte;
int ret = 1;
@@ -1868,14 +1867,13 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (unlikely(!page))
return -ENOMEM;
- if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
+ if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
ret = -ENOMEM;
goto out_nolock;
}
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
- mem_cgroup_cancel_charge(page, memcg);
ret = 0;
goto out;
}
@@ -1886,10 +1884,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
set_pte_at(vma->vm_mm, addr, pte,
pte_mkold(mk_pte(page, vma->vm_page_prot)));
if (page == swapcache) {
- mem_cgroup_commit_charge(page, memcg, true);
page_add_anon_rmap(page, vma, addr, false);
} else { /* ksm created a completely new copy */
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, addr, false);
lru_cache_add_active_or_unevictable(page, vma);
}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3dea268d2850..2745489415cc 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -56,7 +56,6 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
struct page **pagep,
bool wp_copy)
{
- struct mem_cgroup *memcg;
pte_t _dst_pte, *dst_pte;
spinlock_t *ptl;
void *page_kaddr;
@@ -97,7 +96,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
__SetPageUptodate(page);
ret = -ENOMEM;
- if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg))
+ if (mem_cgroup_charge(page, dst_mm, GFP_KERNEL, false))
goto out_release;
_dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
@@ -123,7 +122,6 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out_release_uncharge_unlock;
inc_mm_counter(dst_mm, MM_ANONPAGES);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
lru_cache_add_active_or_unevictable(page, dst_vma);
@@ -138,7 +136,6 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
return ret;
out_release_uncharge_unlock:
pte_unmap_unlock(dst_pte, ptl);
- mem_cgroup_cancel_charge(page, memcg);
out_release:
put_page(page);
goto out;
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
@ 2020-05-08 18:30 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:30 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
With the page->mapping requirement gone from memcg, we can charge anon
and file-thp pages in one single step, right after they're allocated.
This removes two out of three API calls - especially the tricky commit
step that needed to happen at just the right time between when the
page is "set up" and when it's "published" - somewhat vague and fluid
concepts that varied by page type. All we need is a freshly allocated
page and a memcg context to charge.
v2: prevent double charges on pre-allocated hugepages in khugepaged
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/mm.h | 4 +---
kernel/events/uprobes.c | 11 +++--------
mm/filemap.c | 2 +-
mm/huge_memory.c | 9 +++------
mm/khugepaged.c | 35 ++++++++++-------------------------
mm/memory.c | 36 ++++++++++--------------------------
mm/migrate.c | 5 +----
mm/swapfile.c | 6 +-----
mm/userfaultfd.c | 5 +----
9 files changed, 31 insertions(+), 82 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bb8d3716bfe4..87a2c2b66d05 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -501,7 +501,6 @@ struct vm_fault {
pte_t orig_pte; /* Value of PTE at the time of fault */
struct page *cow_page; /* Page handler may use for COW fault */
- struct mem_cgroup *memcg; /* Cgroup cow_page belongs to */
struct page *page; /* ->fault handlers should return a
* page here, unless VM_FAULT_NOPAGE
* is set (which is also implied by
@@ -935,8 +934,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
return pte;
}
-vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
- struct page *page);
+vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page);
vm_fault_t finish_fault(struct vm_fault *vmf);
vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
#endif
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 89ef81b65bcb..4253c153e985 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -162,14 +162,13 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
};
int err;
struct mmu_notifier_range range;
- struct mem_cgroup *memcg;
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
addr + PAGE_SIZE);
if (new_page) {
- err = mem_cgroup_try_charge(new_page, vma->vm_mm, GFP_KERNEL,
- &memcg);
+ err = mem_cgroup_charge(new_page, vma->vm_mm, GFP_KERNEL,
+ false);
if (err)
return err;
}
@@ -179,16 +178,12 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
mmu_notifier_invalidate_range_start(&range);
err = -EAGAIN;
- if (!page_vma_mapped_walk(&pvmw)) {
- if (new_page)
- mem_cgroup_cancel_charge(new_page, memcg);
+ if (!page_vma_mapped_walk(&pvmw))
goto unlock;
- }
VM_BUG_ON_PAGE(addr != pvmw.address, old_page);
if (new_page) {
get_page(new_page);
- mem_cgroup_commit_charge(new_page, memcg, false);
page_add_new_anon_rmap(new_page, vma, addr, false);
lru_cache_add_active_or_unevictable(new_page, vma);
} else
diff --git a/mm/filemap.c b/mm/filemap.c
index d5b6e3d7d402..fa47f160e1cc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2638,7 +2638,7 @@ void filemap_map_pages(struct vm_fault *vmf,
if (vmf->pte)
vmf->pte += xas.xa_index - last_pgoff;
last_pgoff = xas.xa_index;
- if (alloc_set_pte(vmf, NULL, page))
+ if (alloc_set_pte(vmf, page))
goto unlock;
unlock_page(page);
goto next;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 74f8b4013203..d0f1e8cee93c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -587,19 +587,19 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
struct page *page, gfp_t gfp)
{
struct vm_area_struct *vma = vmf->vma;
- struct mem_cgroup *memcg;
pgtable_t pgtable;
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
vm_fault_t ret = 0;
VM_BUG_ON_PAGE(!PageCompound(page), page);
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, gfp, &memcg)) {
+ if (mem_cgroup_charge(page, vma->vm_mm, gfp, false)) {
put_page(page);
count_vm_event(THP_FAULT_FALLBACK);
count_vm_event(THP_FAULT_FALLBACK_CHARGE);
return VM_FAULT_FALLBACK;
}
+ cgroup_throttle_swaprate(page, gfp);
pgtable = pte_alloc_one(vma->vm_mm);
if (unlikely(!pgtable)) {
@@ -630,7 +630,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
vm_fault_t ret2;
spin_unlock(vmf->ptl);
- mem_cgroup_cancel_charge(page, memcg);
put_page(page);
pte_free(vma->vm_mm, pgtable);
ret2 = handle_userfault(vmf, VM_UFFD_MISSING);
@@ -640,7 +639,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, haddr, true);
lru_cache_add_active_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
@@ -649,7 +647,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
mm_inc_nr_ptes(vma->vm_mm);
spin_unlock(vmf->ptl);
count_vm_event(THP_FAULT_ALLOC);
- count_memcg_events(memcg, THP_FAULT_ALLOC, 1);
+ count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
}
return 0;
@@ -658,7 +656,6 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
release:
if (pgtable)
pte_free(vma->vm_mm, pgtable);
- mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return ret;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index be67ebe8a120..34731e7c9a67 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1044,7 +1044,6 @@ static void collapse_huge_page(struct mm_struct *mm,
struct page *new_page;
spinlock_t *pmd_ptl, *pte_ptl;
int isolated = 0, result = 0;
- struct mem_cgroup *memcg;
struct vm_area_struct *vma;
struct mmu_notifier_range range;
gfp_t gfp;
@@ -1067,15 +1066,15 @@ static void collapse_huge_page(struct mm_struct *mm,
goto out_nolock;
}
- if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
+ if (unlikely(mem_cgroup_charge(new_page, mm, gfp, false))) {
result = SCAN_CGROUP_CHARGE_FAIL;
goto out_nolock;
}
+ count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
down_read(&mm->mmap_sem);
result = hugepage_vma_revalidate(mm, address, &vma);
if (result) {
- mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1083,7 +1082,6 @@ static void collapse_huge_page(struct mm_struct *mm,
pmd = mm_find_pmd(mm, address);
if (!pmd) {
result = SCAN_PMD_NULL;
- mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1095,7 +1093,6 @@ static void collapse_huge_page(struct mm_struct *mm,
*/
if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
pmd, referenced)) {
- mem_cgroup_cancel_charge(new_page, memcg);
up_read(&mm->mmap_sem);
goto out_nolock;
}
@@ -1182,9 +1179,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
- mem_cgroup_commit_charge(new_page, memcg, false);
page_add_new_anon_rmap(new_page, vma, address, true);
- count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
lru_cache_add_active_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
@@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
out_up_write:
up_write(&mm->mmap_sem);
out_nolock:
+ if (*hpage)
+ mem_cgroup_uncharge(*hpage);
trace_mm_collapse_huge_page(mm, isolated, result);
return;
out:
- mem_cgroup_cancel_charge(new_page, memcg);
goto out_up_write;
}
@@ -1609,7 +1605,6 @@ static void collapse_file(struct mm_struct *mm,
struct address_space *mapping = file->f_mapping;
gfp_t gfp;
struct page *new_page;
- struct mem_cgroup *memcg;
pgoff_t index, end = start + HPAGE_PMD_NR;
LIST_HEAD(pagelist);
XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
@@ -1628,10 +1623,11 @@ static void collapse_file(struct mm_struct *mm,
goto out;
}
- if (unlikely(mem_cgroup_try_charge(new_page, mm, gfp, &memcg))) {
+ if (unlikely(mem_cgroup_charge(new_page, mm, gfp, false))) {
result = SCAN_CGROUP_CHARGE_FAIL;
goto out;
}
+ count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
/* This will be less messy when we use multi-index entries */
do {
@@ -1641,7 +1637,6 @@ static void collapse_file(struct mm_struct *mm,
break;
xas_unlock_irq(&xas);
if (!xas_nomem(&xas, GFP_KERNEL)) {
- mem_cgroup_cancel_charge(new_page, memcg);
result = SCAN_FAIL;
goto out;
}
@@ -1834,18 +1829,9 @@ static void collapse_file(struct mm_struct *mm,
}
if (nr_none) {
- struct lruvec *lruvec;
- /*
- * XXX: We have started try_charge and pinned the
- * memcg, but the page isn't committed yet so we
- * cannot use mod_lruvec_page_state(). This hackery
- * will be cleaned up when remove the page->mapping
- * dependency from memcg and fully charge above.
- */
- lruvec = mem_cgroup_lruvec(memcg, page_pgdat(new_page));
- __mod_lruvec_state(lruvec, NR_FILE_PAGES, nr_none);
+ __mod_lruvec_page_state(new_page, NR_FILE_PAGES, nr_none);
if (is_shmem)
- __mod_lruvec_state(lruvec, NR_SHMEM, nr_none);
+ __mod_lruvec_page_state(new_page, NR_SHMEM, nr_none);
}
xa_locked:
@@ -1883,7 +1869,6 @@ static void collapse_file(struct mm_struct *mm,
SetPageUptodate(new_page);
page_ref_add(new_page, HPAGE_PMD_NR - 1);
- mem_cgroup_commit_charge(new_page, memcg, false);
if (is_shmem) {
set_page_dirty(new_page);
@@ -1891,7 +1876,6 @@ static void collapse_file(struct mm_struct *mm,
} else {
lru_cache_add_file(new_page);
}
- count_memcg_events(memcg, THP_COLLAPSE_ALLOC, 1);
/*
* Remove pte page tables, so we can re-fault the page as huge.
@@ -1938,13 +1922,14 @@ static void collapse_file(struct mm_struct *mm,
VM_BUG_ON(nr_none);
xas_unlock_irq(&xas);
- mem_cgroup_cancel_charge(new_page, memcg);
new_page->mapping = NULL;
}
unlock_page(new_page);
out:
VM_BUG_ON(!list_empty(&pagelist));
+ if (*hpage)
+ mem_cgroup_uncharge(*hpage);
/* TODO: tracepoints */
}
diff --git a/mm/memory.c b/mm/memory.c
index 46c3e5dc918d..832ee914cbcf 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2645,7 +2645,6 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
struct page *new_page = NULL;
pte_t entry;
int page_copied = 0;
- struct mem_cgroup *memcg;
struct mmu_notifier_range range;
if (unlikely(anon_vma_prepare(vma)))
@@ -2676,8 +2675,9 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
}
}
- if (mem_cgroup_try_charge_delay(new_page, mm, GFP_KERNEL, &memcg))
+ if (mem_cgroup_charge(new_page, mm, GFP_KERNEL, false))
goto oom_free_new;
+ cgroup_throttle_swaprate(new_page, GFP_KERNEL);
__SetPageUptodate(new_page);
@@ -2710,7 +2710,6 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* thread doing COW.
*/
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
- mem_cgroup_commit_charge(new_page, memcg, false);
page_add_new_anon_rmap(new_page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(new_page, vma);
/*
@@ -2749,8 +2748,6 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
/* Free the old page.. */
new_page = old_page;
page_copied = 1;
- } else {
- mem_cgroup_cancel_charge(new_page, memcg);
}
if (new_page)
@@ -3088,7 +3085,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct page *page = NULL, *swapcache;
- struct mem_cgroup *memcg;
swp_entry_t entry;
pte_t pte;
int locked;
@@ -3193,10 +3189,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out_page;
}
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
+ if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
ret = VM_FAULT_OOM;
goto out_page;
}
+ cgroup_throttle_swaprate(page, GFP_KERNEL);
/*
* Back out if somebody else already faulted in this pte.
@@ -3243,11 +3240,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
/* ksm created a completely new copy */
if (unlikely(page != swapcache && swapcache)) {
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
- mem_cgroup_commit_charge(page, memcg, true);
do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
activate_page(page);
}
@@ -3284,7 +3279,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
out:
return ret;
out_nomap:
- mem_cgroup_cancel_charge(page, memcg);
pte_unmap_unlock(vmf->pte, vmf->ptl);
out_page:
unlock_page(page);
@@ -3305,7 +3299,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
- struct mem_cgroup *memcg;
struct page *page;
vm_fault_t ret = 0;
pte_t entry;
@@ -3358,8 +3351,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
if (!page)
goto oom;
- if (mem_cgroup_try_charge_delay(page, vma->vm_mm, GFP_KERNEL, &memcg))
+ if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
goto oom_free_page;
+ cgroup_throttle_swaprate(page, GFP_KERNEL);
/*
* The memory barrier inside __SetPageUptodate makes sure that
@@ -3384,13 +3378,11 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
/* Deliver the page fault to userland, check inside PT lock */
if (userfaultfd_missing(vma)) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
- mem_cgroup_cancel_charge(page, memcg);
put_page(page);
return handle_userfault(vmf, VM_UFFD_MISSING);
}
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
setpte:
@@ -3402,7 +3394,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
return ret;
release:
- mem_cgroup_cancel_charge(page, memcg);
put_page(page);
goto unlock;
oom_free_page:
@@ -3607,7 +3598,6 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
* mapping. If needed, the fucntion allocates page table or use pre-allocated.
*
* @vmf: fault environment
- * @memcg: memcg to charge page (only for private mappings)
* @page: page to map
*
* Caller must take care of unlocking vmf->ptl, if vmf->pte is non-NULL on
@@ -3618,8 +3608,7 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
*
* Return: %0 on success, %VM_FAULT_ code in case of error.
*/
-vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
- struct page *page)
+vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct page *page)
{
struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE;
@@ -3627,9 +3616,6 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
vm_fault_t ret;
if (pmd_none(*vmf->pmd) && PageTransCompound(page)) {
- /* THP on COW? */
- VM_BUG_ON_PAGE(memcg, page);
-
ret = do_set_pmd(vmf, page);
if (ret != VM_FAULT_FALLBACK)
return ret;
@@ -3652,7 +3638,6 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
/* copy-on-write page */
if (write && !(vma->vm_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, vmf->address, false);
lru_cache_add_active_or_unevictable(page, vma);
} else {
@@ -3702,7 +3687,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
if (!(vmf->vma->vm_flags & VM_SHARED))
ret = check_stable_address_space(vmf->vma->vm_mm);
if (!ret)
- ret = alloc_set_pte(vmf, vmf->memcg, page);
+ ret = alloc_set_pte(vmf, page);
if (vmf->pte)
pte_unmap_unlock(vmf->pte, vmf->ptl);
return ret;
@@ -3862,11 +3847,11 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
if (!vmf->cow_page)
return VM_FAULT_OOM;
- if (mem_cgroup_try_charge_delay(vmf->cow_page, vma->vm_mm,
- GFP_KERNEL, &vmf->memcg)) {
+ if (mem_cgroup_charge(vmf->cow_page, vma->vm_mm, GFP_KERNEL, false)) {
put_page(vmf->cow_page);
return VM_FAULT_OOM;
}
+ cgroup_throttle_swaprate(vmf->cow_page, GFP_KERNEL);
ret = __do_fault(vmf);
if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
@@ -3884,7 +3869,6 @@ static vm_fault_t do_cow_fault(struct vm_fault *vmf)
goto uncharge_out;
return ret;
uncharge_out:
- mem_cgroup_cancel_charge(vmf->cow_page, vmf->memcg);
put_page(vmf->cow_page);
return ret;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index e84fb5b87a85..2028f08e3e8d 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2746,7 +2746,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
{
struct vm_area_struct *vma = migrate->vma;
struct mm_struct *mm = vma->vm_mm;
- struct mem_cgroup *memcg;
bool flush = false;
spinlock_t *ptl;
pte_t entry;
@@ -2793,7 +2792,7 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
if (unlikely(anon_vma_prepare(vma)))
goto abort;
- if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg))
+ if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, false))
goto abort;
/*
@@ -2838,7 +2837,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
goto unlock_abort;
inc_mm_counter(mm, MM_ANONPAGES);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, addr, false);
if (!is_zone_device_page(page))
lru_cache_add_active_or_unevictable(page, vma);
@@ -2861,7 +2859,6 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
unlock_abort:
pte_unmap_unlock(ptep, ptl);
- mem_cgroup_cancel_charge(page, memcg);
abort:
*src &= ~MIGRATE_PFN_MIGRATE;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 45b937b924f5..8c9b6767013b 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1858,7 +1858,6 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, swp_entry_t entry, struct page *page)
{
struct page *swapcache;
- struct mem_cgroup *memcg;
spinlock_t *ptl;
pte_t *pte;
int ret = 1;
@@ -1868,14 +1867,13 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (unlikely(!page))
return -ENOMEM;
- if (mem_cgroup_try_charge(page, vma->vm_mm, GFP_KERNEL, &memcg)) {
+ if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
ret = -ENOMEM;
goto out_nolock;
}
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
- mem_cgroup_cancel_charge(page, memcg);
ret = 0;
goto out;
}
@@ -1886,10 +1884,8 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
set_pte_at(vma->vm_mm, addr, pte,
pte_mkold(mk_pte(page, vma->vm_page_prot)));
if (page == swapcache) {
- mem_cgroup_commit_charge(page, memcg, true);
page_add_anon_rmap(page, vma, addr, false);
} else { /* ksm created a completely new copy */
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, vma, addr, false);
lru_cache_add_active_or_unevictable(page, vma);
}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3dea268d2850..2745489415cc 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -56,7 +56,6 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
struct page **pagep,
bool wp_copy)
{
- struct mem_cgroup *memcg;
pte_t _dst_pte, *dst_pte;
spinlock_t *ptl;
void *page_kaddr;
@@ -97,7 +96,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
__SetPageUptodate(page);
ret = -ENOMEM;
- if (mem_cgroup_try_charge(page, dst_mm, GFP_KERNEL, &memcg))
+ if (mem_cgroup_charge(page, dst_mm, GFP_KERNEL, false))
goto out_release;
_dst_pte = pte_mkdirty(mk_pte(page, dst_vma->vm_page_prot));
@@ -123,7 +122,6 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out_release_uncharge_unlock;
inc_mm_counter(dst_mm, MM_ANONPAGES);
- mem_cgroup_commit_charge(page, memcg, false);
page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
lru_cache_add_active_or_unevictable(page, dst_vma);
@@ -138,7 +136,6 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
return ret;
out_release_uncharge_unlock:
pte_unmap_unlock(dst_pte, ptl);
- mem_cgroup_cancel_charge(page, memcg);
out_release:
put_page(page);
goto out;
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread[parent not found: <20200508183105.225460-13-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
2020-05-08 18:30 ` Johannes Weiner
@ 2020-05-12 14:38 ` Qian Cai
-1 siblings, 0 replies; 58+ messages in thread
From: Qian Cai @ 2020-05-12 14:38 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Michal Hocko, Kirill A. Shutemov, Roman Gushchin, Linux-MM,
cgroups-u79uwXL29TY76Z2rM5mHXA, LKML, kernel-team-b10kYP2dOMg
> On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
>
> With the page->mapping requirement gone from memcg, we can charge anon
> and file-thp pages in one single step, right after they're allocated.
>
> This removes two out of three API calls - especially the tricky commit
> step that needed to happen at just the right time between when the
> page is "set up" and when it's "published" - somewhat vague and fluid
> concepts that varied by page type. All we need is a freshly allocated
> page and a memcg context to charge.
>
> v2: prevent double charges on pre-allocated hugepages in khugepaged
>
> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
> ---
> include/linux/mm.h | 4 +---
> kernel/events/uprobes.c | 11 +++--------
> mm/filemap.c | 2 +-
> mm/huge_memory.c | 9 +++------
> mm/khugepaged.c | 35 ++++++++++-------------------------
> mm/memory.c | 36 ++++++++++--------------------------
> mm/migrate.c | 5 +----
> mm/swapfile.c | 6 +-----
> mm/userfaultfd.c | 5 +----
> 9 files changed, 31 insertions(+), 82 deletions(-)
[]
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>
> @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
> out_up_write:
> up_write(&mm->mmap_sem);
> out_nolock:
> + if (*hpage)
> + mem_cgroup_uncharge(*hpage);
> trace_mm_collapse_huge_page(mm, isolated, result);
> return;
> out:
> - mem_cgroup_cancel_charge(new_page, memcg);
> goto out_up_write;
> }
[]
Some memory pressure will crash this new code. It looks like somewhat racy.
if (!page->mem_cgroup)
where page == NULL in mem_cgroup_uncharge().
[ 2244.414421][ T726] BUG: Kernel NULL pointer dereference on read at 0x0000002c
[ 2244.414454][ T726] Faulting instruction address: 0xc0000000004f7e44
[ 2244.414467][ T726] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2244.414488][ T726] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
[ 2244.414501][ T726] Modules linked in: brd ext4 crc16 mbcache jbd2 loop kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci tg3 libahci libphy mdio libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
[ 2244.414556][ T726] CPU: 11 PID: 726 Comm: khugepaged Not tainted 5.7.0-rc5-next-20200512+ #8
[ 2244.414579][ T726] NIP: c0000000004f7e44 LR: c0000000004df95c CTR: c0000000001c1400
[ 2244.414600][ T726] REGS: c000001a2398f6e0 TRAP: 0300 Not tainted (5.7.0-rc5-next-20200512+)
[ 2244.414630][ T726] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24000244 XER: 20040000
[ 2244.414656][ T726] CFAR: c0000000004df958 DAR: 000000000000002c DSISR: 40000000 IRQMASK: 0
[ 2244.414656][ T726] GPR00: c0000000004df95c c000001a2398f970 c00000000168a700 fffffffffffffff4
[ 2244.414656][ T726] GPR04: ffffffffffffffff c000000000bd0980 0000000000000005 0000000000000080
[ 2244.414656][ T726] GPR08: 0000001ffc030000 0000000000000001 0000000000000000 c00000000152bb58
[ 2244.414656][ T726] GPR12: 0000000024000222 c000001fffff5680 c0000001d818ce00 c0000001d818cd00
[ 2244.414656][ T726] GPR16: 0000000000000000 c000001a2398fce0 fe7fffffffffefff fffffffffffffe7f
[ 2244.414656][ T726] GPR20: c000201320aa53c8 000000000000001e 0000000000000017 c00020047636b868
[ 2244.414656][ T726] GPR24: 0000000000000000 0000000000000000 c000000001756080 c000001a2398fce0
[ 2244.414656][ T726] GPR28: c000001a2398fa20 00007ffeeda00000 c000200f28547928 c000200f28547880
[ 2244.414865][ T726] NIP [c0000000004f7e44] mem_cgroup_uncharge+0x34/0xb0
mem_cgroup_uncharge at mm/memcontrol.c:6563
[ 2244.414895][ T726] LR [c0000000004df95c] collapse_huge_page+0x24c/0x1000
collapse_huge_page at mm/khugepaged.c:1197
[ 2244.414924][ T726] Call Trace:
[ 2244.414940][ T726] [c000001a2398f970] [0000000000000001] 0x1 (unreliable)
[ 2244.414970][ T726] [c000001a2398f9c0] [c0000000004df814] collapse_huge_page+0x104/0x1000
collapse_huge_page at mm/khugepaged.c:1064 (discriminator 10)
[ 2244.414991][ T726] [c000001a2398faf0] [c0000000004e0f84] khugepaged_scan_pmd+0x874/0xc70
[ 2244.415021][ T726] [c000001a2398fbf0] [c0000000004e2a90] khugepaged+0x900/0x1920
[ 2244.415043][ T726] [c000001a2398fdb0] [c000000000155aa4] kthread+0x1c4/0x1d0
[ 2244.415075][ T726] [c000001a2398fe20] [c00000000000cb28] ret_from_kernel_thread+0x5c/0x74
[ 2244.415095][ T726] Instruction dump:
[ 2244.415113][ T726] 384228f0 7c0802a6 60000000 f821ffb1 e92d0c70 f9210048 39200000 3d22ffec
[ 2244.415146][ T726] 3929f9f4 81290000 2f890000 409d0048 <e9230038> 2fa90000 419e003c 7c0802a6
[ 2244.415181][ T726] ---[ end trace 3488eb8818913a26 ]---
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
@ 2020-05-12 14:38 ` Qian Cai
0 siblings, 0 replies; 58+ messages in thread
From: Qian Cai @ 2020-05-12 14:38 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Michal Hocko, Kirill A. Shutemov, Roman Gushchin, Linux-MM,
cgroups, LKML, kernel-team
> On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> With the page->mapping requirement gone from memcg, we can charge anon
> and file-thp pages in one single step, right after they're allocated.
>
> This removes two out of three API calls - especially the tricky commit
> step that needed to happen at just the right time between when the
> page is "set up" and when it's "published" - somewhat vague and fluid
> concepts that varied by page type. All we need is a freshly allocated
> page and a memcg context to charge.
>
> v2: prevent double charges on pre-allocated hugepages in khugepaged
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> ---
> include/linux/mm.h | 4 +---
> kernel/events/uprobes.c | 11 +++--------
> mm/filemap.c | 2 +-
> mm/huge_memory.c | 9 +++------
> mm/khugepaged.c | 35 ++++++++++-------------------------
> mm/memory.c | 36 ++++++++++--------------------------
> mm/migrate.c | 5 +----
> mm/swapfile.c | 6 +-----
> mm/userfaultfd.c | 5 +----
> 9 files changed, 31 insertions(+), 82 deletions(-)
[]
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>
> @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
> out_up_write:
> up_write(&mm->mmap_sem);
> out_nolock:
> + if (*hpage)
> + mem_cgroup_uncharge(*hpage);
> trace_mm_collapse_huge_page(mm, isolated, result);
> return;
> out:
> - mem_cgroup_cancel_charge(new_page, memcg);
> goto out_up_write;
> }
[]
Some memory pressure will crash this new code. It looks like somewhat racy.
if (!page->mem_cgroup)
where page == NULL in mem_cgroup_uncharge().
[ 2244.414421][ T726] BUG: Kernel NULL pointer dereference on read at 0x0000002c
[ 2244.414454][ T726] Faulting instruction address: 0xc0000000004f7e44
[ 2244.414467][ T726] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2244.414488][ T726] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
[ 2244.414501][ T726] Modules linked in: brd ext4 crc16 mbcache jbd2 loop kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci tg3 libahci libphy mdio libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
[ 2244.414556][ T726] CPU: 11 PID: 726 Comm: khugepaged Not tainted 5.7.0-rc5-next-20200512+ #8
[ 2244.414579][ T726] NIP: c0000000004f7e44 LR: c0000000004df95c CTR: c0000000001c1400
[ 2244.414600][ T726] REGS: c000001a2398f6e0 TRAP: 0300 Not tainted (5.7.0-rc5-next-20200512+)
[ 2244.414630][ T726] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24000244 XER: 20040000
[ 2244.414656][ T726] CFAR: c0000000004df958 DAR: 000000000000002c DSISR: 40000000 IRQMASK: 0
[ 2244.414656][ T726] GPR00: c0000000004df95c c000001a2398f970 c00000000168a700 fffffffffffffff4
[ 2244.414656][ T726] GPR04: ffffffffffffffff c000000000bd0980 0000000000000005 0000000000000080
[ 2244.414656][ T726] GPR08: 0000001ffc030000 0000000000000001 0000000000000000 c00000000152bb58
[ 2244.414656][ T726] GPR12: 0000000024000222 c000001fffff5680 c0000001d818ce00 c0000001d818cd00
[ 2244.414656][ T726] GPR16: 0000000000000000 c000001a2398fce0 fe7fffffffffefff fffffffffffffe7f
[ 2244.414656][ T726] GPR20: c000201320aa53c8 000000000000001e 0000000000000017 c00020047636b868
[ 2244.414656][ T726] GPR24: 0000000000000000 0000000000000000 c000000001756080 c000001a2398fce0
[ 2244.414656][ T726] GPR28: c000001a2398fa20 00007ffeeda00000 c000200f28547928 c000200f28547880
[ 2244.414865][ T726] NIP [c0000000004f7e44] mem_cgroup_uncharge+0x34/0xb0
mem_cgroup_uncharge at mm/memcontrol.c:6563
[ 2244.414895][ T726] LR [c0000000004df95c] collapse_huge_page+0x24c/0x1000
collapse_huge_page at mm/khugepaged.c:1197
[ 2244.414924][ T726] Call Trace:
[ 2244.414940][ T726] [c000001a2398f970] [0000000000000001] 0x1 (unreliable)
[ 2244.414970][ T726] [c000001a2398f9c0] [c0000000004df814] collapse_huge_page+0x104/0x1000
collapse_huge_page at mm/khugepaged.c:1064 (discriminator 10)
[ 2244.414991][ T726] [c000001a2398faf0] [c0000000004e0f84] khugepaged_scan_pmd+0x874/0xc70
[ 2244.415021][ T726] [c000001a2398fbf0] [c0000000004e2a90] khugepaged+0x900/0x1920
[ 2244.415043][ T726] [c000001a2398fdb0] [c000000000155aa4] kthread+0x1c4/0x1d0
[ 2244.415075][ T726] [c000001a2398fe20] [c00000000000cb28] ret_from_kernel_thread+0x5c/0x74
[ 2244.415095][ T726] Instruction dump:
[ 2244.415113][ T726] 384228f0 7c0802a6 60000000 f821ffb1 e92d0c70 f9210048 39200000 3d22ffec
[ 2244.415146][ T726] 3929f9f4 81290000 2f890000 409d0048 <e9230038> 2fa90000 419e003c 7c0802a6
[ 2244.415181][ T726] ---[ end trace 3488eb8818913a26 ]---
^ permalink raw reply [flat|nested] 58+ messages in thread
[parent not found: <45AA36A9-0C4D-49C2-BA3C-08753BBC30FB-J5quhbR+WMc@public.gmane.org>]
* Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
2020-05-12 14:38 ` Qian Cai
@ 2020-05-12 17:11 ` Qian Cai
-1 siblings, 0 replies; 58+ messages in thread
From: Qian Cai @ 2020-05-12 17:11 UTC (permalink / raw)
To: Johannes Weiner, Stephen Rothwell
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Michal Hocko, Kirill A. Shutemov, Roman Gushchin, Linux-MM,
cgroups-u79uwXL29TY76Z2rM5mHXA, LKML, kernel-team-b10kYP2dOMg
> On May 12, 2020, at 10:38 AM, Qian Cai <cai-J5quhbR+WMc@public.gmane.org> wrote:
>
>
>
>> On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org> wrote:
>>
>> With the page->mapping requirement gone from memcg, we can charge anon
>> and file-thp pages in one single step, right after they're allocated.
>>
>> This removes two out of three API calls - especially the tricky commit
>> step that needed to happen at just the right time between when the
>> page is "set up" and when it's "published" - somewhat vague and fluid
>> concepts that varied by page type. All we need is a freshly allocated
>> page and a memcg context to charge.
>>
>> v2: prevent double charges on pre-allocated hugepages in khugepaged
>>
>> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
>> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
>> ---
>> include/linux/mm.h | 4 +---
>> kernel/events/uprobes.c | 11 +++--------
>> mm/filemap.c | 2 +-
>> mm/huge_memory.c | 9 +++------
>> mm/khugepaged.c | 35 ++++++++++-------------------------
>> mm/memory.c | 36 ++++++++++--------------------------
>> mm/migrate.c | 5 +----
>> mm/swapfile.c | 6 +-----
>> mm/userfaultfd.c | 5 +----
>> 9 files changed, 31 insertions(+), 82 deletions(-)
> []
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>
>> @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
>> out_up_write:
>> up_write(&mm->mmap_sem);
>> out_nolock:
>> + if (*hpage)
>> + mem_cgroup_uncharge(*hpage);
>> trace_mm_collapse_huge_page(mm, isolated, result);
>> return;
>> out:
>> - mem_cgroup_cancel_charge(new_page, memcg);
>> goto out_up_write;
>> }
> []
>
> Some memory pressure will crash this new code. It looks like somewhat racy.
Reverted the whole series fixed the crash, i.e.,
git revert --no-edit 6070efb8e52b..c986ddf58a95
There is a minor conflict during reverting due to another linux-next commit,
2a6b525f0de1 (“khugepaged: do not stop collapse if less than half PTEs are referenced”)
which is trivial to resolve,
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@@ -1091,8 -1000,8 +1093,9 @@@ static void collapse_huge_page(struct m
* If it fails, we release mmap_sem and jump out_nolock.
* Continuing to collapse causes inconsistency.
*/
- if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced)) {
+ if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
+ pmd, referenced)) {
+ mem_cgroup_cancel_charge(new_page, memcg, true);
up_read(&mm->mmap_sem);
goto out_nolock;
}
>
> if (!page->mem_cgroup)
>
> where page == NULL in mem_cgroup_uncharge().
>
> [ 2244.414421][ T726] BUG: Kernel NULL pointer dereference on read at 0x0000002c
> [ 2244.414454][ T726] Faulting instruction address: 0xc0000000004f7e44
> [ 2244.414467][ T726] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 2244.414488][ T726] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
> [ 2244.414501][ T726] Modules linked in: brd ext4 crc16 mbcache jbd2 loop kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci tg3 libahci libphy mdio libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [ 2244.414556][ T726] CPU: 11 PID: 726 Comm: khugepaged Not tainted 5.7.0-rc5-next-20200512+ #8
> [ 2244.414579][ T726] NIP: c0000000004f7e44 LR: c0000000004df95c CTR: c0000000001c1400
> [ 2244.414600][ T726] REGS: c000001a2398f6e0 TRAP: 0300 Not tainted (5.7.0-rc5-next-20200512+)
> [ 2244.414630][ T726] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24000244 XER: 20040000
> [ 2244.414656][ T726] CFAR: c0000000004df958 DAR: 000000000000002c DSISR: 40000000 IRQMASK: 0
> [ 2244.414656][ T726] GPR00: c0000000004df95c c000001a2398f970 c00000000168a700 fffffffffffffff4
> [ 2244.414656][ T726] GPR04: ffffffffffffffff c000000000bd0980 0000000000000005 0000000000000080
> [ 2244.414656][ T726] GPR08: 0000001ffc030000 0000000000000001 0000000000000000 c00000000152bb58
> [ 2244.414656][ T726] GPR12: 0000000024000222 c000001fffff5680 c0000001d818ce00 c0000001d818cd00
> [ 2244.414656][ T726] GPR16: 0000000000000000 c000001a2398fce0 fe7fffffffffefff fffffffffffffe7f
> [ 2244.414656][ T726] GPR20: c000201320aa53c8 000000000000001e 0000000000000017 c00020047636b868
> [ 2244.414656][ T726] GPR24: 0000000000000000 0000000000000000 c000000001756080 c000001a2398fce0
> [ 2244.414656][ T726] GPR28: c000001a2398fa20 00007ffeeda00000 c000200f28547928 c000200f28547880
> [ 2244.414865][ T726] NIP [c0000000004f7e44] mem_cgroup_uncharge+0x34/0xb0
> mem_cgroup_uncharge at mm/memcontrol.c:6563
> [ 2244.414895][ T726] LR [c0000000004df95c] collapse_huge_page+0x24c/0x1000
> collapse_huge_page at mm/khugepaged.c:1197
> [ 2244.414924][ T726] Call Trace:
> [ 2244.414940][ T726] [c000001a2398f970] [0000000000000001] 0x1 (unreliable)
> [ 2244.414970][ T726] [c000001a2398f9c0] [c0000000004df814] collapse_huge_page+0x104/0x1000
> collapse_huge_page at mm/khugepaged.c:1064 (discriminator 10)
> [ 2244.414991][ T726] [c000001a2398faf0] [c0000000004e0f84] khugepaged_scan_pmd+0x874/0xc70
> [ 2244.415021][ T726] [c000001a2398fbf0] [c0000000004e2a90] khugepaged+0x900/0x1920
> [ 2244.415043][ T726] [c000001a2398fdb0] [c000000000155aa4] kthread+0x1c4/0x1d0
> [ 2244.415075][ T726] [c000001a2398fe20] [c00000000000cb28] ret_from_kernel_thread+0x5c/0x74
> [ 2244.415095][ T726] Instruction dump:
> [ 2244.415113][ T726] 384228f0 7c0802a6 60000000 f821ffb1 e92d0c70 f9210048 39200000 3d22ffec
> [ 2244.415146][ T726] 3929f9f4 81290000 2f890000 409d0048 <e9230038> 2fa90000 419e003c 7c0802a6
> [ 2244.415181][ T726] ---[ end trace 3488eb8818913a26 ]---
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
@ 2020-05-12 17:11 ` Qian Cai
0 siblings, 0 replies; 58+ messages in thread
From: Qian Cai @ 2020-05-12 17:11 UTC (permalink / raw)
To: Johannes Weiner, Stephen Rothwell
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Michal Hocko, Kirill A. Shutemov, Roman Gushchin, Linux-MM,
cgroups, LKML, kernel-team
> On May 12, 2020, at 10:38 AM, Qian Cai <cai@lca.pw> wrote:
>
>
>
>> On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>
>> With the page->mapping requirement gone from memcg, we can charge anon
>> and file-thp pages in one single step, right after they're allocated.
>>
>> This removes two out of three API calls - especially the tricky commit
>> step that needed to happen at just the right time between when the
>> page is "set up" and when it's "published" - somewhat vague and fluid
>> concepts that varied by page type. All we need is a freshly allocated
>> page and a memcg context to charge.
>>
>> v2: prevent double charges on pre-allocated hugepages in khugepaged
>>
>> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> ---
>> include/linux/mm.h | 4 +---
>> kernel/events/uprobes.c | 11 +++--------
>> mm/filemap.c | 2 +-
>> mm/huge_memory.c | 9 +++------
>> mm/khugepaged.c | 35 ++++++++++-------------------------
>> mm/memory.c | 36 ++++++++++--------------------------
>> mm/migrate.c | 5 +----
>> mm/swapfile.c | 6 +-----
>> mm/userfaultfd.c | 5 +----
>> 9 files changed, 31 insertions(+), 82 deletions(-)
> []
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>
>> @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
>> out_up_write:
>> up_write(&mm->mmap_sem);
>> out_nolock:
>> + if (*hpage)
>> + mem_cgroup_uncharge(*hpage);
>> trace_mm_collapse_huge_page(mm, isolated, result);
>> return;
>> out:
>> - mem_cgroup_cancel_charge(new_page, memcg);
>> goto out_up_write;
>> }
> []
>
> Some memory pressure will crash this new code. It looks like somewhat racy.
Reverted the whole series fixed the crash, i.e.,
git revert --no-edit 6070efb8e52b..c986ddf58a95
There is a minor conflict during reverting due to another linux-next commit,
2a6b525f0de1 (“khugepaged: do not stop collapse if less than half PTEs are referenced”)
which is trivial to resolve,
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@@ -1091,8 -1000,8 +1093,9 @@@ static void collapse_huge_page(struct m
* If it fails, we release mmap_sem and jump out_nolock.
* Continuing to collapse causes inconsistency.
*/
- if (!__collapse_huge_page_swapin(mm, vma, address, pmd, referenced)) {
+ if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
+ pmd, referenced)) {
+ mem_cgroup_cancel_charge(new_page, memcg, true);
up_read(&mm->mmap_sem);
goto out_nolock;
}
>
> if (!page->mem_cgroup)
>
> where page == NULL in mem_cgroup_uncharge().
>
> [ 2244.414421][ T726] BUG: Kernel NULL pointer dereference on read at 0x0000002c
> [ 2244.414454][ T726] Faulting instruction address: 0xc0000000004f7e44
> [ 2244.414467][ T726] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 2244.414488][ T726] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
> [ 2244.414501][ T726] Modules linked in: brd ext4 crc16 mbcache jbd2 loop kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci tg3 libahci libphy mdio libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [ 2244.414556][ T726] CPU: 11 PID: 726 Comm: khugepaged Not tainted 5.7.0-rc5-next-20200512+ #8
> [ 2244.414579][ T726] NIP: c0000000004f7e44 LR: c0000000004df95c CTR: c0000000001c1400
> [ 2244.414600][ T726] REGS: c000001a2398f6e0 TRAP: 0300 Not tainted (5.7.0-rc5-next-20200512+)
> [ 2244.414630][ T726] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24000244 XER: 20040000
> [ 2244.414656][ T726] CFAR: c0000000004df958 DAR: 000000000000002c DSISR: 40000000 IRQMASK: 0
> [ 2244.414656][ T726] GPR00: c0000000004df95c c000001a2398f970 c00000000168a700 fffffffffffffff4
> [ 2244.414656][ T726] GPR04: ffffffffffffffff c000000000bd0980 0000000000000005 0000000000000080
> [ 2244.414656][ T726] GPR08: 0000001ffc030000 0000000000000001 0000000000000000 c00000000152bb58
> [ 2244.414656][ T726] GPR12: 0000000024000222 c000001fffff5680 c0000001d818ce00 c0000001d818cd00
> [ 2244.414656][ T726] GPR16: 0000000000000000 c000001a2398fce0 fe7fffffffffefff fffffffffffffe7f
> [ 2244.414656][ T726] GPR20: c000201320aa53c8 000000000000001e 0000000000000017 c00020047636b868
> [ 2244.414656][ T726] GPR24: 0000000000000000 0000000000000000 c000000001756080 c000001a2398fce0
> [ 2244.414656][ T726] GPR28: c000001a2398fa20 00007ffeeda00000 c000200f28547928 c000200f28547880
> [ 2244.414865][ T726] NIP [c0000000004f7e44] mem_cgroup_uncharge+0x34/0xb0
> mem_cgroup_uncharge at mm/memcontrol.c:6563
> [ 2244.414895][ T726] LR [c0000000004df95c] collapse_huge_page+0x24c/0x1000
> collapse_huge_page at mm/khugepaged.c:1197
> [ 2244.414924][ T726] Call Trace:
> [ 2244.414940][ T726] [c000001a2398f970] [0000000000000001] 0x1 (unreliable)
> [ 2244.414970][ T726] [c000001a2398f9c0] [c0000000004df814] collapse_huge_page+0x104/0x1000
> collapse_huge_page at mm/khugepaged.c:1064 (discriminator 10)
> [ 2244.414991][ T726] [c000001a2398faf0] [c0000000004e0f84] khugepaged_scan_pmd+0x874/0xc70
> [ 2244.415021][ T726] [c000001a2398fbf0] [c0000000004e2a90] khugepaged+0x900/0x1920
> [ 2244.415043][ T726] [c000001a2398fdb0] [c000000000155aa4] kthread+0x1c4/0x1d0
> [ 2244.415075][ T726] [c000001a2398fe20] [c00000000000cb28] ret_from_kernel_thread+0x5c/0x74
> [ 2244.415095][ T726] Instruction dump:
> [ 2244.415113][ T726] 384228f0 7c0802a6 60000000 f821ffb1 e92d0c70 f9210048 39200000 3d22ffec
> [ 2244.415146][ T726] 3929f9f4 81290000 2f890000 409d0048 <e9230038> 2fa90000 419e003c 7c0802a6
> [ 2244.415181][ T726] ---[ end trace 3488eb8818913a26 ]---
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API
2020-05-12 14:38 ` Qian Cai
(?)
(?)
@ 2020-05-12 21:58 ` Johannes Weiner
[not found] ` <20200512215813.GA487759-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
-1 siblings, 1 reply; 58+ messages in thread
From: Johannes Weiner @ 2020-05-12 21:58 UTC (permalink / raw)
To: Qian Cai
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Michal Hocko, Kirill A. Shutemov, Roman Gushchin, Linux-MM,
cgroups, LKML, kernel-team
On Tue, May 12, 2020 at 10:38:54AM -0400, Qian Cai wrote:
> > On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > With the page->mapping requirement gone from memcg, we can charge anon
> > and file-thp pages in one single step, right after they're allocated.
> >
> > This removes two out of three API calls - especially the tricky commit
> > step that needed to happen at just the right time between when the
> > page is "set up" and when it's "published" - somewhat vague and fluid
> > concepts that varied by page type. All we need is a freshly allocated
> > page and a memcg context to charge.
> >
> > v2: prevent double charges on pre-allocated hugepages in khugepaged
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > ---
> > include/linux/mm.h | 4 +---
> > kernel/events/uprobes.c | 11 +++--------
> > mm/filemap.c | 2 +-
> > mm/huge_memory.c | 9 +++------
> > mm/khugepaged.c | 35 ++++++++++-------------------------
> > mm/memory.c | 36 ++++++++++--------------------------
> > mm/migrate.c | 5 +----
> > mm/swapfile.c | 6 +-----
> > mm/userfaultfd.c | 5 +----
> > 9 files changed, 31 insertions(+), 82 deletions(-)
> []
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >
> > @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
> > out_up_write:
> > up_write(&mm->mmap_sem);
> > out_nolock:
> > + if (*hpage)
> > + mem_cgroup_uncharge(*hpage);
> > trace_mm_collapse_huge_page(mm, isolated, result);
> > return;
> > out:
> > - mem_cgroup_cancel_charge(new_page, memcg);
> > goto out_up_write;
> > }
> []
>
> Some memory pressure will crash this new code. It looks like somewhat racy.
>
> if (!page->mem_cgroup)
>
> where page == NULL in mem_cgroup_uncharge().
Thanks for the report, sorry about the inconvenience.
Hm, the page is exclusive at this point, nobody else should be
touching it. After all, khugepaged might reuse the preallocated page
for another pmd if this one fails to collapse.
Looking at the code, I think it's page itself that's garbage, not
page->mem_cgroup changing. If you have CONFIG_NUMA and the allocation
fails, *hpage could contain an ERR_PTR instead of being NULL.
I think we need the following fixlet:
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f2e0a5e5cfbb..f6161e17da26 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1193,7 +1193,7 @@ static void collapse_huge_page(struct mm_struct *mm,
out_up_write:
up_write(&mm->mmap_sem);
out_nolock:
- if (*hpage)
+ if (!IS_ERR_OR_NULL(*hpage))
mem_cgroup_uncharge(*hpage);
trace_mm_collapse_huge_page(mm, isolated, result);
return;
@@ -1928,7 +1928,7 @@ static void collapse_file(struct mm_struct *mm,
unlock_page(new_page);
out:
VM_BUG_ON(!list_empty(&pagelist));
- if (*hpage)
+ if (!IS_ERR_OR_NULL(*hpage))
mem_cgroup_uncharge(*hpage);
/* TODO: tracepoints */
}
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 13/19] mm: memcontrol: drop unused try/commit/cancel charge API
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:31 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
There are no more users. RIP in peace.
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
---
include/linux/memcontrol.h | 36 -----------
mm/memcontrol.c | 126 +++++--------------------------------
2 files changed, 15 insertions(+), 147 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9b1054bf6d35..23608d3ee70f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -369,14 +369,6 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
page_counter_read(&memcg->memory);
}
-int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp);
-int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp);
-void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
- bool lrucare);
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
-
int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
bool lrucare);
@@ -867,34 +859,6 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
return false;
}
-static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask,
- struct mem_cgroup **memcgp)
-{
- *memcgp = NULL;
- return 0;
-}
-
-static inline int mem_cgroup_try_charge_delay(struct page *page,
- struct mm_struct *mm,
- gfp_t gfp_mask,
- struct mem_cgroup **memcgp)
-{
- *memcgp = NULL;
- return 0;
-}
-
-static inline void mem_cgroup_commit_charge(struct page *page,
- struct mem_cgroup *memcg,
- bool lrucare)
-{
-}
-
-static inline void mem_cgroup_cancel_charge(struct page *page,
- struct mem_cgroup *memcg)
-{
-}
-
static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask, bool lrucare)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd92c1c99e1f..7b9bb7ca0b44 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6432,29 +6432,26 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
}
/**
- * mem_cgroup_try_charge - try charging a page
+ * mem_cgroup_charge - charge a newly allocated page to a cgroup
* @page: page to charge
* @mm: mm context of the victim
* @gfp_mask: reclaim mode
- * @memcgp: charged memcg return
+ * @lrucare: page might be on the LRU already
*
* Try to charge @page to the memcg that @mm belongs to, reclaiming
* pages according to @gfp_mask if necessary.
*
- * Returns 0 on success, with *@memcgp pointing to the charged memcg.
- * Otherwise, an error code is returned.
- *
- * After page->mapping has been set up, the caller must finalize the
- * charge with mem_cgroup_commit_charge(). Or abort the transaction
- * with mem_cgroup_cancel_charge() in case page instantiation fails.
+ * Returns 0 on success. Otherwise, an error code is returned.
*/
-int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp)
+int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
+ bool lrucare)
{
unsigned int nr_pages = hpage_nr_pages(page);
struct mem_cgroup *memcg = NULL;
int ret = 0;
+ VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
+
if (mem_cgroup_disabled())
goto out;
@@ -6486,56 +6483,8 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
memcg = get_mem_cgroup_from_mm(mm);
ret = try_charge(memcg, gfp_mask, nr_pages);
-
- css_put(&memcg->css);
-out:
- *memcgp = memcg;
- return ret;
-}
-
-int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp)
-{
- int ret;
-
- ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp);
- if (*memcgp)
- cgroup_throttle_swaprate(page, gfp_mask);
- return ret;
-}
-
-/**
- * mem_cgroup_commit_charge - commit a page charge
- * @page: page to charge
- * @memcg: memcg to charge the page to
- * @lrucare: page might be on LRU already
- *
- * Finalize a charge transaction started by mem_cgroup_try_charge(),
- * after page->mapping has been set up. This must happen atomically
- * as part of the page instantiation, i.e. under the page table lock
- * for anonymous pages, under the page lock for page and swap cache.
- *
- * In addition, the page must not be on the LRU during the commit, to
- * prevent racing with task migration. If it might be, use @lrucare.
- *
- * Use mem_cgroup_cancel_charge() to cancel the transaction instead.
- */
-void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
- bool lrucare)
-{
- unsigned int nr_pages = hpage_nr_pages(page);
-
- VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
-
- if (mem_cgroup_disabled())
- return;
- /*
- * Swap faults will attempt to charge the same page multiple
- * times. But reuse_swap_page() might have removed the page
- * from swapcache already, so we can't check PageSwapCache().
- */
- if (!memcg)
- return;
+ if (ret)
+ goto out_put;
commit_charge(page, memcg, lrucare);
@@ -6553,55 +6502,11 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
*/
mem_cgroup_uncharge_swap(entry, nr_pages);
}
-}
-/**
- * mem_cgroup_cancel_charge - cancel a page charge
- * @page: page to charge
- * @memcg: memcg to charge the page to
- *
- * Cancel a charge transaction started by mem_cgroup_try_charge().
- */
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg)
-{
- unsigned int nr_pages = hpage_nr_pages(page);
-
- if (mem_cgroup_disabled())
- return;
- /*
- * Swap faults will attempt to charge the same page multiple
- * times. But reuse_swap_page() might have removed the page
- * from swapcache already, so we can't check PageSwapCache().
- */
- if (!memcg)
- return;
-
- cancel_charge(memcg, nr_pages);
-}
-
-/**
- * mem_cgroup_charge - charge a newly allocated page to a cgroup
- * @page: page to charge
- * @mm: mm context of the victim
- * @gfp_mask: reclaim mode
- * @lrucare: page might be on the LRU already
- *
- * Try to charge @page to the memcg that @mm belongs to, reclaiming
- * pages according to @gfp_mask if necessary.
- *
- * Returns 0 on success. Otherwise, an error code is returned.
- */
-int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
- bool lrucare)
-{
- struct mem_cgroup *memcg;
- int ret;
-
- ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
- if (ret)
- return ret;
- mem_cgroup_commit_charge(page, memcg, lrucare);
- return 0;
+out_put:
+ css_put(&memcg->css);
+out:
+ return ret;
}
struct uncharge_gather {
@@ -6706,8 +6611,7 @@ static void uncharge_list(struct list_head *page_list)
* mem_cgroup_uncharge - uncharge a page
* @page: page to uncharge
*
- * Uncharge a page previously charged with mem_cgroup_try_charge() and
- * mem_cgroup_commit_charge().
+ * Uncharge a page previously charged with mem_cgroup_charge().
*/
void mem_cgroup_uncharge(struct page *page)
{
@@ -6730,7 +6634,7 @@ void mem_cgroup_uncharge(struct page *page)
* @page_list: list of pages to uncharge
*
* Uncharge a list of pages previously charged with
- * mem_cgroup_try_charge() and mem_cgroup_commit_charge().
+ * mem_cgroup_charge().
*/
void mem_cgroup_uncharge_list(struct list_head *page_list)
{
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 13/19] mm: memcontrol: drop unused try/commit/cancel charge API
@ 2020-05-08 18:31 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
There are no more users. RIP in peace.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
include/linux/memcontrol.h | 36 -----------
mm/memcontrol.c | 126 +++++--------------------------------
2 files changed, 15 insertions(+), 147 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9b1054bf6d35..23608d3ee70f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -369,14 +369,6 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
page_counter_read(&memcg->memory);
}
-int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp);
-int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp);
-void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
- bool lrucare);
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);
-
int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
bool lrucare);
@@ -867,34 +859,6 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
return false;
}
-static inline int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask,
- struct mem_cgroup **memcgp)
-{
- *memcgp = NULL;
- return 0;
-}
-
-static inline int mem_cgroup_try_charge_delay(struct page *page,
- struct mm_struct *mm,
- gfp_t gfp_mask,
- struct mem_cgroup **memcgp)
-{
- *memcgp = NULL;
- return 0;
-}
-
-static inline void mem_cgroup_commit_charge(struct page *page,
- struct mem_cgroup *memcg,
- bool lrucare)
-{
-}
-
-static inline void mem_cgroup_cancel_charge(struct page *page,
- struct mem_cgroup *memcg)
-{
-}
-
static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask, bool lrucare)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fd92c1c99e1f..7b9bb7ca0b44 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6432,29 +6432,26 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
}
/**
- * mem_cgroup_try_charge - try charging a page
+ * mem_cgroup_charge - charge a newly allocated page to a cgroup
* @page: page to charge
* @mm: mm context of the victim
* @gfp_mask: reclaim mode
- * @memcgp: charged memcg return
+ * @lrucare: page might be on the LRU already
*
* Try to charge @page to the memcg that @mm belongs to, reclaiming
* pages according to @gfp_mask if necessary.
*
- * Returns 0 on success, with *@memcgp pointing to the charged memcg.
- * Otherwise, an error code is returned.
- *
- * After page->mapping has been set up, the caller must finalize the
- * charge with mem_cgroup_commit_charge(). Or abort the transaction
- * with mem_cgroup_cancel_charge() in case page instantiation fails.
+ * Returns 0 on success. Otherwise, an error code is returned.
*/
-int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp)
+int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
+ bool lrucare)
{
unsigned int nr_pages = hpage_nr_pages(page);
struct mem_cgroup *memcg = NULL;
int ret = 0;
+ VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
+
if (mem_cgroup_disabled())
goto out;
@@ -6486,56 +6483,8 @@ int mem_cgroup_try_charge(struct page *page, struct mm_struct *mm,
memcg = get_mem_cgroup_from_mm(mm);
ret = try_charge(memcg, gfp_mask, nr_pages);
-
- css_put(&memcg->css);
-out:
- *memcgp = memcg;
- return ret;
-}
-
-int mem_cgroup_try_charge_delay(struct page *page, struct mm_struct *mm,
- gfp_t gfp_mask, struct mem_cgroup **memcgp)
-{
- int ret;
-
- ret = mem_cgroup_try_charge(page, mm, gfp_mask, memcgp);
- if (*memcgp)
- cgroup_throttle_swaprate(page, gfp_mask);
- return ret;
-}
-
-/**
- * mem_cgroup_commit_charge - commit a page charge
- * @page: page to charge
- * @memcg: memcg to charge the page to
- * @lrucare: page might be on LRU already
- *
- * Finalize a charge transaction started by mem_cgroup_try_charge(),
- * after page->mapping has been set up. This must happen atomically
- * as part of the page instantiation, i.e. under the page table lock
- * for anonymous pages, under the page lock for page and swap cache.
- *
- * In addition, the page must not be on the LRU during the commit, to
- * prevent racing with task migration. If it might be, use @lrucare.
- *
- * Use mem_cgroup_cancel_charge() to cancel the transaction instead.
- */
-void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
- bool lrucare)
-{
- unsigned int nr_pages = hpage_nr_pages(page);
-
- VM_BUG_ON_PAGE(PageLRU(page) && !lrucare, page);
-
- if (mem_cgroup_disabled())
- return;
- /*
- * Swap faults will attempt to charge the same page multiple
- * times. But reuse_swap_page() might have removed the page
- * from swapcache already, so we can't check PageSwapCache().
- */
- if (!memcg)
- return;
+ if (ret)
+ goto out_put;
commit_charge(page, memcg, lrucare);
@@ -6553,55 +6502,11 @@ void mem_cgroup_commit_charge(struct page *page, struct mem_cgroup *memcg,
*/
mem_cgroup_uncharge_swap(entry, nr_pages);
}
-}
-/**
- * mem_cgroup_cancel_charge - cancel a page charge
- * @page: page to charge
- * @memcg: memcg to charge the page to
- *
- * Cancel a charge transaction started by mem_cgroup_try_charge().
- */
-void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg)
-{
- unsigned int nr_pages = hpage_nr_pages(page);
-
- if (mem_cgroup_disabled())
- return;
- /*
- * Swap faults will attempt to charge the same page multiple
- * times. But reuse_swap_page() might have removed the page
- * from swapcache already, so we can't check PageSwapCache().
- */
- if (!memcg)
- return;
-
- cancel_charge(memcg, nr_pages);
-}
-
-/**
- * mem_cgroup_charge - charge a newly allocated page to a cgroup
- * @page: page to charge
- * @mm: mm context of the victim
- * @gfp_mask: reclaim mode
- * @lrucare: page might be on the LRU already
- *
- * Try to charge @page to the memcg that @mm belongs to, reclaiming
- * pages according to @gfp_mask if necessary.
- *
- * Returns 0 on success. Otherwise, an error code is returned.
- */
-int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
- bool lrucare)
-{
- struct mem_cgroup *memcg;
- int ret;
-
- ret = mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);
- if (ret)
- return ret;
- mem_cgroup_commit_charge(page, memcg, lrucare);
- return 0;
+out_put:
+ css_put(&memcg->css);
+out:
+ return ret;
}
struct uncharge_gather {
@@ -6706,8 +6611,7 @@ static void uncharge_list(struct list_head *page_list)
* mem_cgroup_uncharge - uncharge a page
* @page: page to uncharge
*
- * Uncharge a page previously charged with mem_cgroup_try_charge() and
- * mem_cgroup_commit_charge().
+ * Uncharge a page previously charged with mem_cgroup_charge().
*/
void mem_cgroup_uncharge(struct page *page)
{
@@ -6730,7 +6634,7 @@ void mem_cgroup_uncharge(struct page *page)
* @page_list: list of pages to uncharge
*
* Uncharge a list of pages previously charged with
- * mem_cgroup_try_charge() and mem_cgroup_commit_charge().
+ * mem_cgroup_charge().
*/
void mem_cgroup_uncharge_list(struct list_head *page_list)
{
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread[parent not found: <20200508183105.225460-14-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH 13/19] mm: memcontrol: drop unused try/commit/cancel charge API
2020-05-08 18:31 ` Johannes Weiner
@ 2020-06-22 17:06 ` Ben Widawsky
-1 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2020-06-22 17:06 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Michal Hocko, Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On 20-05-08 14:31:00, Johannes Weiner wrote:
> There are no more users. RIP in peace.
>
Would it make sense to update Documentation/admin-guide/cgroup-v1/memcg_test.rst
too? I don't have the history on this file, or why it exists (it does say
implementation details can be changed).
> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
[snip]
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 13/19] mm: memcontrol: drop unused try/commit/cancel charge API
@ 2020-06-22 17:06 ` Ben Widawsky
0 siblings, 0 replies; 58+ messages in thread
From: Ben Widawsky @ 2020-06-22 17:06 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Michal Hocko, Kirill A. Shutemov, Roman Gushchin, linux-mm,
cgroups, linux-kernel, kernel-team
On 20-05-08 14:31:00, Johannes Weiner wrote:
> There are no more users. RIP in peace.
>
Would it make sense to update Documentation/admin-guide/cgroup-v1/memcg_test.rst
too? I don't have the history on this file, or why it exists (it does say
implementation details can be changed).
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
[snip]
^ permalink raw reply [flat|nested] 58+ messages in thread
* [PATCH 15/19] mm: memcontrol: make swap tracking an integral part of memory control
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:31 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
Without swap page tracking, users that are otherwise memory controlled
can easily escape their containment and allocate significant amounts
of memory that they're not being charged for. That's because swap does
readahead, but without the cgroup records of who owned the page at
swapout, readahead pages don't get charged until somebody actually
faults them into their page table and we can identify an owner task.
This can be maliciously exploited with MADV_WILLNEED, which triggers
arbitrary readahead allocations without charging the pages.
Make swap swap page tracking an integral part of memcg and remove the
Kconfig options. In the first place, it was only made configurable to
allow users to save some memory. But the overhead of tracking cgroup
ownership per swap page is minimal - 2 byte per page, or 512k per 1G
of swap, or 0.04%. Saving that at the expense of broken containment
semantics is not something we should present as a coequal option.
The swapaccount=0 boot option will continue to exist, and it will
eliminate the page_counter overhead and hide the swap control files,
but it won't disable swap slot ownership tracking.
This patch makes sure we always have the cgroup records at swapin
time; the next patch will fix the actual bug by charging readahead
swap pages at swapin time rather than at fault time.
v2: fix double swap charge bug in cgroup1/cgroup2 code gating
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
---
init/Kconfig | 17 +----------------
mm/memcontrol.c | 47 ++++++++++++++++++-----------------------------
mm/swap_cgroup.c | 6 ------
3 files changed, 19 insertions(+), 51 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index 492bb7000aa4..9a874b2201bd 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -847,24 +847,9 @@ config MEMCG
Provides control over the memory footprint of tasks in a cgroup.
config MEMCG_SWAP
- bool "Swap controller"
+ bool
depends on MEMCG && SWAP
- help
- Provides control over the swap space consumed by tasks in a cgroup.
-
-config MEMCG_SWAP_ENABLED
- bool "Swap controller enabled by default"
- depends on MEMCG_SWAP
default y
- help
- Memory Resource Controller Swap Extension comes with its price in
- a bigger memory consumption. General purpose distribution kernels
- which want to enable the feature but keep it disabled by default
- and let the user enable it by swapaccount=1 boot command line
- parameter should have this option unselected.
- For those who want to have the feature enabled by default should
- select this option (if, for some reason, they need to disable it
- then swapaccount=0 does the trick).
config MEMCG_KMEM
bool
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bb5f02ab92fb..4a003531af07 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -83,14 +83,10 @@ static bool cgroup_memory_nokmem;
/* Whether the swap controller is active */
#ifdef CONFIG_MEMCG_SWAP
-#ifdef CONFIG_MEMCG_SWAP_ENABLED
bool cgroup_memory_noswap __read_mostly;
#else
-bool cgroup_memory_noswap __read_mostly = 1;
-#endif /* CONFIG_MEMCG_SWAP_ENABLED */
-#else
#define cgroup_memory_noswap 1
-#endif /* CONFIG_MEMCG_SWAP */
+#endif
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
@@ -5294,8 +5290,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
* we call find_get_page() with swapper_space directly.
*/
page = find_get_page(swap_address_space(ent), swp_offset(ent));
- if (do_memsw_account())
- entry->val = ent.val;
+ entry->val = ent.val;
return page;
}
@@ -5329,8 +5324,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
page = find_get_entry(mapping, pgoff);
if (xa_is_value(page)) {
swp_entry_t swp = radix_to_swp_entry(page);
- if (do_memsw_account())
- *entry = swp;
+ *entry = swp;
page = find_get_page(swap_address_space(swp),
swp_offset(swp));
}
@@ -6460,6 +6454,9 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
goto out;
if (PageSwapCache(page)) {
+ swp_entry_t ent = { .val = page_private(page), };
+ unsigned short id;
+
/*
* Every swap fault against a single page tries to charge the
* page, bail as early as possible. shmem_unuse() encounters
@@ -6471,17 +6468,12 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
if (compound_head(page)->mem_cgroup)
goto out;
- if (!cgroup_memory_noswap) {
- swp_entry_t ent = { .val = page_private(page), };
- unsigned short id;
-
- id = lookup_swap_cgroup_id(ent);
- rcu_read_lock();
- memcg = mem_cgroup_from_id(id);
- if (memcg && !css_tryget_online(&memcg->css))
- memcg = NULL;
- rcu_read_unlock();
- }
+ id = lookup_swap_cgroup_id(ent);
+ rcu_read_lock();
+ memcg = mem_cgroup_from_id(id);
+ if (memcg && !css_tryget_online(&memcg->css))
+ memcg = NULL;
+ rcu_read_unlock();
}
if (!memcg)
@@ -6498,7 +6490,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
memcg_check_events(memcg, page);
local_irq_enable();
- if (do_memsw_account() && PageSwapCache(page)) {
+ if (PageSwapCache(page)) {
swp_entry_t entry = { .val = page_private(page) };
/*
* The swap entry might not get freed for a long time,
@@ -6883,7 +6875,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);
- if (!do_memsw_account())
+ if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
return;
memcg = page->mem_cgroup;
@@ -6912,7 +6904,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
if (!mem_cgroup_is_root(memcg))
page_counter_uncharge(&memcg->memory, nr_entries);
- if (memcg != swap_memcg) {
+ if (!cgroup_memory_noswap && memcg != swap_memcg) {
if (!mem_cgroup_is_root(swap_memcg))
page_counter_charge(&swap_memcg->memsw, nr_entries);
page_counter_uncharge(&memcg->memsw, nr_entries);
@@ -6948,7 +6940,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
struct mem_cgroup *memcg;
unsigned short oldid;
- if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || cgroup_memory_noswap)
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;
memcg = page->mem_cgroup;
@@ -6964,7 +6956,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
memcg = mem_cgroup_id_get_online(memcg);
- if (!mem_cgroup_is_root(memcg) &&
+ if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg) &&
!page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
memcg_memory_event(memcg, MEMCG_SWAP_MAX);
memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
@@ -6992,14 +6984,11 @@ void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
struct mem_cgroup *memcg;
unsigned short id;
- if (cgroup_memory_noswap)
- return;
-
id = swap_cgroup_record(entry, 0, nr_pages);
rcu_read_lock();
memcg = mem_cgroup_from_id(id);
if (memcg) {
- if (!mem_cgroup_is_root(memcg)) {
+ if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg)) {
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
page_counter_uncharge(&memcg->swap, nr_pages);
else
diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index 7aa764f09079..7f34343c075a 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -171,9 +171,6 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
unsigned long length;
struct swap_cgroup_ctrl *ctrl;
- if (cgroup_memory_noswap)
- return 0;
-
length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
array_size = length * sizeof(void *);
@@ -209,9 +206,6 @@ void swap_cgroup_swapoff(int type)
unsigned long i, length;
struct swap_cgroup_ctrl *ctrl;
- if (cgroup_memory_noswap)
- return;
-
mutex_lock(&swap_cgroup_mutex);
ctrl = &swap_cgroup_ctrl[type];
map = ctrl->map;
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 15/19] mm: memcontrol: make swap tracking an integral part of memory control
@ 2020-05-08 18:31 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
Without swap page tracking, users that are otherwise memory controlled
can easily escape their containment and allocate significant amounts
of memory that they're not being charged for. That's because swap does
readahead, but without the cgroup records of who owned the page at
swapout, readahead pages don't get charged until somebody actually
faults them into their page table and we can identify an owner task.
This can be maliciously exploited with MADV_WILLNEED, which triggers
arbitrary readahead allocations without charging the pages.
Make swap swap page tracking an integral part of memcg and remove the
Kconfig options. In the first place, it was only made configurable to
allow users to save some memory. But the overhead of tracking cgroup
ownership per swap page is minimal - 2 byte per page, or 512k per 1G
of swap, or 0.04%. Saving that at the expense of broken containment
semantics is not something we should present as a coequal option.
The swapaccount=0 boot option will continue to exist, and it will
eliminate the page_counter overhead and hide the swap control files,
but it won't disable swap slot ownership tracking.
This patch makes sure we always have the cgroup records at swapin
time; the next patch will fix the actual bug by charging readahead
swap pages at swapin time rather than at fault time.
v2: fix double swap charge bug in cgroup1/cgroup2 code gating
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
init/Kconfig | 17 +----------------
mm/memcontrol.c | 47 ++++++++++++++++++-----------------------------
mm/swap_cgroup.c | 6 ------
3 files changed, 19 insertions(+), 51 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index 492bb7000aa4..9a874b2201bd 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -847,24 +847,9 @@ config MEMCG
Provides control over the memory footprint of tasks in a cgroup.
config MEMCG_SWAP
- bool "Swap controller"
+ bool
depends on MEMCG && SWAP
- help
- Provides control over the swap space consumed by tasks in a cgroup.
-
-config MEMCG_SWAP_ENABLED
- bool "Swap controller enabled by default"
- depends on MEMCG_SWAP
default y
- help
- Memory Resource Controller Swap Extension comes with its price in
- a bigger memory consumption. General purpose distribution kernels
- which want to enable the feature but keep it disabled by default
- and let the user enable it by swapaccount=1 boot command line
- parameter should have this option unselected.
- For those who want to have the feature enabled by default should
- select this option (if, for some reason, they need to disable it
- then swapaccount=0 does the trick).
config MEMCG_KMEM
bool
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bb5f02ab92fb..4a003531af07 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -83,14 +83,10 @@ static bool cgroup_memory_nokmem;
/* Whether the swap controller is active */
#ifdef CONFIG_MEMCG_SWAP
-#ifdef CONFIG_MEMCG_SWAP_ENABLED
bool cgroup_memory_noswap __read_mostly;
#else
-bool cgroup_memory_noswap __read_mostly = 1;
-#endif /* CONFIG_MEMCG_SWAP_ENABLED */
-#else
#define cgroup_memory_noswap 1
-#endif /* CONFIG_MEMCG_SWAP */
+#endif
#ifdef CONFIG_CGROUP_WRITEBACK
static DECLARE_WAIT_QUEUE_HEAD(memcg_cgwb_frn_waitq);
@@ -5294,8 +5290,7 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
* we call find_get_page() with swapper_space directly.
*/
page = find_get_page(swap_address_space(ent), swp_offset(ent));
- if (do_memsw_account())
- entry->val = ent.val;
+ entry->val = ent.val;
return page;
}
@@ -5329,8 +5324,7 @@ static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
page = find_get_entry(mapping, pgoff);
if (xa_is_value(page)) {
swp_entry_t swp = radix_to_swp_entry(page);
- if (do_memsw_account())
- *entry = swp;
+ *entry = swp;
page = find_get_page(swap_address_space(swp),
swp_offset(swp));
}
@@ -6460,6 +6454,9 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
goto out;
if (PageSwapCache(page)) {
+ swp_entry_t ent = { .val = page_private(page), };
+ unsigned short id;
+
/*
* Every swap fault against a single page tries to charge the
* page, bail as early as possible. shmem_unuse() encounters
@@ -6471,17 +6468,12 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
if (compound_head(page)->mem_cgroup)
goto out;
- if (!cgroup_memory_noswap) {
- swp_entry_t ent = { .val = page_private(page), };
- unsigned short id;
-
- id = lookup_swap_cgroup_id(ent);
- rcu_read_lock();
- memcg = mem_cgroup_from_id(id);
- if (memcg && !css_tryget_online(&memcg->css))
- memcg = NULL;
- rcu_read_unlock();
- }
+ id = lookup_swap_cgroup_id(ent);
+ rcu_read_lock();
+ memcg = mem_cgroup_from_id(id);
+ if (memcg && !css_tryget_online(&memcg->css))
+ memcg = NULL;
+ rcu_read_unlock();
}
if (!memcg)
@@ -6498,7 +6490,7 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask,
memcg_check_events(memcg, page);
local_irq_enable();
- if (do_memsw_account() && PageSwapCache(page)) {
+ if (PageSwapCache(page)) {
swp_entry_t entry = { .val = page_private(page) };
/*
* The swap entry might not get freed for a long time,
@@ -6883,7 +6875,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(PageLRU(page), page);
VM_BUG_ON_PAGE(page_count(page), page);
- if (!do_memsw_account())
+ if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
return;
memcg = page->mem_cgroup;
@@ -6912,7 +6904,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
if (!mem_cgroup_is_root(memcg))
page_counter_uncharge(&memcg->memory, nr_entries);
- if (memcg != swap_memcg) {
+ if (!cgroup_memory_noswap && memcg != swap_memcg) {
if (!mem_cgroup_is_root(swap_memcg))
page_counter_charge(&swap_memcg->memsw, nr_entries);
page_counter_uncharge(&memcg->memsw, nr_entries);
@@ -6948,7 +6940,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
struct mem_cgroup *memcg;
unsigned short oldid;
- if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || cgroup_memory_noswap)
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;
memcg = page->mem_cgroup;
@@ -6964,7 +6956,7 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
memcg = mem_cgroup_id_get_online(memcg);
- if (!mem_cgroup_is_root(memcg) &&
+ if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg) &&
!page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
memcg_memory_event(memcg, MEMCG_SWAP_MAX);
memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
@@ -6992,14 +6984,11 @@ void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
struct mem_cgroup *memcg;
unsigned short id;
- if (cgroup_memory_noswap)
- return;
-
id = swap_cgroup_record(entry, 0, nr_pages);
rcu_read_lock();
memcg = mem_cgroup_from_id(id);
if (memcg) {
- if (!mem_cgroup_is_root(memcg)) {
+ if (!cgroup_memory_noswap && !mem_cgroup_is_root(memcg)) {
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
page_counter_uncharge(&memcg->swap, nr_pages);
else
diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index 7aa764f09079..7f34343c075a 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -171,9 +171,6 @@ int swap_cgroup_swapon(int type, unsigned long max_pages)
unsigned long length;
struct swap_cgroup_ctrl *ctrl;
- if (cgroup_memory_noswap)
- return 0;
-
length = DIV_ROUND_UP(max_pages, SC_PER_PAGE);
array_size = length * sizeof(void *);
@@ -209,9 +206,6 @@ void swap_cgroup_swapoff(int type)
unsigned long i, length;
struct swap_cgroup_ctrl *ctrl;
- if (cgroup_memory_noswap)
- return;
-
mutex_lock(&swap_cgroup_mutex);
ctrl = &swap_cgroup_ctrl[type];
map = ctrl->map;
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 16/19] mm: memcontrol: charge swapin pages on instantiation
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:31 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
Right now, users that are otherwise memory controlled can easily
escape their containment and allocate significant amounts of memory
that they're not being charged for. That's because swap readahead
pages are not being charged until somebody actually faults them into
their page table. This can be exploited with MADV_WILLNEED, which
triggers arbitrary readahead allocations without charging the pages.
There are additional problems with the delayed charging of swap pages:
1. To implement refault/workingset detection for anonymous pages, we
need to have a target LRU available at swapin time, but the LRU is
not determinable until the page has been charged.
2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be
stable when the page is isolated from the LRU; otherwise, the locks
change under us. But swapcache gets charged after it's already on
the LRU, and even if we cannot isolate it ourselves (since charging
is not exactly optional).
The previous patch ensured we always maintain cgroup ownership records
for swap pages. This patch moves the swapcache charging point from the
fault handler to swapin time to fix all of the above problems.
v2: simplify swapin error checking (Joonsoo)
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
---
mm/memory.c | 15 ++++++---
mm/shmem.c | 14 ++++----
mm/swap_state.c | 89 ++++++++++++++++++++++++++-----------------------
mm/swapfile.c | 6 ----
4 files changed, 67 insertions(+), 57 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 832ee914cbcf..93900b121b6e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3125,9 +3125,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
vmf->address);
if (page) {
+ int err;
+
__SetPageLocked(page);
__SetPageSwapBacked(page);
set_page_private(page, entry.val);
+
+ /* Tell memcg to use swap ownership records */
+ SetPageSwapCache(page);
+ err = mem_cgroup_charge(page, vma->vm_mm,
+ GFP_KERNEL, false);
+ ClearPageSwapCache(page);
+ if (err)
+ goto out_page;
+
lru_cache_add_anon(page);
swap_readpage(page, true);
}
@@ -3189,10 +3200,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out_page;
}
- if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
- ret = VM_FAULT_OOM;
- goto out_page;
- }
cgroup_throttle_swaprate(page, GFP_KERNEL);
/*
diff --git a/mm/shmem.c b/mm/shmem.c
index d0306a36f42c..98547dc4642d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -623,13 +623,15 @@ static int shmem_add_to_page_cache(struct page *page,
page->mapping = mapping;
page->index = index;
- error = mem_cgroup_charge(page, charge_mm, gfp, PageSwapCache(page));
- if (error) {
- if (!PageSwapCache(page) && PageTransHuge(page)) {
- count_vm_event(THP_FILE_FALLBACK);
- count_vm_event(THP_FILE_FALLBACK_CHARGE);
+ if (!PageSwapCache(page)) {
+ error = mem_cgroup_charge(page, charge_mm, gfp, false);
+ if (error) {
+ if (PageTransHuge(page)) {
+ count_vm_event(THP_FILE_FALLBACK);
+ count_vm_event(THP_FILE_FALLBACK_CHARGE);
+ }
+ goto error;
}
- goto error;
}
cgroup_throttle_swaprate(page, gfp);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 558e224138d1..4052c011391d 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -360,12 +360,13 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr,
bool *new_page_allocated)
{
- struct page *found_page = NULL, *new_page = NULL;
struct swap_info_struct *si;
- int err;
+ struct page *page;
+
*new_page_allocated = false;
- do {
+ for (;;) {
+ int err;
/*
* First check the swap cache. Since this is normally
* called after lookup_swap_cache() failed, re-calling
@@ -373,12 +374,12 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
*/
si = get_swap_device(entry);
if (!si)
- break;
- found_page = find_get_page(swap_address_space(entry),
- swp_offset(entry));
+ return NULL;
+ page = find_get_page(swap_address_space(entry),
+ swp_offset(entry));
put_swap_device(si);
- if (found_page)
- break;
+ if (page)
+ return page;
/*
* Just skip read ahead for unused swap slot.
@@ -389,21 +390,15 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
* else swap_off will be aborted if we return NULL.
*/
if (!__swp_swapcount(entry) && swap_slot_cache_enabled)
- break;
-
- /*
- * Get a new page to read into from swap.
- */
- if (!new_page) {
- new_page = alloc_page_vma(gfp_mask, vma, addr);
- if (!new_page)
- break; /* Out of memory */
- }
+ return NULL;
/*
* Swap entry may have been freed since our caller observed it.
*/
err = swapcache_prepare(entry);
+ if (!err)
+ break;
+
if (err == -EEXIST) {
/*
* We might race against get_swap_page() and stumble
@@ -412,31 +407,43 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
*/
cond_resched();
continue;
- } else if (err) /* swp entry is obsolete ? */
- break;
-
- /* May fail (-ENOMEM) if XArray node allocation failed. */
- __SetPageLocked(new_page);
- __SetPageSwapBacked(new_page);
- err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
- if (likely(!err)) {
- /* Initiate read into locked page */
- SetPageWorkingset(new_page);
- lru_cache_add_anon(new_page);
- *new_page_allocated = true;
- return new_page;
}
- __ClearPageLocked(new_page);
- /*
- * add_to_swap_cache() doesn't return -EEXIST, so we can safely
- * clear SWAP_HAS_CACHE flag.
- */
- put_swap_page(new_page, entry);
- } while (err != -ENOMEM);
- if (new_page)
- put_page(new_page);
- return found_page;
+ return NULL;
+ }
+
+ /*
+ * The swap entry is ours to swap in. Prepare a new page.
+ */
+
+ page = alloc_page_vma(gfp_mask, vma, addr);
+ if (!page)
+ goto fail_free;
+
+ __SetPageLocked(page);
+ __SetPageSwapBacked(page);
+
+ /* May fail (-ENOMEM) if XArray node allocation failed. */
+ if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL))
+ goto fail_unlock;
+
+ if (mem_cgroup_charge(page, NULL, gfp_mask & GFP_KERNEL, false))
+ goto fail_delete;
+
+ /* Initiate read into locked page */
+ SetPageWorkingset(page);
+ lru_cache_add_anon(page);
+ *new_page_allocated = true;
+ return page;
+
+fail_delete:
+ delete_from_swap_cache(page);
+fail_unlock:
+ unlock_page(page);
+ put_page(page);
+fail_free:
+ swap_free(entry);
+ return NULL;
}
/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8c9b6767013b..3bc7acc68ba8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1867,11 +1867,6 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (unlikely(!page))
return -ENOMEM;
- if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
- ret = -ENOMEM;
- goto out_nolock;
- }
-
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
ret = 0;
@@ -1897,7 +1892,6 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
activate_page(page);
out:
pte_unmap_unlock(pte, ptl);
-out_nolock:
if (page != swapcache) {
unlock_page(page);
put_page(page);
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 16/19] mm: memcontrol: charge swapin pages on instantiation
@ 2020-05-08 18:31 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
Right now, users that are otherwise memory controlled can easily
escape their containment and allocate significant amounts of memory
that they're not being charged for. That's because swap readahead
pages are not being charged until somebody actually faults them into
their page table. This can be exploited with MADV_WILLNEED, which
triggers arbitrary readahead allocations without charging the pages.
There are additional problems with the delayed charging of swap pages:
1. To implement refault/workingset detection for anonymous pages, we
need to have a target LRU available at swapin time, but the LRU is
not determinable until the page has been charged.
2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be
stable when the page is isolated from the LRU; otherwise, the locks
change under us. But swapcache gets charged after it's already on
the LRU, and even if we cannot isolate it ourselves (since charging
is not exactly optional).
The previous patch ensured we always maintain cgroup ownership records
for swap pages. This patch moves the swapcache charging point from the
fault handler to swapin time to fix all of the above problems.
v2: simplify swapin error checking (Joonsoo)
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
---
mm/memory.c | 15 ++++++---
mm/shmem.c | 14 ++++----
mm/swap_state.c | 89 ++++++++++++++++++++++++++-----------------------
mm/swapfile.c | 6 ----
4 files changed, 67 insertions(+), 57 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 832ee914cbcf..93900b121b6e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3125,9 +3125,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
vmf->address);
if (page) {
+ int err;
+
__SetPageLocked(page);
__SetPageSwapBacked(page);
set_page_private(page, entry.val);
+
+ /* Tell memcg to use swap ownership records */
+ SetPageSwapCache(page);
+ err = mem_cgroup_charge(page, vma->vm_mm,
+ GFP_KERNEL, false);
+ ClearPageSwapCache(page);
+ if (err)
+ goto out_page;
+
lru_cache_add_anon(page);
swap_readpage(page, true);
}
@@ -3189,10 +3200,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out_page;
}
- if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
- ret = VM_FAULT_OOM;
- goto out_page;
- }
cgroup_throttle_swaprate(page, GFP_KERNEL);
/*
diff --git a/mm/shmem.c b/mm/shmem.c
index d0306a36f42c..98547dc4642d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -623,13 +623,15 @@ static int shmem_add_to_page_cache(struct page *page,
page->mapping = mapping;
page->index = index;
- error = mem_cgroup_charge(page, charge_mm, gfp, PageSwapCache(page));
- if (error) {
- if (!PageSwapCache(page) && PageTransHuge(page)) {
- count_vm_event(THP_FILE_FALLBACK);
- count_vm_event(THP_FILE_FALLBACK_CHARGE);
+ if (!PageSwapCache(page)) {
+ error = mem_cgroup_charge(page, charge_mm, gfp, false);
+ if (error) {
+ if (PageTransHuge(page)) {
+ count_vm_event(THP_FILE_FALLBACK);
+ count_vm_event(THP_FILE_FALLBACK_CHARGE);
+ }
+ goto error;
}
- goto error;
}
cgroup_throttle_swaprate(page, gfp);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 558e224138d1..4052c011391d 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -360,12 +360,13 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr,
bool *new_page_allocated)
{
- struct page *found_page = NULL, *new_page = NULL;
struct swap_info_struct *si;
- int err;
+ struct page *page;
+
*new_page_allocated = false;
- do {
+ for (;;) {
+ int err;
/*
* First check the swap cache. Since this is normally
* called after lookup_swap_cache() failed, re-calling
@@ -373,12 +374,12 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
*/
si = get_swap_device(entry);
if (!si)
- break;
- found_page = find_get_page(swap_address_space(entry),
- swp_offset(entry));
+ return NULL;
+ page = find_get_page(swap_address_space(entry),
+ swp_offset(entry));
put_swap_device(si);
- if (found_page)
- break;
+ if (page)
+ return page;
/*
* Just skip read ahead for unused swap slot.
@@ -389,21 +390,15 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
* else swap_off will be aborted if we return NULL.
*/
if (!__swp_swapcount(entry) && swap_slot_cache_enabled)
- break;
-
- /*
- * Get a new page to read into from swap.
- */
- if (!new_page) {
- new_page = alloc_page_vma(gfp_mask, vma, addr);
- if (!new_page)
- break; /* Out of memory */
- }
+ return NULL;
/*
* Swap entry may have been freed since our caller observed it.
*/
err = swapcache_prepare(entry);
+ if (!err)
+ break;
+
if (err == -EEXIST) {
/*
* We might race against get_swap_page() and stumble
@@ -412,31 +407,43 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
*/
cond_resched();
continue;
- } else if (err) /* swp entry is obsolete ? */
- break;
-
- /* May fail (-ENOMEM) if XArray node allocation failed. */
- __SetPageLocked(new_page);
- __SetPageSwapBacked(new_page);
- err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL);
- if (likely(!err)) {
- /* Initiate read into locked page */
- SetPageWorkingset(new_page);
- lru_cache_add_anon(new_page);
- *new_page_allocated = true;
- return new_page;
}
- __ClearPageLocked(new_page);
- /*
- * add_to_swap_cache() doesn't return -EEXIST, so we can safely
- * clear SWAP_HAS_CACHE flag.
- */
- put_swap_page(new_page, entry);
- } while (err != -ENOMEM);
- if (new_page)
- put_page(new_page);
- return found_page;
+ return NULL;
+ }
+
+ /*
+ * The swap entry is ours to swap in. Prepare a new page.
+ */
+
+ page = alloc_page_vma(gfp_mask, vma, addr);
+ if (!page)
+ goto fail_free;
+
+ __SetPageLocked(page);
+ __SetPageSwapBacked(page);
+
+ /* May fail (-ENOMEM) if XArray node allocation failed. */
+ if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL))
+ goto fail_unlock;
+
+ if (mem_cgroup_charge(page, NULL, gfp_mask & GFP_KERNEL, false))
+ goto fail_delete;
+
+ /* Initiate read into locked page */
+ SetPageWorkingset(page);
+ lru_cache_add_anon(page);
+ *new_page_allocated = true;
+ return page;
+
+fail_delete:
+ delete_from_swap_cache(page);
+fail_unlock:
+ unlock_page(page);
+ put_page(page);
+fail_free:
+ swap_free(entry);
+ return NULL;
}
/*
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8c9b6767013b..3bc7acc68ba8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1867,11 +1867,6 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (unlikely(!page))
return -ENOMEM;
- if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL, true)) {
- ret = -ENOMEM;
- goto out_nolock;
- }
-
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
ret = 0;
@@ -1897,7 +1892,6 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
activate_page(page);
out:
pte_unmap_unlock(pte, ptl);
-out_nolock:
if (page != swapcache) {
unlock_page(page);
put_page(page);
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread[parent not found: <20200508183105.225460-17-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH 16/19] mm: memcontrol: charge swapin pages on instantiation
2020-05-08 18:31 ` Johannes Weiner
@ 2020-06-11 9:35 ` Michal Hocko
-1 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2020-06-11 9:35 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Fri 08-05-20 14:31:03, Johannes Weiner wrote:
[...]
> diff --git a/mm/memory.c b/mm/memory.c
> index 832ee914cbcf..93900b121b6e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3125,9 +3125,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
> vmf->address);
> if (page) {
> + int err;
> +
> __SetPageLocked(page);
> __SetPageSwapBacked(page);
> set_page_private(page, entry.val);
> +
> + /* Tell memcg to use swap ownership records */
> + SetPageSwapCache(page);
> + err = mem_cgroup_charge(page, vma->vm_mm,
> + GFP_KERNEL, false);
> + ClearPageSwapCache(page);
> + if (err)
> + goto out_page;
err would be a return value from try_charge and that can be -ENOMEM. Now
we almost never return ENOMEM for GFP_KERNEL single page charge. Except
for async OOM handling (oom_disabled v1). So this needs translation to
VM_FAULT_OOM.
I am not an expert on the swap code so I might have missed some subtle
issues but the rest of the patch seems reasonable to me.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 58+ messages in thread* Re: [PATCH 16/19] mm: memcontrol: charge swapin pages on instantiation
@ 2020-06-11 9:35 ` Michal Hocko
0 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2020-06-11 9:35 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
On Fri 08-05-20 14:31:03, Johannes Weiner wrote:
[...]
> diff --git a/mm/memory.c b/mm/memory.c
> index 832ee914cbcf..93900b121b6e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3125,9 +3125,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
> vmf->address);
> if (page) {
> + int err;
> +
> __SetPageLocked(page);
> __SetPageSwapBacked(page);
> set_page_private(page, entry.val);
> +
> + /* Tell memcg to use swap ownership records */
> + SetPageSwapCache(page);
> + err = mem_cgroup_charge(page, vma->vm_mm,
> + GFP_KERNEL, false);
> + ClearPageSwapCache(page);
> + if (err)
> + goto out_page;
err would be a return value from try_charge and that can be -ENOMEM. Now
we almost never return ENOMEM for GFP_KERNEL single page charge. Except
for async OOM handling (oom_disabled v1). So this needs translation to
VM_FAULT_OOM.
I am not an expert on the swap code so I might have missed some subtle
issues but the rest of the patch seems reasonable to me.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 58+ messages in thread[parent not found: <20200611093523.GB20450-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* [PATCH for 5.8] mm: do_swap_page fix up the error code instantiation
2020-06-11 9:35 ` Michal Hocko
@ 2020-06-17 8:49 ` Michal Hocko
-1 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2020-06-17 8:49 UTC (permalink / raw)
To: Andrew Morton, Johannes Weiner
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
I hope I haven't missed anything but the patch should be the following.
From acd488c22b4bb2ee42526be8ca67145d5127b014 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
Date: Wed, 17 Jun 2020 10:40:47 +0200
Subject: [PATCH] mm: do_swap_page fix up the error code
do_swap_page returns error codes from the VM_FAULT* space. try_charge
might return -ENOMEM, though, and then do_swap_page simply returns 0
which means a success.
We almost never return ENOMEM for GFP_KERNEL single page charge. Except
for async OOM handling (oom_disabled v1). So this needs translation to
VM_FAULT_OOM otherwise the the page fault path will not notify the
userspace and wait for an action.
Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
Signed-off-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
---
mm/memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c
index dc7f3543b1fd..d944b7946b27 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3140,8 +3140,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
err = mem_cgroup_charge(page, vma->vm_mm,
GFP_KERNEL);
ClearPageSwapCache(page);
- if (err)
+ if (err) {
+ err = VM_FAULT_OOM;
goto out_page;
+ }
lru_cache_add(page);
swap_readpage(page, true);
--
2.26.2
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH for 5.8] mm: do_swap_page fix up the error code instantiation
@ 2020-06-17 8:49 ` Michal Hocko
0 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2020-06-17 8:49 UTC (permalink / raw)
To: Andrew Morton, Johannes Weiner
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
I hope I haven't missed anything but the patch should be the following.
From acd488c22b4bb2ee42526be8ca67145d5127b014 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 17 Jun 2020 10:40:47 +0200
Subject: [PATCH] mm: do_swap_page fix up the error code
do_swap_page returns error codes from the VM_FAULT* space. try_charge
might return -ENOMEM, though, and then do_swap_page simply returns 0
which means a success.
We almost never return ENOMEM for GFP_KERNEL single page charge. Except
for async OOM handling (oom_disabled v1). So this needs translation to
VM_FAULT_OOM otherwise the the page fault path will not notify the
userspace and wait for an action.
Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
mm/memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c
index dc7f3543b1fd..d944b7946b27 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3140,8 +3140,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
err = mem_cgroup_charge(page, vma->vm_mm,
GFP_KERNEL);
ClearPageSwapCache(page);
- if (err)
+ if (err) {
+ err = VM_FAULT_OOM;
goto out_page;
+ }
lru_cache_add(page);
swap_readpage(page, true);
--
2.26.2
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 58+ messages in thread* Re: [PATCH for 5.8] mm: do_swap_page fix up the error code instantiation
2020-06-17 8:49 ` Michal Hocko
(?)
@ 2020-06-17 9:02 ` Michal Hocko
[not found] ` <20200617090238.GL9499-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>
-1 siblings, 1 reply; 58+ messages in thread
From: Michal Hocko @ 2020-06-17 9:02 UTC (permalink / raw)
To: Andrew Morton, Johannes Weiner
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
Damn, I forgot to commit my last change (s@err@ret@). Sorry about the
noise.
From 50297dd026ebf71fe901e1945a9ce1e8d8aa083b Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 17 Jun 2020 10:40:47 +0200
Subject: [PATCH] mm: do_swap_page fix up the error code
do_swap_page returns error codes from the VM_FAULT* space. try_charge
might return -ENOMEM, though, and then do_swap_page simply returns 0
which means a success.
We almost never return ENOMEM for GFP_KERNEL single page charge. Except
for async OOM handling (oom_disabled v1). So this needs translation to
VM_FAULT_OOM otherwise the the page fault path will not notify the
userspace and wait for an action.
Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
mm/memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c
index dc7f3543b1fd..1c632faa2611 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3140,8 +3140,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
err = mem_cgroup_charge(page, vma->vm_mm,
GFP_KERNEL);
ClearPageSwapCache(page);
- if (err)
+ if (err) {
+ ret = VM_FAULT_OOM;
goto out_page;
+ }
lru_cache_add(page);
swap_readpage(page, true);
--
2.26.2
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 17/19] mm: memcontrol: document the new swap control behavior
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:31 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
From: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
Signed-off-by: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
---
.../admin-guide/cgroup-v1/memory.rst | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 0ae4f564c2d6..12757e63b26c 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -199,11 +199,11 @@ An RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
+A swapped-in page is accounted after adding into swapcache.
Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
+Since page's memcg recorded into swap whatever memsw enabled, the page will
+be accounted after swapin.
At page migration, accounting information is kept.
@@ -222,18 +222,13 @@ the cgroup that brought it in -- this will happen on memory pressure).
But see section 8.2: when moving a task to another cgroup, its pages may
be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and make swapped-out pages of shmem(tmpfs) to
-be backed into memory in force, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.
-
-2.4 Swap Extension (CONFIG_MEMCG_SWAP)
+2.4 Swap Extension
--------------------------------------
-Swap Extension allows you to record charge for swap. A swapped-in page is
-charged back to original page allocator if possible.
+Swap usage is always recorded for each of cgroup. Swap Extension allows you to
+read and limit it.
-When swap is accounted, following files are added.
+When CONFIG_SWAP is enabled, following files are added.
- memory.memsw.usage_in_bytes.
- memory.memsw.limit_in_bytes.
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 17/19] mm: memcontrol: document the new swap control behavior
@ 2020-05-08 18:31 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
From: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
.../admin-guide/cgroup-v1/memory.rst | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 0ae4f564c2d6..12757e63b26c 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -199,11 +199,11 @@ An RSS page is unaccounted when it's fully unmapped. A PageCache page is
unaccounted when it's removed from radix-tree. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
+A swapped-in page is accounted after adding into swapcache.
Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
+Since page's memcg recorded into swap whatever memsw enabled, the page will
+be accounted after swapin.
At page migration, accounting information is kept.
@@ -222,18 +222,13 @@ the cgroup that brought it in -- this will happen on memory pressure).
But see section 8.2: when moving a task to another cgroup, its pages may
be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and make swapped-out pages of shmem(tmpfs) to
-be backed into memory in force, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.
-
-2.4 Swap Extension (CONFIG_MEMCG_SWAP)
+2.4 Swap Extension
--------------------------------------
-Swap Extension allows you to record charge for swap. A swapped-in page is
-charged back to original page allocator if possible.
+Swap usage is always recorded for each of cgroup. Swap Extension allows you to
+read and limit it.
-When swap is accounted, following files are added.
+When CONFIG_SWAP is enabled, following files are added.
- memory.memsw.usage_in_bytes.
- memory.memsw.limit_in_bytes.
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [PATCH 19/19] mm: memcontrol: update page->mem_cgroup stability rules
2020-05-08 18:30 [PATCH 00/19 V2] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
@ 2020-05-08 18:31 ` Johannes Weiner
2020-05-08 18:30 ` [PATCH 04/19] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
` (9 subsequent siblings)
10 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
The previous patches have simplified the access rules around
page->mem_cgroup somewhat:
1. We never change page->mem_cgroup while the page is isolated by
somebody else. This was by far the biggest exception to our rules
and it didn't stop at lock_page() or lock_page_memcg().
2. We charge pages before they get put into page tables now, so the
somewhat fishy rule about "can be in page table as long as it's
still locked" is now gone and boiled down to having an exclusive
reference to the page.
Document the new rules. Any of the following will stabilize the
page->mem_cgroup association:
- the page lock
- LRU isolation
- lock_page_memcg()
- exclusive access to the page
Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Reviewed-by: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
---
mm/memcontrol.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 491fdeec0ce4..865440e8438e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1201,9 +1201,8 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
* @page: the page
* @pgdat: pgdat of the page
*
- * This function is only safe when following the LRU page isolation
- * and putback protocol: the LRU lock must be held, and the page must
- * either be PageLRU() or the caller must have isolated/allocated it.
+ * This function relies on page->mem_cgroup being stable - see the
+ * access rules in commit_charge().
*/
struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat)
{
@@ -2605,18 +2604,12 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
{
VM_BUG_ON_PAGE(page->mem_cgroup, page);
/*
- * Nobody should be changing or seriously looking at
- * page->mem_cgroup at this point:
- *
- * - the page is uncharged
- *
- * - the page is off-LRU
- *
- * - an anonymous fault has exclusive page access, except for
- * a locked page table
+ * Any of the following ensures page->mem_cgroup stability:
*
- * - a page cache insertion, a swapin fault, or a migration
- * have the page locked
+ * - the page lock
+ * - LRU isolation
+ * - lock_page_memcg()
+ * - exclusive reference
*/
page->mem_cgroup = memcg;
}
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread* [PATCH 19/19] mm: memcontrol: update page->mem_cgroup stability rules
@ 2020-05-08 18:31 ` Johannes Weiner
0 siblings, 0 replies; 58+ messages in thread
From: Johannes Weiner @ 2020-05-08 18:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins, Michal Hocko,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
The previous patches have simplified the access rules around
page->mem_cgroup somewhat:
1. We never change page->mem_cgroup while the page is isolated by
somebody else. This was by far the biggest exception to our rules
and it didn't stop at lock_page() or lock_page_memcg().
2. We charge pages before they get put into page tables now, so the
somewhat fishy rule about "can be in page table as long as it's
still locked" is now gone and boiled down to having an exclusive
reference to the page.
Document the new rules. Any of the following will stabilize the
page->mem_cgroup association:
- the page lock
- LRU isolation
- lock_page_memcg()
- exclusive access to the page
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
mm/memcontrol.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 491fdeec0ce4..865440e8438e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1201,9 +1201,8 @@ int mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
* @page: the page
* @pgdat: pgdat of the page
*
- * This function is only safe when following the LRU page isolation
- * and putback protocol: the LRU lock must be held, and the page must
- * either be PageLRU() or the caller must have isolated/allocated it.
+ * This function relies on page->mem_cgroup being stable - see the
+ * access rules in commit_charge().
*/
struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat)
{
@@ -2605,18 +2604,12 @@ static void commit_charge(struct page *page, struct mem_cgroup *memcg)
{
VM_BUG_ON_PAGE(page->mem_cgroup, page);
/*
- * Nobody should be changing or seriously looking at
- * page->mem_cgroup at this point:
- *
- * - the page is uncharged
- *
- * - the page is off-LRU
- *
- * - an anonymous fault has exclusive page access, except for
- * a locked page table
+ * Any of the following ensures page->mem_cgroup stability:
*
- * - a page cache insertion, a swapin fault, or a migration
- * have the page locked
+ * - the page lock
+ * - LRU isolation
+ * - lock_page_memcg()
+ * - exclusive reference
*/
page->mem_cgroup = memcg;
}
--
2.26.2
^ permalink raw reply related [flat|nested] 58+ messages in thread[parent not found: <20200508183105.225460-20-hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>]
* Re: [PATCH 19/19] mm: memcontrol: update page->mem_cgroup stability rules
2020-05-08 18:31 ` Johannes Weiner
@ 2020-06-11 9:40 ` Michal Hocko
-1 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2020-06-11 9:40 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Kirill A. Shutemov, Roman Gushchin,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, cgroups-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, kernel-team-b10kYP2dOMg
On Fri 08-05-20 14:31:06, Johannes Weiner wrote:
> The previous patches have simplified the access rules around
> page->mem_cgroup somewhat:
>
> 1. We never change page->mem_cgroup while the page is isolated by
> somebody else. This was by far the biggest exception to our rules
> and it didn't stop at lock_page() or lock_page_memcg().
>
> 2. We charge pages before they get put into page tables now, so the
> somewhat fishy rule about "can be in page table as long as it's
> still locked" is now gone and boiled down to having an exclusive
> reference to the page.
>
> Document the new rules. Any of the following will stabilize the
> page->mem_cgroup association:
>
> - the page lock
> - LRU isolation
> - lock_page_memcg()
> - exclusive access to the page
>
> Signed-off-by: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
> Reviewed-by: Alex Shi <alex.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>
> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim-Hm3cg6mZ9cc@public.gmane.org>
Thanks a lot this is a big improvement and simplification.
I have gone through the whole series finally. I have followed up where
necessary but overall this is really nice!
Sorry I couldn't jump in to review in time.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [PATCH 19/19] mm: memcontrol: update page->mem_cgroup stability rules
@ 2020-06-11 9:40 ` Michal Hocko
0 siblings, 0 replies; 58+ messages in thread
From: Michal Hocko @ 2020-06-11 9:40 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Alex Shi, Joonsoo Kim, Shakeel Butt, Hugh Dickins,
Kirill A. Shutemov, Roman Gushchin, linux-mm, cgroups,
linux-kernel, kernel-team
On Fri 08-05-20 14:31:06, Johannes Weiner wrote:
> The previous patches have simplified the access rules around
> page->mem_cgroup somewhat:
>
> 1. We never change page->mem_cgroup while the page is isolated by
> somebody else. This was by far the biggest exception to our rules
> and it didn't stop at lock_page() or lock_page_memcg().
>
> 2. We charge pages before they get put into page tables now, so the
> somewhat fishy rule about "can be in page table as long as it's
> still locked" is now gone and boiled down to having an exclusive
> reference to the page.
>
> Document the new rules. Any of the following will stabilize the
> page->mem_cgroup association:
>
> - the page lock
> - LRU isolation
> - lock_page_memcg()
> - exclusive access to the page
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Thanks a lot this is a big improvement and simplification.
I have gone through the whole series finally. I have followed up where
necessary but overall this is really nice!
Sorry I couldn't jump in to review in time.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 58+ messages in thread