* [RFC PATCH 0/3] memcg-v1: fully deprecate charge moving
@ 2024-10-24 6:57 ` Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
` (2 more replies)
0 siblings, 3 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-24 6:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
The memcg v1 charge moving feature has been deprecated for almost 2
years and the kernel warns if someone tries to use it. This warning has
been backported to all stable kernels and there have been no reports of
the warning firing nor any requests to keep supporting this feature.
Let's proceed to fully deprecate this feature.
Shakeel Butt (3):
memcg-v1: fully deprecate move_charge_at_immigrate
memcg-v1: remove charge move code
memcg-v1: remove memcg move locking code
.../admin-guide/cgroup-v1/memory.rst | 82 +-
fs/buffer.c | 5 -
include/linux/memcontrol.h | 59 --
mm/filemap.c | 1 -
mm/memcontrol-v1.c | 960 +-----------------
mm/memcontrol-v1.h | 6 -
mm/memcontrol.c | 14 -
mm/page-writeback.c | 21 +-
mm/rmap.c | 1 -
mm/vmscan.c | 11 -
10 files changed, 8 insertions(+), 1152 deletions(-)
--
2.43.5
* [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate
2024-10-24 6:57 ` [RFC PATCH 0/3] " Shakeel Butt
@ 2024-10-24 6:57 ` Shakeel Butt
2024-10-24 9:14 ` Michal Hocko
2024-10-24 16:49 ` Roman Gushchin
2024-10-24 6:57 ` [RFC PATCH 2/3] memcg-v1: remove charge move code Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 3/3] memcg-v1: remove memcg move locking code Shakeel Butt
2 siblings, 2 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-24 6:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
Proceed with the complete deprecation of memcg v1's charge moving
feature. The deprecation warning has been in the kernel for almost two
years and has since been backported to all stable kernels. Now is the
time to fully deprecate this feature.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
.../admin-guide/cgroup-v1/memory.rst | 82 +------------------
mm/memcontrol-v1.c | 14 +---
2 files changed, 5 insertions(+), 91 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 270501db9f4e..286d16fc22eb 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -90,9 +90,7 @@ Brief summary of control files.
used.
memory.swappiness set/show swappiness parameter of vmscan
(See sysctl's vm.swappiness)
- memory.move_charge_at_immigrate set/show controls of moving charges
- This knob is deprecated and shouldn't be
- used.
+ memory.move_charge_at_immigrate This knob is deprecated.
memory.oom_control set/show oom controls.
This knob is deprecated and shouldn't be
used.
@@ -243,10 +241,6 @@ behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
-But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
-task to another cgroup, its pages may be recharged to the new cgroup, if
-move_charge_at_immigrate has been chosen.
-
2.4 Swap Extension
--------------------------------------
@@ -756,78 +750,8 @@ If we want to change this to 1G, we can at any time use::
THIS IS DEPRECATED!
-It's expensive and unreliable! It's better practice to launch workload
-tasks directly from inside their target cgroup. Use dedicated workload
-cgroups to allow fine-grained policy adjustments without having to
-move physical pages between control domains.
-
-Users can move charges associated with a task along with task migration, that
-is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
-This feature is not supported in !CONFIG_MMU environments because of lack of
-page tables.
-
-8.1 Interface
--------------
-
-This feature is disabled by default. It can be enabled (and disabled again) by
-writing to memory.move_charge_at_immigrate of the destination cgroup.
-
-If you want to enable it::
-
- # echo (some positive value) > memory.move_charge_at_immigrate
-
-.. note::
- Each bits of move_charge_at_immigrate has its own meaning about what type
- of charges should be moved. See :ref:`section 8.2
- <cgroup-v1-memory-movable-charges>` for details.
-
-.. note::
- Charges are moved only when you move mm->owner, in other words,
- a leader of a thread group.
-
-.. note::
- If we cannot find enough space for the task in the destination cgroup, we
- try to make space by reclaiming memory. Task migration may fail if we
- cannot make enough space.
-
-.. note::
- It can take several seconds if you move charges much.
-
-And if you want disable it again::
-
- # echo 0 > memory.move_charge_at_immigrate
-
-.. _cgroup-v1-memory-movable-charges:
-
-8.2 Type of charges which can be moved
---------------------------------------
-
-Each bit in move_charge_at_immigrate has its own meaning about what type of
-charges should be moved. But in any case, it must be noted that an account of
-a page or a swap can be moved only when it is charged to the task's current
-(old) memory cgroup.
-
-+---+--------------------------------------------------------------------------+
-|bit| what type of charges would be moved ? |
-+===+==========================================================================+
-| 0 | A charge of an anonymous page (or swap of it) used by the target task. |
-| | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
-+---+--------------------------------------------------------------------------+
-| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
-| | and swaps of tmpfs file) mmapped by the target task. Unlike the case of |
-| | anonymous pages, file pages (and swaps) in the range mmapped by the task |
-| | will be moved even if the task hasn't done page fault, i.e. they might |
-| | not be the task's "RSS", but other task's "RSS" that maps the same file. |
-| | The mapcount of the page is ignored (the page can be moved independent |
-| | of the mapcount). You must enable Swap Extension (see 2.4) to |
-| | enable move of swap charges. |
-+---+--------------------------------------------------------------------------+
-
-8.3 TODO
---------
-
-- All of moving charge operations are done under cgroup_mutex. It's not good
- behavior to hold the mutex too long, so we may need some trick.
+Reading memory.move_charge_at_immigrate will always return 0 and writing
+to it will always return -EINVAL.
9. Memory thresholds
====================
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 81d8819f13cd..8f88540f0159 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -593,7 +593,7 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
struct cftype *cft)
{
- return mem_cgroup_from_css(css)->move_charge_at_immigrate;
+ return 0;
}
#ifdef CONFIG_MMU
@@ -606,17 +606,7 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
"Please report your usecase to linux-mm@kvack.org if you "
"depend on this functionality.\n");
- if (val & ~MOVE_MASK)
- return -EINVAL;
-
- /*
- * No kind of locking is needed in here, because ->can_attach() will
- * check this value once in the beginning of the process, and then carry
- * on with stale data. This means that changes to this value will only
- * affect task migrations starting after the change.
- */
- memcg->move_charge_at_immigrate = val;
- return 0;
+ return -EINVAL;
}
#else
static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
--
2.43.5
* [RFC PATCH 2/3] memcg-v1: remove charge move code
2024-10-24 6:57 ` [RFC PATCH 0/3] " Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
@ 2024-10-24 6:57 ` Shakeel Butt
2024-10-24 9:14 ` Michal Hocko
2024-10-24 16:50 ` Roman Gushchin
2024-10-24 6:57 ` [RFC PATCH 3/3] memcg-v1: remove memcg move locking code Shakeel Butt
2 siblings, 2 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-24 6:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
The memcg-v1 charge move feature has been completely deprecated, so
let's remove the relevant code as well.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
include/linux/memcontrol.h | 5 -
mm/memcontrol-v1.c | 864 -------------------------------------
mm/memcontrol-v1.h | 6 -
mm/memcontrol.c | 9 -
4 files changed, 884 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 524006313b0d..798db70b0a30 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -299,11 +299,6 @@ struct mem_cgroup {
/* For oom notifier event fd */
struct list_head oom_notify;
- /*
- * Should we move charges of a task when a task is moved into this
- * mem_cgroup ? And what type of charges should we move ?
- */
- unsigned long move_charge_at_immigrate;
/* taken only while moving_account > 0 */
spinlock_t move_lock;
unsigned long move_lock_flags;
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 8f88540f0159..79339cb65b9d 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -40,31 +40,6 @@ static struct mem_cgroup_tree soft_limit_tree __read_mostly;
#define MEM_CGROUP_MAX_RECLAIM_LOOPS 100
#define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2
-/* Stuffs for move charges at task migration. */
-/*
- * Types of charges to be moved.
- */
-#define MOVE_ANON 0x1ULL
-#define MOVE_FILE 0x2ULL
-#define MOVE_MASK (MOVE_ANON | MOVE_FILE)
-
-/* "mc" and its members are protected by cgroup_mutex */
-static struct move_charge_struct {
- spinlock_t lock; /* for from, to */
- struct mm_struct *mm;
- struct mem_cgroup *from;
- struct mem_cgroup *to;
- unsigned long flags;
- unsigned long precharge;
- unsigned long moved_charge;
- unsigned long moved_swap;
- struct task_struct *moving_task; /* a task moving charges */
- wait_queue_head_t waitq; /* a waitq for other context */
-} mc = {
- .lock = __SPIN_LOCK_UNLOCKED(mc.lock),
- .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq),
-};
-
/* for OOM */
struct mem_cgroup_eventfd_list {
struct list_head list;
@@ -426,51 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return nr_reclaimed;
}
-/*
- * A routine for checking "mem" is under move_account() or not.
- *
- * Checking a cgroup is mc.from or mc.to or under hierarchy of
- * moving cgroups. This is for waiting at high-memory pressure
- * caused by "move".
- */
-static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
-{
- struct mem_cgroup *from;
- struct mem_cgroup *to;
- bool ret = false;
- /*
- * Unlike task_move routines, we access mc.to, mc.from not under
- * mutual exclusion by cgroup_mutex. Here, we take spinlock instead.
- */
- spin_lock(&mc.lock);
- from = mc.from;
- to = mc.to;
- if (!from)
- goto unlock;
-
- ret = mem_cgroup_is_descendant(from, memcg) ||
- mem_cgroup_is_descendant(to, memcg);
-unlock:
- spin_unlock(&mc.lock);
- return ret;
-}
-
-bool memcg1_wait_acct_move(struct mem_cgroup *memcg)
-{
- if (mc.moving_task && current != mc.moving_task) {
- if (mem_cgroup_under_move(memcg)) {
- DEFINE_WAIT(wait);
- prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
- /* moving charge context might have finished. */
- if (mc.moving_task)
- schedule();
- finish_wait(&mc.waitq, &wait);
- return true;
- }
- }
- return false;
-}
-
/**
* folio_memcg_lock - Bind a folio to its memcg.
* @folio: The folio.
@@ -552,44 +482,6 @@ void folio_memcg_unlock(struct folio *folio)
__folio_memcg_unlock(folio_memcg(folio));
}
-#ifdef CONFIG_SWAP
-/**
- * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
- * @entry: swap entry to be moved
- * @from: mem_cgroup which the entry is moved from
- * @to: mem_cgroup which the entry is moved to
- *
- * It succeeds only when the swap_cgroup's record for this entry is the same
- * as the mem_cgroup's id of @from.
- *
- * Returns 0 on success, -EINVAL on failure.
- *
- * The caller must have charged to @to, IOW, called page_counter_charge() about
- * both res and memsw, and called css_get().
- */
-static int mem_cgroup_move_swap_account(swp_entry_t entry,
- struct mem_cgroup *from, struct mem_cgroup *to)
-{
- unsigned short old_id, new_id;
-
- old_id = mem_cgroup_id(from);
- new_id = mem_cgroup_id(to);
-
- if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
- mod_memcg_state(from, MEMCG_SWAP, -1);
- mod_memcg_state(to, MEMCG_SWAP, 1);
- return 0;
- }
- return -EINVAL;
-}
-#else
-static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
- struct mem_cgroup *from, struct mem_cgroup *to)
-{
- return -EINVAL;
-}
-#endif
-
static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
struct cftype *cft)
{
@@ -600,8 +492,6 @@ static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
struct cftype *cft, u64 val)
{
- struct mem_cgroup *memcg = mem_cgroup_from_css(css);
-
pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
"Please report your usecase to linux-mm@kvack.org if you "
"depend on this functionality.\n");
@@ -616,760 +506,6 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
}
#endif
-#ifdef CONFIG_MMU
-/* Handlers for move charge at task migration. */
-static int mem_cgroup_do_precharge(unsigned long count)
-{
- int ret;
-
- /* Try a single bulk charge without reclaim first, kswapd may wake */
- ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
- if (!ret) {
- mc.precharge += count;
- return ret;
- }
-
- /* Try charges one by one with reclaim, but do not retry */
- while (count--) {
- ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
- if (ret)
- return ret;
- mc.precharge++;
- cond_resched();
- }
- return 0;
-}
-
-union mc_target {
- struct folio *folio;
- swp_entry_t ent;
-};
-
-enum mc_target_type {
- MC_TARGET_NONE = 0,
- MC_TARGET_PAGE,
- MC_TARGET_SWAP,
- MC_TARGET_DEVICE,
-};
-
-static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
- unsigned long addr, pte_t ptent)
-{
- struct page *page = vm_normal_page(vma, addr, ptent);
-
- if (!page)
- return NULL;
- if (PageAnon(page)) {
- if (!(mc.flags & MOVE_ANON))
- return NULL;
- } else {
- if (!(mc.flags & MOVE_FILE))
- return NULL;
- }
- get_page(page);
-
- return page;
-}
-
-#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE)
-static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
- pte_t ptent, swp_entry_t *entry)
-{
- struct page *page = NULL;
- swp_entry_t ent = pte_to_swp_entry(ptent);
-
- if (!(mc.flags & MOVE_ANON))
- return NULL;
-
- /*
- * Handle device private pages that are not accessible by the CPU, but
- * stored as special swap entries in the page table.
- */
- if (is_device_private_entry(ent)) {
- page = pfn_swap_entry_to_page(ent);
- if (!get_page_unless_zero(page))
- return NULL;
- return page;
- }
-
- if (non_swap_entry(ent))
- return NULL;
-
- /*
- * Because swap_cache_get_folio() updates some statistics counter,
- * we call find_get_page() with swapper_space directly.
- */
- page = find_get_page(swap_address_space(ent), swap_cache_index(ent));
- entry->val = ent.val;
-
- return page;
-}
-#else
-static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
- pte_t ptent, swp_entry_t *entry)
-{
- return NULL;
-}
-#endif
-
-static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
- unsigned long addr, pte_t ptent)
-{
- unsigned long index;
- struct folio *folio;
-
- if (!vma->vm_file) /* anonymous vma */
- return NULL;
- if (!(mc.flags & MOVE_FILE))
- return NULL;
-
- /* folio is moved even if it's not RSS of this task(page-faulted). */
- /* shmem/tmpfs may report page out on swap: account for that too. */
- index = linear_page_index(vma, addr);
- folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index);
- if (IS_ERR(folio))
- return NULL;
- return folio_file_page(folio, index);
-}
-
-static void memcg1_check_events(struct mem_cgroup *memcg, int nid);
-static void memcg1_charge_statistics(struct mem_cgroup *memcg, int nr_pages);
-
-/**
- * mem_cgroup_move_account - move account of the folio
- * @folio: The folio.
- * @compound: charge the page as compound or small page
- * @from: mem_cgroup which the folio is moved from.
- * @to: mem_cgroup which the folio is moved to. @from != @to.
- *
- * The folio must be locked and not on the LRU.
- *
- * This function doesn't do "charge" to new cgroup and doesn't do "uncharge"
- * from old cgroup.
- */
-static int mem_cgroup_move_account(struct folio *folio,
- bool compound,
- struct mem_cgroup *from,
- struct mem_cgroup *to)
-{
- struct lruvec *from_vec, *to_vec;
- struct pglist_data *pgdat;
- unsigned int nr_pages = compound ? folio_nr_pages(folio) : 1;
- int nid, ret;
-
- VM_BUG_ON(from == to);
- VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
- VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
- VM_BUG_ON(compound && !folio_test_large(folio));
-
- ret = -EINVAL;
- if (folio_memcg(folio) != from)
- goto out;
-
- pgdat = folio_pgdat(folio);
- from_vec = mem_cgroup_lruvec(from, pgdat);
- to_vec = mem_cgroup_lruvec(to, pgdat);
-
- folio_memcg_lock(folio);
-
- if (folio_test_anon(folio)) {
- if (folio_mapped(folio)) {
- __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
- __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
- if (folio_test_pmd_mappable(folio)) {
- __mod_lruvec_state(from_vec, NR_ANON_THPS,
- -nr_pages);
- __mod_lruvec_state(to_vec, NR_ANON_THPS,
- nr_pages);
- }
- }
- } else {
- __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
-
- if (folio_test_swapbacked(folio)) {
- __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages);
- __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages);
- }
-
- if (folio_mapped(folio)) {
- __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
- }
-
- if (folio_test_dirty(folio)) {
- struct address_space *mapping = folio_mapping(folio);
-
- if (mapping_can_writeback(mapping)) {
- __mod_lruvec_state(from_vec, NR_FILE_DIRTY,
- -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_DIRTY,
- nr_pages);
- }
- }
- }
-
-#ifdef CONFIG_SWAP
- if (folio_test_swapcache(folio)) {
- __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages);
- __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages);
- }
-#endif
- if (folio_test_writeback(folio)) {
- __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages);
- __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
- }
-
- /*
- * All state has been migrated, let's switch to the new memcg.
- *
- * It is safe to change page's memcg here because the page
- * is referenced, charged, isolated, and locked: we can't race
- * with (un)charging, migration, LRU putback, or anything else
- * that would rely on a stable page's memory cgroup.
- *
- * Note that folio_memcg_lock is a memcg lock, not a page lock,
- * to save space. As soon as we switch page's memory cgroup to a
- * new memcg that isn't locked, the above state can change
- * concurrently again. Make sure we're truly done with it.
- */
- smp_mb();
-
- css_get(&to->css);
- css_put(&from->css);
-
- folio->memcg_data = (unsigned long)to;
-
- __folio_memcg_unlock(from);
-
- ret = 0;
- nid = folio_nid(folio);
-
- local_irq_disable();
- memcg1_charge_statistics(to, nr_pages);
- memcg1_check_events(to, nid);
- memcg1_charge_statistics(from, -nr_pages);
- memcg1_check_events(from, nid);
- local_irq_enable();
-out:
- return ret;
-}
-
-/**
- * get_mctgt_type - get target type of moving charge
- * @vma: the vma the pte to be checked belongs
- * @addr: the address corresponding to the pte to be checked
- * @ptent: the pte to be checked
- * @target: the pointer the target page or swap ent will be stored(can be NULL)
- *
- * Context: Called with pte lock held.
- * Return:
- * * MC_TARGET_NONE - If the pte is not a target for move charge.
- * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for
- * move charge. If @target is not NULL, the folio is stored in target->folio
- * with extra refcnt taken (Caller should release it).
- * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a
- * target for charge migration. If @target is not NULL, the entry is
- * stored in target->ent.
- * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and
- * thus not on the lru. For now such page is charged like a regular page
- * would be as it is just special memory taking the place of a regular page.
- * See Documentations/vm/hmm.txt and include/linux/hmm.h
- */
-static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
- unsigned long addr, pte_t ptent, union mc_target *target)
-{
- struct page *page = NULL;
- struct folio *folio;
- enum mc_target_type ret = MC_TARGET_NONE;
- swp_entry_t ent = { .val = 0 };
-
- if (pte_present(ptent))
- page = mc_handle_present_pte(vma, addr, ptent);
- else if (pte_none_mostly(ptent))
- /*
- * PTE markers should be treated as a none pte here, separated
- * from other swap handling below.
- */
- page = mc_handle_file_pte(vma, addr, ptent);
- else if (is_swap_pte(ptent))
- page = mc_handle_swap_pte(vma, ptent, &ent);
-
- if (page)
- folio = page_folio(page);
- if (target && page) {
- if (!folio_trylock(folio)) {
- folio_put(folio);
- return ret;
- }
- /*
- * page_mapped() must be stable during the move. This
- * pte is locked, so if it's present, the page cannot
- * become unmapped. If it isn't, we have only partial
- * control over the mapped state: the page lock will
- * prevent new faults against pagecache and swapcache,
- * so an unmapped page cannot become mapped. However,
- * if the page is already mapped elsewhere, it can
- * unmap, and there is nothing we can do about it.
- * Alas, skip moving the page in this case.
- */
- if (!pte_present(ptent) && page_mapped(page)) {
- folio_unlock(folio);
- folio_put(folio);
- return ret;
- }
- }
-
- if (!page && !ent.val)
- return ret;
- if (page) {
- /*
- * Do only loose check w/o serialization.
- * mem_cgroup_move_account() checks the page is valid or
- * not under LRU exclusion.
- */
- if (folio_memcg(folio) == mc.from) {
- ret = MC_TARGET_PAGE;
- if (folio_is_device_private(folio) ||
- folio_is_device_coherent(folio))
- ret = MC_TARGET_DEVICE;
- if (target)
- target->folio = folio;
- }
- if (!ret || !target) {
- if (target)
- folio_unlock(folio);
- folio_put(folio);
- }
- }
- /*
- * There is a swap entry and a page doesn't exist or isn't charged.
- * But we cannot move a tail-page in a THP.
- */
- if (ent.val && !ret && (!page || !PageTransCompound(page)) &&
- mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) {
- ret = MC_TARGET_SWAP;
- if (target)
- target->ent = ent;
- }
- return ret;
-}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/*
- * We don't consider PMD mapped swapping or file mapped pages because THP does
- * not support them for now.
- * Caller should make sure that pmd_trans_huge(pmd) is true.
- */
-static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
- unsigned long addr, pmd_t pmd, union mc_target *target)
-{
- struct page *page = NULL;
- struct folio *folio;
- enum mc_target_type ret = MC_TARGET_NONE;
-
- if (unlikely(is_swap_pmd(pmd))) {
- VM_BUG_ON(thp_migration_supported() &&
- !is_pmd_migration_entry(pmd));
- return ret;
- }
- page = pmd_page(pmd);
- VM_BUG_ON_PAGE(!page || !PageHead(page), page);
- folio = page_folio(page);
- if (!(mc.flags & MOVE_ANON))
- return ret;
- if (folio_memcg(folio) == mc.from) {
- ret = MC_TARGET_PAGE;
- if (target) {
- folio_get(folio);
- if (!folio_trylock(folio)) {
- folio_put(folio);
- return MC_TARGET_NONE;
- }
- target->folio = folio;
- }
- }
- return ret;
-}
-#else
-static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
- unsigned long addr, pmd_t pmd, union mc_target *target)
-{
- return MC_TARGET_NONE;
-}
-#endif
-
-static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
- unsigned long addr, unsigned long end,
- struct mm_walk *walk)
-{
- struct vm_area_struct *vma = walk->vma;
- pte_t *pte;
- spinlock_t *ptl;
-
- ptl = pmd_trans_huge_lock(pmd, vma);
- if (ptl) {
- /*
- * Note their can not be MC_TARGET_DEVICE for now as we do not
- * support transparent huge page with MEMORY_DEVICE_PRIVATE but
- * this might change.
- */
- if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
- mc.precharge += HPAGE_PMD_NR;
- spin_unlock(ptl);
- return 0;
- }
-
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- if (!pte)
- return 0;
- for (; addr != end; pte++, addr += PAGE_SIZE)
- if (get_mctgt_type(vma, addr, ptep_get(pte), NULL))
- mc.precharge++; /* increment precharge temporarily */
- pte_unmap_unlock(pte - 1, ptl);
- cond_resched();
-
- return 0;
-}
-
-static const struct mm_walk_ops precharge_walk_ops = {
- .pmd_entry = mem_cgroup_count_precharge_pte_range,
- .walk_lock = PGWALK_RDLOCK,
-};
-
-static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
-{
- unsigned long precharge;
-
- mmap_read_lock(mm);
- walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL);
- mmap_read_unlock(mm);
-
- precharge = mc.precharge;
- mc.precharge = 0;
-
- return precharge;
-}
-
-static int mem_cgroup_precharge_mc(struct mm_struct *mm)
-{
- unsigned long precharge = mem_cgroup_count_precharge(mm);
-
- VM_BUG_ON(mc.moving_task);
- mc.moving_task = current;
- return mem_cgroup_do_precharge(precharge);
-}
-
-/* cancels all extra charges on mc.from and mc.to, and wakes up all waiters. */
-static void __mem_cgroup_clear_mc(void)
-{
- struct mem_cgroup *from = mc.from;
- struct mem_cgroup *to = mc.to;
-
- /* we must uncharge all the leftover precharges from mc.to */
- if (mc.precharge) {
- mem_cgroup_cancel_charge(mc.to, mc.precharge);
- mc.precharge = 0;
- }
- /*
- * we didn't uncharge from mc.from at mem_cgroup_move_account(), so
- * we must uncharge here.
- */
- if (mc.moved_charge) {
- mem_cgroup_cancel_charge(mc.from, mc.moved_charge);
- mc.moved_charge = 0;
- }
- /* we must fixup refcnts and charges */
- if (mc.moved_swap) {
- /* uncharge swap account from the old cgroup */
- if (!mem_cgroup_is_root(mc.from))
- page_counter_uncharge(&mc.from->memsw, mc.moved_swap);
-
- mem_cgroup_id_put_many(mc.from, mc.moved_swap);
-
- /*
- * we charged both to->memory and to->memsw, so we
- * should uncharge to->memory.
- */
- if (!mem_cgroup_is_root(mc.to))
- page_counter_uncharge(&mc.to->memory, mc.moved_swap);
-
- mc.moved_swap = 0;
- }
- memcg1_oom_recover(from);
- memcg1_oom_recover(to);
- wake_up_all(&mc.waitq);
-}
-
-static void mem_cgroup_clear_mc(void)
-{
- struct mm_struct *mm = mc.mm;
-
- /*
- * we must clear moving_task before waking up waiters at the end of
- * task migration.
- */
- mc.moving_task = NULL;
- __mem_cgroup_clear_mc();
- spin_lock(&mc.lock);
- mc.from = NULL;
- mc.to = NULL;
- mc.mm = NULL;
- spin_unlock(&mc.lock);
-
- mmput(mm);
-}
-
-int memcg1_can_attach(struct cgroup_taskset *tset)
-{
- struct cgroup_subsys_state *css;
- struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */
- struct mem_cgroup *from;
- struct task_struct *leader, *p;
- struct mm_struct *mm;
- unsigned long move_flags;
- int ret = 0;
-
- /* charge immigration isn't supported on the default hierarchy */
- if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
- return 0;
-
- /*
- * Multi-process migrations only happen on the default hierarchy
- * where charge immigration is not used. Perform charge
- * immigration if @tset contains a leader and whine if there are
- * multiple.
- */
- p = NULL;
- cgroup_taskset_for_each_leader(leader, css, tset) {
- WARN_ON_ONCE(p);
- p = leader;
- memcg = mem_cgroup_from_css(css);
- }
- if (!p)
- return 0;
-
- /*
- * We are now committed to this value whatever it is. Changes in this
- * tunable will only affect upcoming migrations, not the current one.
- * So we need to save it, and keep it going.
- */
- move_flags = READ_ONCE(memcg->move_charge_at_immigrate);
- if (!move_flags)
- return 0;
-
- from = mem_cgroup_from_task(p);
-
- VM_BUG_ON(from == memcg);
-
- mm = get_task_mm(p);
- if (!mm)
- return 0;
- /* We move charges only when we move a owner of the mm */
- if (mm->owner == p) {
- VM_BUG_ON(mc.from);
- VM_BUG_ON(mc.to);
- VM_BUG_ON(mc.precharge);
- VM_BUG_ON(mc.moved_charge);
- VM_BUG_ON(mc.moved_swap);
-
- spin_lock(&mc.lock);
- mc.mm = mm;
- mc.from = from;
- mc.to = memcg;
- mc.flags = move_flags;
- spin_unlock(&mc.lock);
- /* We set mc.moving_task later */
-
- ret = mem_cgroup_precharge_mc(mm);
- if (ret)
- mem_cgroup_clear_mc();
- } else {
- mmput(mm);
- }
- return ret;
-}
-
-void memcg1_cancel_attach(struct cgroup_taskset *tset)
-{
- if (mc.to)
- mem_cgroup_clear_mc();
-}
-
-static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
- unsigned long addr, unsigned long end,
- struct mm_walk *walk)
-{
- int ret = 0;
- struct vm_area_struct *vma = walk->vma;
- pte_t *pte;
- spinlock_t *ptl;
- enum mc_target_type target_type;
- union mc_target target;
- struct folio *folio;
-
- ptl = pmd_trans_huge_lock(pmd, vma);
- if (ptl) {
- if (mc.precharge < HPAGE_PMD_NR) {
- spin_unlock(ptl);
- return 0;
- }
- target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
- if (target_type == MC_TARGET_PAGE) {
- folio = target.folio;
- if (folio_isolate_lru(folio)) {
- if (!mem_cgroup_move_account(folio, true,
- mc.from, mc.to)) {
- mc.precharge -= HPAGE_PMD_NR;
- mc.moved_charge += HPAGE_PMD_NR;
- }
- folio_putback_lru(folio);
- }
- folio_unlock(folio);
- folio_put(folio);
- } else if (target_type == MC_TARGET_DEVICE) {
- folio = target.folio;
- if (!mem_cgroup_move_account(folio, true,
- mc.from, mc.to)) {
- mc.precharge -= HPAGE_PMD_NR;
- mc.moved_charge += HPAGE_PMD_NR;
- }
- folio_unlock(folio);
- folio_put(folio);
- }
- spin_unlock(ptl);
- return 0;
- }
-
-retry:
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- if (!pte)
- return 0;
- for (; addr != end; addr += PAGE_SIZE) {
- pte_t ptent = ptep_get(pte++);
- bool device = false;
- swp_entry_t ent;
-
- if (!mc.precharge)
- break;
-
- switch (get_mctgt_type(vma, addr, ptent, &target)) {
- case MC_TARGET_DEVICE:
- device = true;
- fallthrough;
- case MC_TARGET_PAGE:
- folio = target.folio;
- /*
- * We can have a part of the split pmd here. Moving it
- * can be done but it would be too convoluted so simply
- * ignore such a partial THP and keep it in original
- * memcg. There should be somebody mapping the head.
- */
- if (folio_test_large(folio))
- goto put;
- if (!device && !folio_isolate_lru(folio))
- goto put;
- if (!mem_cgroup_move_account(folio, false,
- mc.from, mc.to)) {
- mc.precharge--;
- /* we uncharge from mc.from later. */
- mc.moved_charge++;
- }
- if (!device)
- folio_putback_lru(folio);
-put: /* get_mctgt_type() gets & locks the page */
- folio_unlock(folio);
- folio_put(folio);
- break;
- case MC_TARGET_SWAP:
- ent = target.ent;
- if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
- mc.precharge--;
- mem_cgroup_id_get_many(mc.to, 1);
- /* we fixup other refcnts and charges later. */
- mc.moved_swap++;
- }
- break;
- default:
- break;
- }
- }
- pte_unmap_unlock(pte - 1, ptl);
- cond_resched();
-
- if (addr != end) {
- /*
- * We have consumed all precharges we got in can_attach().
- * We try charge one by one, but don't do any additional
- * charges to mc.to if we have failed in charge once in attach()
- * phase.
- */
- ret = mem_cgroup_do_precharge(1);
- if (!ret)
- goto retry;
- }
-
- return ret;
-}
-
-static const struct mm_walk_ops charge_walk_ops = {
- .pmd_entry = mem_cgroup_move_charge_pte_range,
- .walk_lock = PGWALK_RDLOCK,
-};
-
-static void mem_cgroup_move_charge(void)
-{
- lru_add_drain_all();
- /*
- * Signal folio_memcg_lock() to take the memcg's move_lock
- * while we're moving its pages to another memcg. Then wait
- * for already started RCU-only updates to finish.
- */
- atomic_inc(&mc.from->moving_account);
- synchronize_rcu();
-retry:
- if (unlikely(!mmap_read_trylock(mc.mm))) {
- /*
- * Someone who are holding the mmap_lock might be waiting in
- * waitq. So we cancel all extra charges, wake up all waiters,
- * and retry. Because we cancel precharges, we might not be able
- * to move enough charges, but moving charge is a best-effort
- * feature anyway, so it wouldn't be a big problem.
- */
- __mem_cgroup_clear_mc();
- cond_resched();
- goto retry;
- }
- /*
- * When we have consumed all precharges and failed in doing
- * additional charge, the page walk just aborts.
- */
- walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL);
- mmap_read_unlock(mc.mm);
- atomic_dec(&mc.from->moving_account);
-}
-
-void memcg1_move_task(void)
-{
- if (mc.to) {
- mem_cgroup_move_charge();
- mem_cgroup_clear_mc();
- }
-}
-
-#else /* !CONFIG_MMU */
-int memcg1_can_attach(struct cgroup_taskset *tset)
-{
- return 0;
-}
-void memcg1_cancel_attach(struct cgroup_taskset *tset)
-{
-}
-void memcg1_move_task(void)
-{
-}
-#endif
-
static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
{
struct mem_cgroup_threshold_ary *t;
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index c0672e25bcdb..0e3b82951d91 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -80,12 +80,7 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
}
-bool memcg1_wait_acct_move(struct mem_cgroup *memcg);
-
struct cgroup_taskset;
-int memcg1_can_attach(struct cgroup_taskset *tset);
-void memcg1_cancel_attach(struct cgroup_taskset *tset);
-void memcg1_move_task(void);
void memcg1_css_offline(struct mem_cgroup *memcg);
/* for encoding cft->private value on file */
@@ -130,7 +125,6 @@ static inline void memcg1_free_events(struct mem_cgroup *memcg) {}
static inline void memcg1_memcg_init(struct mem_cgroup *memcg) {}
static inline void memcg1_remove_from_trees(struct mem_cgroup *memcg) {}
static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) {}
-static inline bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { return false; }
static inline void memcg1_css_offline(struct mem_cgroup *memcg) {}
static inline bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked) { return true; }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5c3a8629ef3e..94279b9c766a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2242,12 +2242,6 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
*/
if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
goto retry;
- /*
- * At task move, charge accounts can be doubly counted. So, it's
- * better to wait until the end of task_move if something is going on.
- */
- if (memcg1_wait_acct_move(mem_over_limit))
- goto retry;
if (nr_retries--)
goto retry;
@@ -4439,9 +4433,6 @@ struct cgroup_subsys memory_cgrp_subsys = {
.exit = mem_cgroup_exit,
.dfl_cftypes = memory_files,
#ifdef CONFIG_MEMCG_V1
- .can_attach = memcg1_can_attach,
- .cancel_attach = memcg1_cancel_attach,
- .post_attach = memcg1_move_task,
.legacy_cftypes = mem_cgroup_legacy_files,
#endif
.early_init = 0,
--
2.43.5
* [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 6:57 ` [RFC PATCH 0/3] " Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 2/3] memcg-v1: remove charge move code Shakeel Butt
@ 2024-10-24 6:57 ` Shakeel Butt
2024-10-24 9:16 ` Michal Hocko
` (2 more replies)
2 siblings, 3 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-24 6:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
The memcg v1 charge move feature has been deprecated. There is no
longer any need for locking or protection against moving charges. Let's
proceed to remove all the locking code related to charge moving.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
fs/buffer.c | 5 ---
include/linux/memcontrol.h | 54 -------------------------
mm/filemap.c | 1 -
mm/memcontrol-v1.c | 82 --------------------------------------
mm/memcontrol.c | 5 ---
mm/page-writeback.c | 21 ++--------
mm/rmap.c | 1 -
mm/vmscan.c | 11 -----
8 files changed, 3 insertions(+), 177 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 1fc9a50def0b..88e765b0699f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -736,15 +736,12 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
* Lock out page's memcg migration to keep PageDirty
* synchronized with per-memcg dirty page counters.
*/
- folio_memcg_lock(folio);
newly_dirty = !folio_test_set_dirty(folio);
spin_unlock(&mapping->i_private_lock);
if (newly_dirty)
__folio_mark_dirty(folio, mapping, 1);
- folio_memcg_unlock(folio);
-
if (newly_dirty)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
@@ -1194,13 +1191,11 @@ void mark_buffer_dirty(struct buffer_head *bh)
struct folio *folio = bh->b_folio;
struct address_space *mapping = NULL;
- folio_memcg_lock(folio);
if (!folio_test_set_dirty(folio)) {
mapping = folio->mapping;
if (mapping)
__folio_mark_dirty(folio, mapping, 0);
}
- folio_memcg_unlock(folio);
if (mapping)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 798db70b0a30..932534291ca2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -299,20 +299,10 @@ struct mem_cgroup {
/* For oom notifier event fd */
struct list_head oom_notify;
- /* taken only while moving_account > 0 */
- spinlock_t move_lock;
- unsigned long move_lock_flags;
-
/* Legacy tcp memory accounting */
bool tcpmem_active;
int tcpmem_pressure;
- /*
- * set > 0 if pages under this cgroup are moving to other cgroup.
- */
- atomic_t moving_account;
- struct task_struct *move_lock_task;
-
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
@@ -428,9 +418,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
*
* - the folio lock
* - LRU isolation
- * - folio_memcg_lock()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*
* For a kmem folio a caller should hold an rcu read lock to protect memcg
* associated with a kmem folio from being released.
@@ -499,9 +487,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
*
* - the folio lock
* - LRU isolation
- * - lock_folio_memcg()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*
* For a kmem folio a caller should hold an rcu read lock to protect memcg
* associated with a kmem folio from being released.
@@ -1873,26 +1859,6 @@ static inline bool task_in_memcg_oom(struct task_struct *p)
return p->memcg_in_oom;
}
-void folio_memcg_lock(struct folio *folio);
-void folio_memcg_unlock(struct folio *folio);
-
-/* try to stablize folio_memcg() for all the pages in a memcg */
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
- rcu_read_lock();
-
- if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account))
- return true;
-
- rcu_read_unlock();
- return false;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
- rcu_read_unlock();
-}
-
static inline void mem_cgroup_enter_user_fault(void)
{
WARN_ON(current->in_user_fault);
@@ -1914,26 +1880,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return 0;
}
-static inline void folio_memcg_lock(struct folio *folio)
-{
-}
-
-static inline void folio_memcg_unlock(struct folio *folio)
-{
-}
-
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
- /* to match folio_memcg_rcu() */
- rcu_read_lock();
- return true;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
- rcu_read_unlock();
-}
-
static inline bool task_in_memcg_oom(struct task_struct *p)
{
return false;
diff --git a/mm/filemap.c b/mm/filemap.c
index 630a1c431ea1..e582a1545d2a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -119,7 +119,6 @@
* ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
* bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
* ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
- * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock)
* bdi.wb->list_lock (zap_pte_range->set_page_dirty)
* ->inode->i_lock (zap_pte_range->set_page_dirty)
* ->private_lock (zap_pte_range->block_dirty_folio)
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 79339cb65b9d..b14f01a93d0c 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -401,87 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return nr_reclaimed;
}
-/**
- * folio_memcg_lock - Bind a folio to its memcg.
- * @folio: The folio.
- *
- * This function prevents unlocked LRU folios from being moved to
- * another cgroup.
- *
- * It ensures lifetime of the bound memcg. The caller is responsible
- * for the lifetime of the folio.
- */
-void folio_memcg_lock(struct folio *folio)
-{
- struct mem_cgroup *memcg;
- unsigned long flags;
-
- /*
- * The RCU lock is held throughout the transaction. The fast
- * path can get away without acquiring the memcg->move_lock
- * because page moving starts with an RCU grace period.
- */
- rcu_read_lock();
-
- if (mem_cgroup_disabled())
- return;
-again:
- memcg = folio_memcg(folio);
- if (unlikely(!memcg))
- return;
-
-#ifdef CONFIG_PROVE_LOCKING
- local_irq_save(flags);
- might_lock(&memcg->move_lock);
- local_irq_restore(flags);
-#endif
-
- if (atomic_read(&memcg->moving_account) <= 0)
- return;
-
- spin_lock_irqsave(&memcg->move_lock, flags);
- if (memcg != folio_memcg(folio)) {
- spin_unlock_irqrestore(&memcg->move_lock, flags);
- goto again;
- }
-
- /*
- * When charge migration first begins, we can have multiple
- * critical sections holding the fast-path RCU lock and one
- * holding the slowpath move_lock. Track the task who has the
- * move_lock for folio_memcg_unlock().
- */
- memcg->move_lock_task = current;
- memcg->move_lock_flags = flags;
-}
-
-static void __folio_memcg_unlock(struct mem_cgroup *memcg)
-{
- if (memcg && memcg->move_lock_task == current) {
- unsigned long flags = memcg->move_lock_flags;
-
- memcg->move_lock_task = NULL;
- memcg->move_lock_flags = 0;
-
- spin_unlock_irqrestore(&memcg->move_lock, flags);
- }
-
- rcu_read_unlock();
-}
-
-/**
- * folio_memcg_unlock - Release the binding between a folio and its memcg.
- * @folio: The folio.
- *
- * This releases the binding created by folio_memcg_lock(). This does
- * not change the accounting of this folio to its memcg, but it does
- * permit others to change it.
- */
-void folio_memcg_unlock(struct folio *folio)
-{
- __folio_memcg_unlock(folio_memcg(folio));
-}
-
static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
struct cftype *cft)
{
@@ -1187,7 +1106,6 @@ void memcg1_memcg_init(struct mem_cgroup *memcg)
{
INIT_LIST_HEAD(&memcg->oom_notify);
mutex_init(&memcg->thresholds_lock);
- spin_lock_init(&memcg->move_lock);
INIT_LIST_HEAD(&memcg->event_list);
spin_lock_init(&memcg->event_list_lock);
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 94279b9c766a..3c223aaeb6af 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1189,7 +1189,6 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held.
@@ -1211,7 +1210,6 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held and interrupts
@@ -1235,7 +1233,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held and interrupts
@@ -2375,9 +2372,7 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
*
* - the page lock
* - LRU isolation
- * - folio_memcg_lock()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*/
folio->memcg_data = (unsigned long)memcg;
}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 1d7179aba8e3..e33727dd6b47 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2743,8 +2743,6 @@ EXPORT_SYMBOL(noop_dirty_folio);
/*
* Helper function for set_page_dirty family.
*
- * Caller must hold folio_memcg_lock().
- *
* NOTE: This relies on being atomic wrt interrupts.
*/
static void folio_account_dirtied(struct folio *folio,
@@ -2776,8 +2774,6 @@ static void folio_account_dirtied(struct folio *folio,
/*
* Helper function for deaccounting dirty page without writeback.
- *
- * Caller must hold folio_memcg_lock().
*/
void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
{
@@ -2795,9 +2791,8 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
* If warn is true, then emit a warning if the folio is not uptodate and has
* not been truncated.
*
- * The caller must hold folio_memcg_lock(). It is the caller's
- * responsibility to prevent the folio from being truncated while
- * this function is in progress, although it may have been truncated
+ * It is the caller's responsibility to prevent the folio from being truncated
+ * while this function is in progress, although it may have been truncated
* before this function is called. Most callers have the folio locked.
* A few have the folio blocked from truncation through other means (e.g.
* zap_vma_pages() has it mapped and is holding the page table lock).
@@ -2841,14 +2836,10 @@ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
*/
bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio)
{
- folio_memcg_lock(folio);
- if (folio_test_set_dirty(folio)) {
- folio_memcg_unlock(folio);
+ if (folio_test_set_dirty(folio))
return false;
- }
__folio_mark_dirty(folio, mapping, !folio_test_private(folio));
- folio_memcg_unlock(folio);
if (mapping->host) {
/* !PageAnon && !swapper_space */
@@ -2975,14 +2966,12 @@ void __folio_cancel_dirty(struct folio *folio)
struct bdi_writeback *wb;
struct wb_lock_cookie cookie = {};
- folio_memcg_lock(folio);
wb = unlocked_inode_to_wb_begin(inode, &cookie);
if (folio_test_clear_dirty(folio))
folio_account_cleaned(folio, wb);
unlocked_inode_to_wb_end(inode, &cookie);
- folio_memcg_unlock(folio);
} else {
folio_clear_dirty(folio);
}
@@ -3093,7 +3082,6 @@ bool __folio_end_writeback(struct folio *folio)
struct address_space *mapping = folio_mapping(folio);
bool ret;
- folio_memcg_lock(folio);
if (mapping && mapping_use_writeback_tags(mapping)) {
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = inode_to_bdi(inode);
@@ -3124,7 +3112,6 @@ bool __folio_end_writeback(struct folio *folio)
lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr);
zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
node_stat_mod_folio(folio, NR_WRITTEN, nr);
- folio_memcg_unlock(folio);
return ret;
}
@@ -3137,7 +3124,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
- folio_memcg_lock(folio);
if (mapping && mapping_use_writeback_tags(mapping)) {
XA_STATE(xas, &mapping->i_pages, folio_index(folio));
struct inode *inode = mapping->host;
@@ -3178,7 +3164,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr);
zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr);
- folio_memcg_unlock(folio);
access_ret = arch_make_folio_accessible(folio);
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 4785a693857a..c6c4d4ea29a7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -32,7 +32,6 @@
* swap_lock (in swap_duplicate, swap_info_get)
* mmlist_lock (in mmput, drain_mmlist and others)
* mapping->private_lock (in block_dirty_folio)
- * folio_lock_memcg move_lock (in block_dirty_folio)
* i_pages lock (widely used)
* lruvec->lru_lock (in folio_lruvec_lock_irq)
* inode->i_lock (in set_page_dirty's __mark_inode_dirty)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 29c098790b01..fd7171658b63 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3662,10 +3662,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
if (walk->seq != max_seq)
break;
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- break;
-
/* the caller might be holding the lock for write */
if (mmap_read_trylock(mm)) {
err = walk_page_range(mm, walk->next_addr, ULONG_MAX, &mm_walk_ops, walk);
@@ -3673,8 +3669,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
mmap_read_unlock(mm);
}
- mem_cgroup_unlock_pages();
-
if (walk->batched) {
spin_lock_irq(&lruvec->lru_lock);
reset_batch_size(walk);
@@ -4096,10 +4090,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
}
}
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- return true;
-
arch_enter_lazy_mmu_mode();
pte -= (addr - start) / PAGE_SIZE;
@@ -4144,7 +4134,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
}
arch_leave_lazy_mmu_mode();
- mem_cgroup_unlock_pages();
/* feedback from rmap walkers to page table walkers */
if (mm_state && suitable_to_scan(i, young))
--
2.43.5
* Re: [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate
2024-10-24 6:57 ` [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
@ 2024-10-24 9:14 ` Michal Hocko
2024-10-24 16:51 ` Roman Gushchin
2024-10-24 16:49 ` Roman Gushchin
1 sibling, 1 reply; 56+ messages in thread
From: Michal Hocko @ 2024-10-24 9:14 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Wed 23-10-24 23:57:10, Shakeel Butt wrote:
> Proceed with the complete deprecation of memcg v1's charge moving
> feature. The deprecation warning has been in the kernel for almost two
> years and has since been backported to all stable kernels. Now is the
> time to fully deprecate this feature.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
I'm fine with this move, just one detail we might need to consider
[...]
> @@ -606,17 +606,7 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
> "Please report your usecase to linux-mm@kvack.org if you "
> "depend on this functionality.\n");
>
> - if (val & ~MOVE_MASK)
> - return -EINVAL;
> -
> - /*
> - * No kind of locking is needed in here, because ->can_attach() will
> - * check this value once in the beginning of the process, and then carry
> - * on with stale data. This means that changes to this value will only
> - * affect task migrations starting after the change.
> - */
> - memcg->move_charge_at_immigrate = val;
> - return 0;
> + return -EINVAL;
Would it make more sense to return -EINVAL only if val != 0? The reason
being that some userspace might just be writing 0 here for whatever
reason and would find the failure unexpected.
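A minimal sketch of that variant (illustrative only, not part of the
posted patch): keep a write of 0 as a successful no-op and reject any
other value:

static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
					struct cftype *cft, u64 val)
{
	pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
		     "Please report your usecase to linux-mm@kvack.org if you "
		     "depend on this functionality.\n");

	/* Tolerate writes of 0 so scripts that only reset the knob keep working. */
	if (val)
		return -EINVAL;
	return 0;
}

That would preserve the common "echo 0 > memory.move_charge_at_immigrate"
pattern while still rejecting any attempt to actually enable charge
moving.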
> }
> #else
> static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
> --
> 2.43.5
--
Michal Hocko
SUSE Labs
* Re: [RFC PATCH 2/3] memcg-v1: remove charge move code
2024-10-24 6:57 ` [RFC PATCH 2/3] memcg-v1: remove charge move code Shakeel Butt
@ 2024-10-24 9:14 ` Michal Hocko
2024-10-24 16:50 ` Roman Gushchin
1 sibling, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2024-10-24 9:14 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Wed 23-10-24 23:57:11, Shakeel Butt wrote:
> The memcg-v1 charge move feature has been completely deprecated, so
> let's remove the relevant code as well.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> include/linux/memcontrol.h | 5 -
> mm/memcontrol-v1.c | 864 -------------------------------------
> mm/memcontrol-v1.h | 6 -
> mm/memcontrol.c | 9 -
> 4 files changed, 884 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 524006313b0d..798db70b0a30 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -299,11 +299,6 @@ struct mem_cgroup {
> /* For oom notifier event fd */
> struct list_head oom_notify;
>
> - /*
> - * Should we move charges of a task when a task is moved into this
> - * mem_cgroup ? And what type of charges should we move ?
> - */
> - unsigned long move_charge_at_immigrate;
> /* taken only while moving_account > 0 */
> spinlock_t move_lock;
> unsigned long move_lock_flags;
> diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
> index 8f88540f0159..79339cb65b9d 100644
> --- a/mm/memcontrol-v1.c
> +++ b/mm/memcontrol-v1.c
> @@ -40,31 +40,6 @@ static struct mem_cgroup_tree soft_limit_tree __read_mostly;
> #define MEM_CGROUP_MAX_RECLAIM_LOOPS 100
> #define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2
>
> -/* Stuffs for move charges at task migration. */
> -/*
> - * Types of charges to be moved.
> - */
> -#define MOVE_ANON 0x1ULL
> -#define MOVE_FILE 0x2ULL
> -#define MOVE_MASK (MOVE_ANON | MOVE_FILE)
> -
> -/* "mc" and its members are protected by cgroup_mutex */
> -static struct move_charge_struct {
> - spinlock_t lock; /* for from, to */
> - struct mm_struct *mm;
> - struct mem_cgroup *from;
> - struct mem_cgroup *to;
> - unsigned long flags;
> - unsigned long precharge;
> - unsigned long moved_charge;
> - unsigned long moved_swap;
> - struct task_struct *moving_task; /* a task moving charges */
> - wait_queue_head_t waitq; /* a waitq for other context */
> -} mc = {
> - .lock = __SPIN_LOCK_UNLOCKED(mc.lock),
> - .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq),
> -};
> -
> /* for OOM */
> struct mem_cgroup_eventfd_list {
> struct list_head list;
> @@ -426,51 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return nr_reclaimed;
> }
>
> -/*
> - * A routine for checking "mem" is under move_account() or not.
> - *
> - * Checking a cgroup is mc.from or mc.to or under hierarchy of
> - * moving cgroups. This is for waiting at high-memory pressure
> - * caused by "move".
> - */
> -static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
> -{
> - struct mem_cgroup *from;
> - struct mem_cgroup *to;
> - bool ret = false;
> - /*
> - * Unlike task_move routines, we access mc.to, mc.from not under
> - * mutual exclusion by cgroup_mutex. Here, we take spinlock instead.
> - */
> - spin_lock(&mc.lock);
> - from = mc.from;
> - to = mc.to;
> - if (!from)
> - goto unlock;
> -
> - ret = mem_cgroup_is_descendant(from, memcg) ||
> - mem_cgroup_is_descendant(to, memcg);
> -unlock:
> - spin_unlock(&mc.lock);
> - return ret;
> -}
> -
> -bool memcg1_wait_acct_move(struct mem_cgroup *memcg)
> -{
> - if (mc.moving_task && current != mc.moving_task) {
> - if (mem_cgroup_under_move(memcg)) {
> - DEFINE_WAIT(wait);
> - prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
> - /* moving charge context might have finished. */
> - if (mc.moving_task)
> - schedule();
> - finish_wait(&mc.waitq, &wait);
> - return true;
> - }
> - }
> - return false;
> -}
> -
> /**
> * folio_memcg_lock - Bind a folio to its memcg.
> * @folio: The folio.
> @@ -552,44 +482,6 @@ void folio_memcg_unlock(struct folio *folio)
> __folio_memcg_unlock(folio_memcg(folio));
> }
>
> -#ifdef CONFIG_SWAP
> -/**
> - * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
> - * @entry: swap entry to be moved
> - * @from: mem_cgroup which the entry is moved from
> - * @to: mem_cgroup which the entry is moved to
> - *
> - * It succeeds only when the swap_cgroup's record for this entry is the same
> - * as the mem_cgroup's id of @from.
> - *
> - * Returns 0 on success, -EINVAL on failure.
> - *
> - * The caller must have charged to @to, IOW, called page_counter_charge() about
> - * both res and memsw, and called css_get().
> - */
> -static int mem_cgroup_move_swap_account(swp_entry_t entry,
> - struct mem_cgroup *from, struct mem_cgroup *to)
> -{
> - unsigned short old_id, new_id;
> -
> - old_id = mem_cgroup_id(from);
> - new_id = mem_cgroup_id(to);
> -
> - if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
> - mod_memcg_state(from, MEMCG_SWAP, -1);
> - mod_memcg_state(to, MEMCG_SWAP, 1);
> - return 0;
> - }
> - return -EINVAL;
> -}
> -#else
> -static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
> - struct mem_cgroup *from, struct mem_cgroup *to)
> -{
> - return -EINVAL;
> -}
> -#endif
> -
> static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
> struct cftype *cft)
> {
> @@ -600,8 +492,6 @@ static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
> static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
> struct cftype *cft, u64 val)
> {
> - struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> -
> pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
> "Please report your usecase to linux-mm@kvack.org if you "
> "depend on this functionality.\n");
> @@ -616,760 +506,6 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
> }
> #endif
>
> -#ifdef CONFIG_MMU
> -/* Handlers for move charge at task migration. */
> -static int mem_cgroup_do_precharge(unsigned long count)
> -{
> - int ret;
> -
> - /* Try a single bulk charge without reclaim first, kswapd may wake */
> - ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
> - if (!ret) {
> - mc.precharge += count;
> - return ret;
> - }
> -
> - /* Try charges one by one with reclaim, but do not retry */
> - while (count--) {
> - ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
> - if (ret)
> - return ret;
> - mc.precharge++;
> - cond_resched();
> - }
> - return 0;
> -}
> -
> -union mc_target {
> - struct folio *folio;
> - swp_entry_t ent;
> -};
> -
> -enum mc_target_type {
> - MC_TARGET_NONE = 0,
> - MC_TARGET_PAGE,
> - MC_TARGET_SWAP,
> - MC_TARGET_DEVICE,
> -};
> -
> -static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
> - unsigned long addr, pte_t ptent)
> -{
> - struct page *page = vm_normal_page(vma, addr, ptent);
> -
> - if (!page)
> - return NULL;
> - if (PageAnon(page)) {
> - if (!(mc.flags & MOVE_ANON))
> - return NULL;
> - } else {
> - if (!(mc.flags & MOVE_FILE))
> - return NULL;
> - }
> - get_page(page);
> -
> - return page;
> -}
> -
> -#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE)
> -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
> - pte_t ptent, swp_entry_t *entry)
> -{
> - struct page *page = NULL;
> - swp_entry_t ent = pte_to_swp_entry(ptent);
> -
> - if (!(mc.flags & MOVE_ANON))
> - return NULL;
> -
> - /*
> - * Handle device private pages that are not accessible by the CPU, but
> - * stored as special swap entries in the page table.
> - */
> - if (is_device_private_entry(ent)) {
> - page = pfn_swap_entry_to_page(ent);
> - if (!get_page_unless_zero(page))
> - return NULL;
> - return page;
> - }
> -
> - if (non_swap_entry(ent))
> - return NULL;
> -
> - /*
> - * Because swap_cache_get_folio() updates some statistics counter,
> - * we call find_get_page() with swapper_space directly.
> - */
> - page = find_get_page(swap_address_space(ent), swap_cache_index(ent));
> - entry->val = ent.val;
> -
> - return page;
> -}
> -#else
> -static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
> - pte_t ptent, swp_entry_t *entry)
> -{
> - return NULL;
> -}
> -#endif
> -
> -static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
> - unsigned long addr, pte_t ptent)
> -{
> - unsigned long index;
> - struct folio *folio;
> -
> - if (!vma->vm_file) /* anonymous vma */
> - return NULL;
> - if (!(mc.flags & MOVE_FILE))
> - return NULL;
> -
> - /* folio is moved even if it's not RSS of this task(page-faulted). */
> - /* shmem/tmpfs may report page out on swap: account for that too. */
> - index = linear_page_index(vma, addr);
> - folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index);
> - if (IS_ERR(folio))
> - return NULL;
> - return folio_file_page(folio, index);
> -}
> -
> -static void memcg1_check_events(struct mem_cgroup *memcg, int nid);
> -static void memcg1_charge_statistics(struct mem_cgroup *memcg, int nr_pages);
> -
> -/**
> - * mem_cgroup_move_account - move account of the folio
> - * @folio: The folio.
> - * @compound: charge the page as compound or small page
> - * @from: mem_cgroup which the folio is moved from.
> - * @to: mem_cgroup which the folio is moved to. @from != @to.
> - *
> - * The folio must be locked and not on the LRU.
> - *
> - * This function doesn't do "charge" to new cgroup and doesn't do "uncharge"
> - * from old cgroup.
> - */
> -static int mem_cgroup_move_account(struct folio *folio,
> - bool compound,
> - struct mem_cgroup *from,
> - struct mem_cgroup *to)
> -{
> - struct lruvec *from_vec, *to_vec;
> - struct pglist_data *pgdat;
> - unsigned int nr_pages = compound ? folio_nr_pages(folio) : 1;
> - int nid, ret;
> -
> - VM_BUG_ON(from == to);
> - VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> - VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> - VM_BUG_ON(compound && !folio_test_large(folio));
> -
> - ret = -EINVAL;
> - if (folio_memcg(folio) != from)
> - goto out;
> -
> - pgdat = folio_pgdat(folio);
> - from_vec = mem_cgroup_lruvec(from, pgdat);
> - to_vec = mem_cgroup_lruvec(to, pgdat);
> -
> - folio_memcg_lock(folio);
> -
> - if (folio_test_anon(folio)) {
> - if (folio_mapped(folio)) {
> - __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
> - __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
> - if (folio_test_pmd_mappable(folio)) {
> - __mod_lruvec_state(from_vec, NR_ANON_THPS,
> - -nr_pages);
> - __mod_lruvec_state(to_vec, NR_ANON_THPS,
> - nr_pages);
> - }
> - }
> - } else {
> - __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
> - __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
> -
> - if (folio_test_swapbacked(folio)) {
> - __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages);
> - __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages);
> - }
> -
> - if (folio_mapped(folio)) {
> - __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
> - __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
> - }
> -
> - if (folio_test_dirty(folio)) {
> - struct address_space *mapping = folio_mapping(folio);
> -
> - if (mapping_can_writeback(mapping)) {
> - __mod_lruvec_state(from_vec, NR_FILE_DIRTY,
> - -nr_pages);
> - __mod_lruvec_state(to_vec, NR_FILE_DIRTY,
> - nr_pages);
> - }
> - }
> - }
> -
> -#ifdef CONFIG_SWAP
> - if (folio_test_swapcache(folio)) {
> - __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages);
> - __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages);
> - }
> -#endif
> - if (folio_test_writeback(folio)) {
> - __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages);
> - __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
> - }
> -
> - /*
> - * All state has been migrated, let's switch to the new memcg.
> - *
> - * It is safe to change page's memcg here because the page
> - * is referenced, charged, isolated, and locked: we can't race
> - * with (un)charging, migration, LRU putback, or anything else
> - * that would rely on a stable page's memory cgroup.
> - *
> - * Note that folio_memcg_lock is a memcg lock, not a page lock,
> - * to save space. As soon as we switch page's memory cgroup to a
> - * new memcg that isn't locked, the above state can change
> - * concurrently again. Make sure we're truly done with it.
> - */
> - smp_mb();
> -
> - css_get(&to->css);
> - css_put(&from->css);
> -
> - folio->memcg_data = (unsigned long)to;
> -
> - __folio_memcg_unlock(from);
> -
> - ret = 0;
> - nid = folio_nid(folio);
> -
> - local_irq_disable();
> - memcg1_charge_statistics(to, nr_pages);
> - memcg1_check_events(to, nid);
> - memcg1_charge_statistics(from, -nr_pages);
> - memcg1_check_events(from, nid);
> - local_irq_enable();
> -out:
> - return ret;
> -}
> -
> -/**
> - * get_mctgt_type - get target type of moving charge
> - * @vma: the vma the pte to be checked belongs
> - * @addr: the address corresponding to the pte to be checked
> - * @ptent: the pte to be checked
> - * @target: the pointer the target page or swap ent will be stored(can be NULL)
> - *
> - * Context: Called with pte lock held.
> - * Return:
> - * * MC_TARGET_NONE - If the pte is not a target for move charge.
> - * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for
> - * move charge. If @target is not NULL, the folio is stored in target->folio
> - * with extra refcnt taken (Caller should release it).
> - * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a
> - * target for charge migration. If @target is not NULL, the entry is
> - * stored in target->ent.
> - * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and
> - * thus not on the lru. For now such page is charged like a regular page
> - * would be as it is just special memory taking the place of a regular page.
> - * See Documentations/vm/hmm.txt and include/linux/hmm.h
> - */
> -static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
> - unsigned long addr, pte_t ptent, union mc_target *target)
> -{
> - struct page *page = NULL;
> - struct folio *folio;
> - enum mc_target_type ret = MC_TARGET_NONE;
> - swp_entry_t ent = { .val = 0 };
> -
> - if (pte_present(ptent))
> - page = mc_handle_present_pte(vma, addr, ptent);
> - else if (pte_none_mostly(ptent))
> - /*
> - * PTE markers should be treated as a none pte here, separated
> - * from other swap handling below.
> - */
> - page = mc_handle_file_pte(vma, addr, ptent);
> - else if (is_swap_pte(ptent))
> - page = mc_handle_swap_pte(vma, ptent, &ent);
> -
> - if (page)
> - folio = page_folio(page);
> - if (target && page) {
> - if (!folio_trylock(folio)) {
> - folio_put(folio);
> - return ret;
> - }
> - /*
> - * page_mapped() must be stable during the move. This
> - * pte is locked, so if it's present, the page cannot
> - * become unmapped. If it isn't, we have only partial
> - * control over the mapped state: the page lock will
> - * prevent new faults against pagecache and swapcache,
> - * so an unmapped page cannot become mapped. However,
> - * if the page is already mapped elsewhere, it can
> - * unmap, and there is nothing we can do about it.
> - * Alas, skip moving the page in this case.
> - */
> - if (!pte_present(ptent) && page_mapped(page)) {
> - folio_unlock(folio);
> - folio_put(folio);
> - return ret;
> - }
> - }
> -
> - if (!page && !ent.val)
> - return ret;
> - if (page) {
> - /*
> - * Do only loose check w/o serialization.
> - * mem_cgroup_move_account() checks the page is valid or
> - * not under LRU exclusion.
> - */
> - if (folio_memcg(folio) == mc.from) {
> - ret = MC_TARGET_PAGE;
> - if (folio_is_device_private(folio) ||
> - folio_is_device_coherent(folio))
> - ret = MC_TARGET_DEVICE;
> - if (target)
> - target->folio = folio;
> - }
> - if (!ret || !target) {
> - if (target)
> - folio_unlock(folio);
> - folio_put(folio);
> - }
> - }
> - /*
> - * There is a swap entry and a page doesn't exist or isn't charged.
> - * But we cannot move a tail-page in a THP.
> - */
> - if (ent.val && !ret && (!page || !PageTransCompound(page)) &&
> - mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) {
> - ret = MC_TARGET_SWAP;
> - if (target)
> - target->ent = ent;
> - }
> - return ret;
> -}
> -
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -/*
> - * We don't consider PMD mapped swapping or file mapped pages because THP does
> - * not support them for now.
> - * Caller should make sure that pmd_trans_huge(pmd) is true.
> - */
> -static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
> - unsigned long addr, pmd_t pmd, union mc_target *target)
> -{
> - struct page *page = NULL;
> - struct folio *folio;
> - enum mc_target_type ret = MC_TARGET_NONE;
> -
> - if (unlikely(is_swap_pmd(pmd))) {
> - VM_BUG_ON(thp_migration_supported() &&
> - !is_pmd_migration_entry(pmd));
> - return ret;
> - }
> - page = pmd_page(pmd);
> - VM_BUG_ON_PAGE(!page || !PageHead(page), page);
> - folio = page_folio(page);
> - if (!(mc.flags & MOVE_ANON))
> - return ret;
> - if (folio_memcg(folio) == mc.from) {
> - ret = MC_TARGET_PAGE;
> - if (target) {
> - folio_get(folio);
> - if (!folio_trylock(folio)) {
> - folio_put(folio);
> - return MC_TARGET_NONE;
> - }
> - target->folio = folio;
> - }
> - }
> - return ret;
> -}
> -#else
> -static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
> - unsigned long addr, pmd_t pmd, union mc_target *target)
> -{
> - return MC_TARGET_NONE;
> -}
> -#endif
> -
> -static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
> - unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> -{
> - struct vm_area_struct *vma = walk->vma;
> - pte_t *pte;
> - spinlock_t *ptl;
> -
> - ptl = pmd_trans_huge_lock(pmd, vma);
> - if (ptl) {
> - /*
> - * Note their can not be MC_TARGET_DEVICE for now as we do not
> - * support transparent huge page with MEMORY_DEVICE_PRIVATE but
> - * this might change.
> - */
> - if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
> - mc.precharge += HPAGE_PMD_NR;
> - spin_unlock(ptl);
> - return 0;
> - }
> -
> - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> - if (!pte)
> - return 0;
> - for (; addr != end; pte++, addr += PAGE_SIZE)
> - if (get_mctgt_type(vma, addr, ptep_get(pte), NULL))
> - mc.precharge++; /* increment precharge temporarily */
> - pte_unmap_unlock(pte - 1, ptl);
> - cond_resched();
> -
> - return 0;
> -}
> -
> -static const struct mm_walk_ops precharge_walk_ops = {
> - .pmd_entry = mem_cgroup_count_precharge_pte_range,
> - .walk_lock = PGWALK_RDLOCK,
> -};
> -
> -static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
> -{
> - unsigned long precharge;
> -
> - mmap_read_lock(mm);
> - walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL);
> - mmap_read_unlock(mm);
> -
> - precharge = mc.precharge;
> - mc.precharge = 0;
> -
> - return precharge;
> -}
> -
> -static int mem_cgroup_precharge_mc(struct mm_struct *mm)
> -{
> - unsigned long precharge = mem_cgroup_count_precharge(mm);
> -
> - VM_BUG_ON(mc.moving_task);
> - mc.moving_task = current;
> - return mem_cgroup_do_precharge(precharge);
> -}
> -
> -/* cancels all extra charges on mc.from and mc.to, and wakes up all waiters. */
> -static void __mem_cgroup_clear_mc(void)
> -{
> - struct mem_cgroup *from = mc.from;
> - struct mem_cgroup *to = mc.to;
> -
> - /* we must uncharge all the leftover precharges from mc.to */
> - if (mc.precharge) {
> - mem_cgroup_cancel_charge(mc.to, mc.precharge);
> - mc.precharge = 0;
> - }
> - /*
> - * we didn't uncharge from mc.from at mem_cgroup_move_account(), so
> - * we must uncharge here.
> - */
> - if (mc.moved_charge) {
> - mem_cgroup_cancel_charge(mc.from, mc.moved_charge);
> - mc.moved_charge = 0;
> - }
> - /* we must fixup refcnts and charges */
> - if (mc.moved_swap) {
> - /* uncharge swap account from the old cgroup */
> - if (!mem_cgroup_is_root(mc.from))
> - page_counter_uncharge(&mc.from->memsw, mc.moved_swap);
> -
> - mem_cgroup_id_put_many(mc.from, mc.moved_swap);
> -
> - /*
> - * we charged both to->memory and to->memsw, so we
> - * should uncharge to->memory.
> - */
> - if (!mem_cgroup_is_root(mc.to))
> - page_counter_uncharge(&mc.to->memory, mc.moved_swap);
> -
> - mc.moved_swap = 0;
> - }
> - memcg1_oom_recover(from);
> - memcg1_oom_recover(to);
> - wake_up_all(&mc.waitq);
> -}
> -
> -static void mem_cgroup_clear_mc(void)
> -{
> - struct mm_struct *mm = mc.mm;
> -
> - /*
> - * we must clear moving_task before waking up waiters at the end of
> - * task migration.
> - */
> - mc.moving_task = NULL;
> - __mem_cgroup_clear_mc();
> - spin_lock(&mc.lock);
> - mc.from = NULL;
> - mc.to = NULL;
> - mc.mm = NULL;
> - spin_unlock(&mc.lock);
> -
> - mmput(mm);
> -}
> -
> -int memcg1_can_attach(struct cgroup_taskset *tset)
> -{
> - struct cgroup_subsys_state *css;
> - struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */
> - struct mem_cgroup *from;
> - struct task_struct *leader, *p;
> - struct mm_struct *mm;
> - unsigned long move_flags;
> - int ret = 0;
> -
> - /* charge immigration isn't supported on the default hierarchy */
> - if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
> - return 0;
> -
> - /*
> - * Multi-process migrations only happen on the default hierarchy
> - * where charge immigration is not used. Perform charge
> - * immigration if @tset contains a leader and whine if there are
> - * multiple.
> - */
> - p = NULL;
> - cgroup_taskset_for_each_leader(leader, css, tset) {
> - WARN_ON_ONCE(p);
> - p = leader;
> - memcg = mem_cgroup_from_css(css);
> - }
> - if (!p)
> - return 0;
> -
> - /*
> - * We are now committed to this value whatever it is. Changes in this
> - * tunable will only affect upcoming migrations, not the current one.
> - * So we need to save it, and keep it going.
> - */
> - move_flags = READ_ONCE(memcg->move_charge_at_immigrate);
> - if (!move_flags)
> - return 0;
> -
> - from = mem_cgroup_from_task(p);
> -
> - VM_BUG_ON(from == memcg);
> -
> - mm = get_task_mm(p);
> - if (!mm)
> - return 0;
> - /* We move charges only when we move a owner of the mm */
> - if (mm->owner == p) {
> - VM_BUG_ON(mc.from);
> - VM_BUG_ON(mc.to);
> - VM_BUG_ON(mc.precharge);
> - VM_BUG_ON(mc.moved_charge);
> - VM_BUG_ON(mc.moved_swap);
> -
> - spin_lock(&mc.lock);
> - mc.mm = mm;
> - mc.from = from;
> - mc.to = memcg;
> - mc.flags = move_flags;
> - spin_unlock(&mc.lock);
> - /* We set mc.moving_task later */
> -
> - ret = mem_cgroup_precharge_mc(mm);
> - if (ret)
> - mem_cgroup_clear_mc();
> - } else {
> - mmput(mm);
> - }
> - return ret;
> -}
> -
> -void memcg1_cancel_attach(struct cgroup_taskset *tset)
> -{
> - if (mc.to)
> - mem_cgroup_clear_mc();
> -}
> -
> -static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
> - unsigned long addr, unsigned long end,
> - struct mm_walk *walk)
> -{
> - int ret = 0;
> - struct vm_area_struct *vma = walk->vma;
> - pte_t *pte;
> - spinlock_t *ptl;
> - enum mc_target_type target_type;
> - union mc_target target;
> - struct folio *folio;
> -
> - ptl = pmd_trans_huge_lock(pmd, vma);
> - if (ptl) {
> - if (mc.precharge < HPAGE_PMD_NR) {
> - spin_unlock(ptl);
> - return 0;
> - }
> - target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
> - if (target_type == MC_TARGET_PAGE) {
> - folio = target.folio;
> - if (folio_isolate_lru(folio)) {
> - if (!mem_cgroup_move_account(folio, true,
> - mc.from, mc.to)) {
> - mc.precharge -= HPAGE_PMD_NR;
> - mc.moved_charge += HPAGE_PMD_NR;
> - }
> - folio_putback_lru(folio);
> - }
> - folio_unlock(folio);
> - folio_put(folio);
> - } else if (target_type == MC_TARGET_DEVICE) {
> - folio = target.folio;
> - if (!mem_cgroup_move_account(folio, true,
> - mc.from, mc.to)) {
> - mc.precharge -= HPAGE_PMD_NR;
> - mc.moved_charge += HPAGE_PMD_NR;
> - }
> - folio_unlock(folio);
> - folio_put(folio);
> - }
> - spin_unlock(ptl);
> - return 0;
> - }
> -
> -retry:
> - pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
> - if (!pte)
> - return 0;
> - for (; addr != end; addr += PAGE_SIZE) {
> - pte_t ptent = ptep_get(pte++);
> - bool device = false;
> - swp_entry_t ent;
> -
> - if (!mc.precharge)
> - break;
> -
> - switch (get_mctgt_type(vma, addr, ptent, &target)) {
> - case MC_TARGET_DEVICE:
> - device = true;
> - fallthrough;
> - case MC_TARGET_PAGE:
> - folio = target.folio;
> - /*
> - * We can have a part of the split pmd here. Moving it
> - * can be done but it would be too convoluted so simply
> - * ignore such a partial THP and keep it in original
> - * memcg. There should be somebody mapping the head.
> - */
> - if (folio_test_large(folio))
> - goto put;
> - if (!device && !folio_isolate_lru(folio))
> - goto put;
> - if (!mem_cgroup_move_account(folio, false,
> - mc.from, mc.to)) {
> - mc.precharge--;
> - /* we uncharge from mc.from later. */
> - mc.moved_charge++;
> - }
> - if (!device)
> - folio_putback_lru(folio);
> -put: /* get_mctgt_type() gets & locks the page */
> - folio_unlock(folio);
> - folio_put(folio);
> - break;
> - case MC_TARGET_SWAP:
> - ent = target.ent;
> - if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
> - mc.precharge--;
> - mem_cgroup_id_get_many(mc.to, 1);
> - /* we fixup other refcnts and charges later. */
> - mc.moved_swap++;
> - }
> - break;
> - default:
> - break;
> - }
> - }
> - pte_unmap_unlock(pte - 1, ptl);
> - cond_resched();
> -
> - if (addr != end) {
> - /*
> - * We have consumed all precharges we got in can_attach().
> - * We try charge one by one, but don't do any additional
> - * charges to mc.to if we have failed in charge once in attach()
> - * phase.
> - */
> - ret = mem_cgroup_do_precharge(1);
> - if (!ret)
> - goto retry;
> - }
> -
> - return ret;
> -}
> -
> -static const struct mm_walk_ops charge_walk_ops = {
> - .pmd_entry = mem_cgroup_move_charge_pte_range,
> - .walk_lock = PGWALK_RDLOCK,
> -};
> -
> -static void mem_cgroup_move_charge(void)
> -{
> - lru_add_drain_all();
> - /*
> - * Signal folio_memcg_lock() to take the memcg's move_lock
> - * while we're moving its pages to another memcg. Then wait
> - * for already started RCU-only updates to finish.
> - */
> - atomic_inc(&mc.from->moving_account);
> - synchronize_rcu();
> -retry:
> - if (unlikely(!mmap_read_trylock(mc.mm))) {
> - /*
> - * Someone who are holding the mmap_lock might be waiting in
> - * waitq. So we cancel all extra charges, wake up all waiters,
> - * and retry. Because we cancel precharges, we might not be able
> - * to move enough charges, but moving charge is a best-effort
> - * feature anyway, so it wouldn't be a big problem.
> - */
> - __mem_cgroup_clear_mc();
> - cond_resched();
> - goto retry;
> - }
> - /*
> - * When we have consumed all precharges and failed in doing
> - * additional charge, the page walk just aborts.
> - */
> - walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL);
> - mmap_read_unlock(mc.mm);
> - atomic_dec(&mc.from->moving_account);
> -}
> -
> -void memcg1_move_task(void)
> -{
> - if (mc.to) {
> - mem_cgroup_move_charge();
> - mem_cgroup_clear_mc();
> - }
> -}
> -
> -#else /* !CONFIG_MMU */
> -int memcg1_can_attach(struct cgroup_taskset *tset)
> -{
> - return 0;
> -}
> -void memcg1_cancel_attach(struct cgroup_taskset *tset)
> -{
> -}
> -void memcg1_move_task(void)
> -{
> -}
> -#endif
> -
> static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
> {
> struct mem_cgroup_threshold_ary *t;
> diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
> index c0672e25bcdb..0e3b82951d91 100644
> --- a/mm/memcontrol-v1.h
> +++ b/mm/memcontrol-v1.h
> @@ -80,12 +80,7 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
> WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
> }
>
> -bool memcg1_wait_acct_move(struct mem_cgroup *memcg);
> -
> struct cgroup_taskset;
> -int memcg1_can_attach(struct cgroup_taskset *tset);
> -void memcg1_cancel_attach(struct cgroup_taskset *tset);
> -void memcg1_move_task(void);
> void memcg1_css_offline(struct mem_cgroup *memcg);
>
> /* for encoding cft->private value on file */
> @@ -130,7 +125,6 @@ static inline void memcg1_free_events(struct mem_cgroup *memcg) {}
> static inline void memcg1_memcg_init(struct mem_cgroup *memcg) {}
> static inline void memcg1_remove_from_trees(struct mem_cgroup *memcg) {}
> static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) {}
> -static inline bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { return false; }
> static inline void memcg1_css_offline(struct mem_cgroup *memcg) {}
>
> static inline bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked) { return true; }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5c3a8629ef3e..94279b9c766a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2242,12 +2242,6 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
> */
> if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
> goto retry;
> - /*
> - * At task move, charge accounts can be doubly counted. So, it's
> - * better to wait until the end of task_move if something is going on.
> - */
> - if (memcg1_wait_acct_move(mem_over_limit))
> - goto retry;
>
> if (nr_retries--)
> goto retry;
> @@ -4439,9 +4433,6 @@ struct cgroup_subsys memory_cgrp_subsys = {
> .exit = mem_cgroup_exit,
> .dfl_cftypes = memory_files,
> #ifdef CONFIG_MEMCG_V1
> - .can_attach = memcg1_can_attach,
> - .cancel_attach = memcg1_cancel_attach,
> - .post_attach = memcg1_move_task,
> .legacy_cftypes = mem_cgroup_legacy_files,
> #endif
> .early_init = 0,
> --
> 2.43.5
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 6:57 ` [RFC PATCH 3/3] memcg-v1: remove memcg move locking code Shakeel Butt
@ 2024-10-24 9:16 ` Michal Hocko
2024-10-24 17:23 ` Shakeel Butt
2024-10-24 16:50 ` Roman Gushchin
2024-10-25 1:23 ` Shakeel Butt
2 siblings, 1 reply; 56+ messages in thread
From: Michal Hocko @ 2024-10-24 9:16 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Wed 23-10-24 23:57:12, Shakeel Butt wrote:
> The memcg v1's charge move feature has been deprecated. There is no need
> to have any locking or protection against the moving charge. Let's
> proceed to remove all the locking code related to charge moving.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
> -/**
> - * folio_memcg_lock - Bind a folio to its memcg.
> - * @folio: The folio.
> - *
> - * This function prevents unlocked LRU folios from being moved to
> - * another cgroup.
> - *
> - * It ensures lifetime of the bound memcg. The caller is responsible
> - * for the lifetime of the folio.
> - */
> -void folio_memcg_lock(struct folio *folio)
> -{
> - struct mem_cgroup *memcg;
> - unsigned long flags;
> -
> - /*
> - * The RCU lock is held throughout the transaction. The fast
> - * path can get away without acquiring the memcg->move_lock
> - * because page moving starts with an RCU grace period.
> - */
> - rcu_read_lock();
Is it safe to remove the implicit RCU?
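For illustration, a hypothetical caller sketch (not a real site in the tree,
names made up) of the kind of pattern that leans on that implicit RCU: the
memcg pointer stays valid across the section only because folio_memcg_lock()
takes rcu_read_lock() internally.

	static void example_folio_stat_update(struct folio *folio, int nr)
	{
		struct mem_cgroup *memcg;
		struct lruvec *lruvec;

		folio_memcg_lock(folio);
		/* stable only while the implicit rcu_read_lock() is held */
		memcg = folio_memcg(folio);
		lruvec = mem_cgroup_lruvec(memcg, folio_pgdat(folio));
		mod_lruvec_state(lruvec, NR_FILE_DIRTY, nr);
		folio_memcg_unlock(folio);
	}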
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate
2024-10-24 6:57 ` [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
2024-10-24 9:14 ` Michal Hocko
@ 2024-10-24 16:49 ` Roman Gushchin
1 sibling, 0 replies; 56+ messages in thread
From: Roman Gushchin @ 2024-10-24 16:49 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Wed, Oct 23, 2024 at 11:57:10PM -0700, Shakeel Butt wrote:
> Proceed with the complete deprecation of memcg v1's charge moving
> feature. The deprecation warning has been in the kernel for almost two
> years and has been ported to all stable kernel since. Now is the time to
> fully deprecate this feature.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Nice! Thank you!
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 2/3] memcg-v1: remove charge move code
2024-10-24 6:57 ` [RFC PATCH 2/3] memcg-v1: remove charge move code Shakeel Butt
2024-10-24 9:14 ` Michal Hocko
@ 2024-10-24 16:50 ` Roman Gushchin
1 sibling, 0 replies; 56+ messages in thread
From: Roman Gushchin @ 2024-10-24 16:50 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Wed, Oct 23, 2024 at 11:57:11PM -0700, Shakeel Butt wrote:
> The memcg-v1 charge move feature has been deprecated completely and
> let's remove the relevant code as well.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 6:57 ` [RFC PATCH 3/3] memcg-v1: remove memcg move locking code Shakeel Butt
2024-10-24 9:16 ` Michal Hocko
@ 2024-10-24 16:50 ` Roman Gushchin
2024-10-24 17:26 ` Shakeel Butt
2024-10-25 1:23 ` Shakeel Butt
2 siblings, 1 reply; 56+ messages in thread
From: Roman Gushchin @ 2024-10-24 16:50 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Wed, Oct 23, 2024 at 11:57:12PM -0700, Shakeel Butt wrote:
> The memcg v1's charge move feature has been deprecated. There is no need
> to have any locking or protection against the moving charge. Let's
> proceed to remove all the locking code related to charge moving.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate
2024-10-24 9:14 ` Michal Hocko
@ 2024-10-24 16:51 ` Roman Gushchin
2024-10-24 17:16 ` Shakeel Butt
0 siblings, 1 reply; 56+ messages in thread
From: Roman Gushchin @ 2024-10-24 16:51 UTC (permalink / raw)
To: Michal Hocko
Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 11:14:01AM +0200, Michal Hocko wrote:
> On Wed 23-10-24 23:57:10, Shakeel Butt wrote:
> > Proceed with the complete deprecation of memcg v1's charge moving
> > feature. The deprecation warning has been in the kernel for almost two
> > years and has been ported to all stable kernel since. Now is the time to
> > fully deprecate this feature.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>
> I'm fine with this move, just one detail we might need to consider
> [...]
> > @@ -606,17 +606,7 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
> > "Please report your usecase to linux-mm@kvack.org if you "
> > "depend on this functionality.\n");
> >
> > - if (val & ~MOVE_MASK)
> > - return -EINVAL;
> > -
> > - /*
> > - * No kind of locking is needed in here, because ->can_attach() will
> > - * check this value once in the beginning of the process, and then carry
> > - * on with stale data. This means that changes to this value will only
> > - * affect task migrations starting after the change.
> > - */
> > - memcg->move_charge_at_immigrate = val;
> > - return 0;
> > + return -EINVAL;
>
> Would it make more sense to return -EINVAL only if val != 0? The reason being
> that some userspace might just be writing 0 here for whatever reason and
> find the failure unexpected.
I think it's a good idea.
Thanks!
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate
2024-10-24 16:51 ` Roman Gushchin
@ 2024-10-24 17:16 ` Shakeel Butt
0 siblings, 0 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-24 17:16 UTC (permalink / raw)
To: Roman Gushchin
Cc: Michal Hocko, Andrew Morton, Johannes Weiner, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 04:51:46PM GMT, Roman Gushchin wrote:
> On Thu, Oct 24, 2024 at 11:14:01AM +0200, Michal Hocko wrote:
> > On Wed 23-10-24 23:57:10, Shakeel Butt wrote:
> > > Proceed with the complete deprecation of memcg v1's charge moving
> > > feature. The deprecation warning has been in the kernel for almost two
> > > years and has been ported to all stable kernel since. Now is the time to
> > > fully deprecate this feature.
> > >
> > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> >
> > I'm fine with this move, just one detail we might need to consider
> > [...]
> > > @@ -606,17 +606,7 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
> > > "Please report your usecase to linux-mm@kvack.org if you "
> > > "depend on this functionality.\n");
> > >
> > > - if (val & ~MOVE_MASK)
> > > - return -EINVAL;
> > > -
> > > - /*
> > > - * No kind of locking is needed in here, because ->can_attach() will
> > > - * check this value once in the beginning of the process, and then carry
> > > - * on with stale data. This means that changes to this value will only
> > > - * affect task migrations starting after the change.
> > > - */
> > > - memcg->move_charge_at_immigrate = val;
> > > - return 0;
> > > + return -EINVAL;
> >
> > Would it make more sense to return -EINVAL only if val != 0? The reason being
> > that some userspace might just be writing 0 here for whatever reason and
> > find the failure unexpected.
>
> I think it's a good idea.
>
> Thanks!
Thanks Michal and Roman for the review and I will make this change in
the next version.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 9:16 ` Michal Hocko
@ 2024-10-24 17:23 ` Shakeel Butt
2024-10-24 18:54 ` Roman Gushchin
0 siblings, 1 reply; 56+ messages in thread
From: Shakeel Butt @ 2024-10-24 17:23 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 11:16:52AM GMT, Michal Hocko wrote:
> On Wed 23-10-24 23:57:12, Shakeel Butt wrote:
> > The memcg v1's charge move feature has been deprecated. There is no need
> > to have any locking or protection against the moving charge. Let's
> > proceed to remove all the locking code related to charge moving.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > ---
> > -/**
> > - * folio_memcg_lock - Bind a folio to its memcg.
> > - * @folio: The folio.
> > - *
> > - * This function prevents unlocked LRU folios from being moved to
> > - * another cgroup.
> > - *
> > - * It ensures lifetime of the bound memcg. The caller is responsible
> > - * for the lifetime of the folio.
> > - */
> > -void folio_memcg_lock(struct folio *folio)
> > -{
> > - struct mem_cgroup *memcg;
> > - unsigned long flags;
> > -
> > - /*
> > - * The RCU lock is held throughout the transaction. The fast
> > - * path can get away without acquiring the memcg->move_lock
> > - * because page moving starts with an RCU grace period.
> > - */
> > - rcu_read_lock();
>
> Is it safe to remove the implicit RCU?
Good question. I think it will be safe to keep the RCU in this patch and
in a follow-up examine each place and decide whether to remove the RCU or not.
Thanks for the review.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 16:50 ` Roman Gushchin
@ 2024-10-24 17:26 ` Shakeel Butt
2024-10-24 19:45 ` Michal Hocko
0 siblings, 1 reply; 56+ messages in thread
From: Shakeel Butt @ 2024-10-24 17:26 UTC (permalink / raw)
To: Roman Gushchin
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 04:50:37PM GMT, Roman Gushchin wrote:
> On Wed, Oct 23, 2024 at 11:57:12PM -0700, Shakeel Butt wrote:
> > The memcg v1's charge move feature has been deprecated. There is no need
> > to have any locking or protection against the moving charge. Let's
> > proceed to remove all the locking code related to charge moving.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Thanks Roman for the review. Based on Michal's question, I am planning
to keep the RCU locking in the next version of this patch and follow up
with a clear understanding of where we really need RCU and where we don't.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 17:23 ` Shakeel Butt
@ 2024-10-24 18:54 ` Roman Gushchin
2024-10-24 19:38 ` Shakeel Butt
0 siblings, 1 reply; 56+ messages in thread
From: Roman Gushchin @ 2024-10-24 18:54 UTC (permalink / raw)
To: Shakeel Butt
Cc: Michal Hocko, Andrew Morton, Johannes Weiner, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 10:23:49AM -0700, Shakeel Butt wrote:
> On Thu, Oct 24, 2024 at 11:16:52AM GMT, Michal Hocko wrote:
> > On Wed 23-10-24 23:57:12, Shakeel Butt wrote:
> > > The memcg v1's charge move feature has been deprecated. There is no need
> > > to have any locking or protection against the moving charge. Let's
> > > proceed to remove all the locking code related to charge moving.
> > >
> > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > ---
> > > -/**
> > > - * folio_memcg_lock - Bind a folio to its memcg.
> > > - * @folio: The folio.
> > > - *
> > > - * This function prevents unlocked LRU folios from being moved to
> > > - * another cgroup.
> > > - *
> > > - * It ensures lifetime of the bound memcg. The caller is responsible
> > > - * for the lifetime of the folio.
> > > - */
> > > -void folio_memcg_lock(struct folio *folio)
> > > -{
> > > - struct mem_cgroup *memcg;
> > > - unsigned long flags;
> > > -
> > > - /*
> > > - * The RCU lock is held throughout the transaction. The fast
> > > - * path can get away without acquiring the memcg->move_lock
> > > - * because page moving starts with an RCU grace period.
> > > - */
> > > - rcu_read_lock();
> >
> > Is it safe to remove the implicit RCU?
>
> Good question. I think it will be safe to keep the RCU in this patch and
> in a follow-up examine each place and decide whether to remove the RCU or not.
I took a really quick look and based on it I believe it is safe.
Shakeel, can you, please, check too and preferably keep your code intact.
I think it's better to remove it all together, rather than do it in two steps.
If we really need rcu somewhere, we can replace folio_memcg_lock()/unlock()
with an explicit rcu_read_lock()/rcu_read_unlock().
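As a purely illustrative before/after of such a hypothetical call site (the
names and the stat are made up, not taken from the tree):

	/* before: relies on the implicit RCU taken by folio_memcg_lock() */
	folio_memcg_lock(folio);
	memcg = folio_memcg(folio);
	mod_lruvec_state(mem_cgroup_lruvec(memcg, folio_pgdat(folio)),
			 NR_FILE_DIRTY, nr);
	folio_memcg_unlock(folio);

	/* after: no move_lock left to take, explicit RCU keeps the memcg alive */
	rcu_read_lock();
	memcg = folio_memcg(folio);
	mod_lruvec_state(mem_cgroup_lruvec(memcg, folio_pgdat(folio)),
			 NR_FILE_DIRTY, nr);
	rcu_read_unlock();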
Thanks!
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 18:54 ` Roman Gushchin
@ 2024-10-24 19:38 ` Shakeel Butt
0 siblings, 0 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-24 19:38 UTC (permalink / raw)
To: Roman Gushchin
Cc: Michal Hocko, Andrew Morton, Johannes Weiner, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:54:01PM GMT, Roman Gushchin wrote:
> On Thu, Oct 24, 2024 at 10:23:49AM -0700, Shakeel Butt wrote:
> > On Thu, Oct 24, 2024 at 11:16:52AM GMT, Michal Hocko wrote:
> > > On Wed 23-10-24 23:57:12, Shakeel Butt wrote:
> > > > The memcg v1's charge move feature has been deprecated. There is no need
> > > > to have any locking or protection against the moving charge. Let's
> > > > proceed to remove all the locking code related to charge moving.
> > > >
> > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > > ---
> > > > -/**
> > > > - * folio_memcg_lock - Bind a folio to its memcg.
> > > > - * @folio: The folio.
> > > > - *
> > > > - * This function prevents unlocked LRU folios from being moved to
> > > > - * another cgroup.
> > > > - *
> > > > - * It ensures lifetime of the bound memcg. The caller is responsible
> > > > - * for the lifetime of the folio.
> > > > - */
> > > > -void folio_memcg_lock(struct folio *folio)
> > > > -{
> > > > - struct mem_cgroup *memcg;
> > > > - unsigned long flags;
> > > > -
> > > > - /*
> > > > - * The RCU lock is held throughout the transaction. The fast
> > > > - * path can get away without acquiring the memcg->move_lock
> > > > - * because page moving starts with an RCU grace period.
> > > > - */
> > > > - rcu_read_lock();
> > >
> > > Is it safe to remove the implicit RCU?
> >
> > Good question. I think it will be safe to keep the RCU in this patch and
> > in a follow-up examine each place and decide whether to remove the RCU or not.
>
> I took a really quick look and based on it I believe it is safe.
> Shakeel, can you, please, check too and preferably keep your code intact.
> I think it's better to remove it all together, rather than do it in two steps.
> If we really need rcu somewhere, we can replace folio_memcg_lock()/unlock()
> with an explicit rcu_read_lock()/rcu_read_unlock().
>
Yup, I am going through that and so far it seems safe. Hopefully I will have
the update by the evening.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 17:26 ` Shakeel Butt
@ 2024-10-24 19:45 ` Michal Hocko
2024-10-24 20:32 ` Yosry Ahmed
0 siblings, 1 reply; 56+ messages in thread
From: Michal Hocko @ 2024-10-24 19:45 UTC (permalink / raw)
To: Shakeel Butt
Cc: Roman Gushchin, Andrew Morton, Johannes Weiner, Muchun Song,
Hugh Dickins, linux-mm, cgroups, linux-kernel, linux-fsdevel,
linux-doc, Meta kernel team
On Thu 24-10-24 10:26:15, Shakeel Butt wrote:
> On Thu, Oct 24, 2024 at 04:50:37PM GMT, Roman Gushchin wrote:
> > On Wed, Oct 23, 2024 at 11:57:12PM -0700, Shakeel Butt wrote:
> > > The memcg v1's charge move feature has been deprecated. There is no need
> > > to have any locking or protection against the moving charge. Let's
> > > proceed to remove all the locking code related to charge moving.
> > >
> > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> >
> > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
>
> Thanks Roman for the review. Based on Michal's question, I am planning
> to keep the RCU locking in the next version of this patch and follow up
> with a clear understanding of where we really need RCU and where we don't.
I think it would be safer and easier to review if we drop each RCU
separately or in smaller batches.
Thanks!
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 19:45 ` Michal Hocko
@ 2024-10-24 20:32 ` Yosry Ahmed
2024-10-24 21:08 ` Michal Hocko
0 siblings, 1 reply; 56+ messages in thread
From: Yosry Ahmed @ 2024-10-24 20:32 UTC (permalink / raw)
To: Michal Hocko
Cc: Shakeel Butt, Roman Gushchin, Andrew Morton, Johannes Weiner,
Muchun Song, Hugh Dickins, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 12:45 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 24-10-24 10:26:15, Shakeel Butt wrote:
> > On Thu, Oct 24, 2024 at 04:50:37PM GMT, Roman Gushchin wrote:
> > > On Wed, Oct 23, 2024 at 11:57:12PM -0700, Shakeel Butt wrote:
> > > > The memcg v1's charge move feature has been deprecated. There is no need
> > > > to have any locking or protection against the moving charge. Let's
> > > > proceed to remove all the locking code related to charge moving.
> > > >
> > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > >
> > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> >
> > Thanks Roman for the review. Based on Michal's question, I am planning
> > to keep the RCU locking in the next version of this patch and follow up
> > with a clear understanding of where we really need RCU and where we don't.
>
> I think it would be safer and easier to review if we drop each RCU
> separately or in smaller batches.
FWIW if we go with this route, I agree with Roman's idea about
replacing folio_memcg_lock()/unlock()
with an explicit rcu_read_lock()/rcu_read_unlock(), and then having
separate patches/series that remove the RCU annotations. If done in a
separate series, we should comment the explicit RCU calls
appropriately to reflect the fact that they should mostly be removed
(or at least re-evaluated).
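Something along these lines at each converted call site would make the
intent obvious (illustrative sketch only, not an actual conversion):

	/*
	 * Taken only to keep the memcg returned by folio_memcg() alive.
	 * The memcg-v1 move_lock is gone, so re-evaluate whether this RCU
	 * section is still needed and drop it if not.
	 */
	rcu_read_lock();
	memcg = folio_memcg(folio);
	/* ... per-memcg update for this folio ... */
	rcu_read_unlock();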
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 20:32 ` Yosry Ahmed
@ 2024-10-24 21:08 ` Michal Hocko
0 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2024-10-24 21:08 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Shakeel Butt, Roman Gushchin, Andrew Morton, Johannes Weiner,
Muchun Song, Hugh Dickins, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu 24-10-24 13:32:53, Yosry Ahmed wrote:
> On Thu, Oct 24, 2024 at 12:45 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 24-10-24 10:26:15, Shakeel Butt wrote:
> > > On Thu, Oct 24, 2024 at 04:50:37PM GMT, Roman Gushchin wrote:
> > > > On Wed, Oct 23, 2024 at 11:57:12PM -0700, Shakeel Butt wrote:
> > > > > The memcg v1's charge move feature has been deprecated. There is no need
> > > > > to have any locking or protection against the moving charge. Let's
> > > > > proceed to remove all the locking code related to charge moving.
> > > > >
> > > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > >
> > > > Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
> > >
> > > Thanks Roman for the review. Based on Michal's question, I am planning
> > > to keep the RCU locking in the next version of this patch and follow up
> > > with a clear understanding of where we really need RCU and where we don't.
> >
> > I think it would be safer and easier to review if we drop each RCU
> > separately or in smaller batches.
>
> FWIW if we go with this route, I agree with Roman's idea about
> replacing folio_memcg_lock()/unlock()
> with an explicit rcu_read_lock()/rcu_read_unlock(), and then having
> separate patches/series that remove the RCU annotations. If done in a
> separate series, we should comment the explicit RCU calls
> appropriately to reflect the fact that they should mostly be removed
> (or at least re-evaluated).
Agreed!
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH v1 0/6] memcg-v1: fully deprecate charge moving
@ 2024-10-25 1:22 Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 0/3] " Shakeel Butt
` (7 more replies)
0 siblings, 8 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
The memcg v1's charge moving feature has been deprecated for almost 2
years and the kernel warns if someone tries to use it. This warning has
been backported to all stable kernels and there have not been any reports
of the warning or requests to support this feature anymore. Let's
proceed to fully deprecate this feature.
Changes since RFC:
- Writing 0 to memory.move_charge_at_immigrate is allowed.
- Remove the memcg move locking in separate patches.
Shakeel Butt (6):
memcg-v1: fully deprecate move_charge_at_immigrate
memcg-v1: remove charge move code
memcg-v1: no need for memcg locking for dirty tracking
memcg-v1: no need for memcg locking for writeback tracking
memcg-v1: no need for memcg locking for MGLRU
memcg-v1: remove memcg move locking code
.../admin-guide/cgroup-v1/memory.rst | 82 +-
fs/buffer.c | 5 -
include/linux/memcontrol.h | 59 --
mm/filemap.c | 1 -
mm/memcontrol-v1.c | 958 +-----------------
mm/memcontrol-v1.h | 6 -
mm/memcontrol.c | 14 -
mm/page-writeback.c | 20 +-
mm/rmap.c | 1 -
mm/vmscan.c | 11 -
10 files changed, 8 insertions(+), 1149 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH v1 1/6] memcg-v1: fully deprecate move_charge_at_immigrate
2024-10-25 1:22 [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 0/3] " Shakeel Butt
@ 2024-10-25 1:22 ` Shakeel Butt
2024-10-25 6:54 ` Michal Hocko
2024-10-28 13:53 ` Johannes Weiner
2024-10-25 1:22 ` [PATCH v1 2/6] memcg-v1: remove charge move code Shakeel Butt
` (5 subsequent siblings)
7 siblings, 2 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
Proceed with the complete deprecation of memcg v1's charge moving
feature. The deprecation warning has been in the kernel for almost two
years and has been backported to all stable kernels since. Now is the time to
fully deprecate this feature.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
---
Changes since RFC:
- Writing 0 to memory.move_charge_at_immigrate is allowed.
.../admin-guide/cgroup-v1/memory.rst | 82 +------------------
mm/memcontrol-v1.c | 14 +---
2 files changed, 5 insertions(+), 91 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 270501db9f4e..286d16fc22eb 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -90,9 +90,7 @@ Brief summary of control files.
used.
memory.swappiness set/show swappiness parameter of vmscan
(See sysctl's vm.swappiness)
- memory.move_charge_at_immigrate set/show controls of moving charges
- This knob is deprecated and shouldn't be
- used.
+ memory.move_charge_at_immigrate This knob is deprecated.
memory.oom_control set/show oom controls.
This knob is deprecated and shouldn't be
used.
@@ -243,10 +241,6 @@ behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
-But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
-task to another cgroup, its pages may be recharged to the new cgroup, if
-move_charge_at_immigrate has been chosen.
-
2.4 Swap Extension
--------------------------------------
@@ -756,78 +750,8 @@ If we want to change this to 1G, we can at any time use::
THIS IS DEPRECATED!
-It's expensive and unreliable! It's better practice to launch workload
-tasks directly from inside their target cgroup. Use dedicated workload
-cgroups to allow fine-grained policy adjustments without having to
-move physical pages between control domains.
-
-Users can move charges associated with a task along with task migration, that
-is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
-This feature is not supported in !CONFIG_MMU environments because of lack of
-page tables.
-
-8.1 Interface
--------------
-
-This feature is disabled by default. It can be enabled (and disabled again) by
-writing to memory.move_charge_at_immigrate of the destination cgroup.
-
-If you want to enable it::
-
- # echo (some positive value) > memory.move_charge_at_immigrate
-
-.. note::
- Each bits of move_charge_at_immigrate has its own meaning about what type
- of charges should be moved. See :ref:`section 8.2
- <cgroup-v1-memory-movable-charges>` for details.
-
-.. note::
- Charges are moved only when you move mm->owner, in other words,
- a leader of a thread group.
-
-.. note::
- If we cannot find enough space for the task in the destination cgroup, we
- try to make space by reclaiming memory. Task migration may fail if we
- cannot make enough space.
-
-.. note::
- It can take several seconds if you move charges much.
-
-And if you want disable it again::
-
- # echo 0 > memory.move_charge_at_immigrate
-
-.. _cgroup-v1-memory-movable-charges:
-
-8.2 Type of charges which can be moved
---------------------------------------
-
-Each bit in move_charge_at_immigrate has its own meaning about what type of
-charges should be moved. But in any case, it must be noted that an account of
-a page or a swap can be moved only when it is charged to the task's current
-(old) memory cgroup.
-
-+---+--------------------------------------------------------------------------+
-|bit| what type of charges would be moved ? |
-+===+==========================================================================+
-| 0 | A charge of an anonymous page (or swap of it) used by the target task. |
-| | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
-+---+--------------------------------------------------------------------------+
-| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
-| | and swaps of tmpfs file) mmapped by the target task. Unlike the case of |
-| | anonymous pages, file pages (and swaps) in the range mmapped by the task |
-| | will be moved even if the task hasn't done page fault, i.e. they might |
-| | not be the task's "RSS", but other task's "RSS" that maps the same file. |
-| | The mapcount of the page is ignored (the page can be moved independent |
-| | of the mapcount). You must enable Swap Extension (see 2.4) to |
-| | enable move of swap charges. |
-+---+--------------------------------------------------------------------------+
-
-8.3 TODO
---------
-
-- All of moving charge operations are done under cgroup_mutex. It's not good
- behavior to hold the mutex too long, so we may need some trick.
+Reading memory.move_charge_at_immigrate will always return 0 and writing
+to it will always return -EINVAL.
9. Memory thresholds
====================
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 81d8819f13cd..9b3b1a446c65 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -593,29 +593,19 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
struct cftype *cft)
{
- return mem_cgroup_from_css(css)->move_charge_at_immigrate;
+ return 0;
}
#ifdef CONFIG_MMU
static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
struct cftype *cft, u64 val)
{
- struct mem_cgroup *memcg = mem_cgroup_from_css(css);
-
pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
"Please report your usecase to linux-mm@kvack.org if you "
"depend on this functionality.\n");
- if (val & ~MOVE_MASK)
+ if (val != 0)
return -EINVAL;
-
- /*
- * No kind of locking is needed in here, because ->can_attach() will
- * check this value once in the beginning of the process, and then carry
- * on with stale data. This means that changes to this value will only
- * affect task migrations starting after the change.
- */
- memcg->move_charge_at_immigrate = val;
return 0;
}
#else
--
2.43.5
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH v1 2/6] memcg-v1: remove charge move code
2024-10-25 1:22 [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 0/3] " Shakeel Butt
2024-10-25 1:22 ` [PATCH v1 1/6] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
@ 2024-10-25 1:22 ` Shakeel Butt
2024-10-28 10:22 ` David Hildenbrand
2024-10-28 13:54 ` Johannes Weiner
2024-10-25 1:23 ` [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking Shakeel Butt
` (4 subsequent siblings)
7 siblings, 2 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:22 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team, Michal Hocko
The memcg-v1 charge move feature has been completely deprecated, so
let's remove the relevant code as well.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
---
include/linux/memcontrol.h | 5 -
mm/memcontrol-v1.c | 862 -------------------------------------
mm/memcontrol-v1.h | 6 -
mm/memcontrol.c | 9 -
4 files changed, 882 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 524006313b0d..798db70b0a30 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -299,11 +299,6 @@ struct mem_cgroup {
/* For oom notifier event fd */
struct list_head oom_notify;
- /*
- * Should we move charges of a task when a task is moved into this
- * mem_cgroup ? And what type of charges should we move ?
- */
- unsigned long move_charge_at_immigrate;
/* taken only while moving_account > 0 */
spinlock_t move_lock;
unsigned long move_lock_flags;
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 9b3b1a446c65..9c0fba8c8a83 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -40,31 +40,6 @@ static struct mem_cgroup_tree soft_limit_tree __read_mostly;
#define MEM_CGROUP_MAX_RECLAIM_LOOPS 100
#define MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS 2
-/* Stuffs for move charges at task migration. */
-/*
- * Types of charges to be moved.
- */
-#define MOVE_ANON 0x1ULL
-#define MOVE_FILE 0x2ULL
-#define MOVE_MASK (MOVE_ANON | MOVE_FILE)
-
-/* "mc" and its members are protected by cgroup_mutex */
-static struct move_charge_struct {
- spinlock_t lock; /* for from, to */
- struct mm_struct *mm;
- struct mem_cgroup *from;
- struct mem_cgroup *to;
- unsigned long flags;
- unsigned long precharge;
- unsigned long moved_charge;
- unsigned long moved_swap;
- struct task_struct *moving_task; /* a task moving charges */
- wait_queue_head_t waitq; /* a waitq for other context */
-} mc = {
- .lock = __SPIN_LOCK_UNLOCKED(mc.lock),
- .waitq = __WAIT_QUEUE_HEAD_INITIALIZER(mc.waitq),
-};
-
/* for OOM */
struct mem_cgroup_eventfd_list {
struct list_head list;
@@ -426,51 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return nr_reclaimed;
}
-/*
- * A routine for checking "mem" is under move_account() or not.
- *
- * Checking a cgroup is mc.from or mc.to or under hierarchy of
- * moving cgroups. This is for waiting at high-memory pressure
- * caused by "move".
- */
-static bool mem_cgroup_under_move(struct mem_cgroup *memcg)
-{
- struct mem_cgroup *from;
- struct mem_cgroup *to;
- bool ret = false;
- /*
- * Unlike task_move routines, we access mc.to, mc.from not under
- * mutual exclusion by cgroup_mutex. Here, we take spinlock instead.
- */
- spin_lock(&mc.lock);
- from = mc.from;
- to = mc.to;
- if (!from)
- goto unlock;
-
- ret = mem_cgroup_is_descendant(from, memcg) ||
- mem_cgroup_is_descendant(to, memcg);
-unlock:
- spin_unlock(&mc.lock);
- return ret;
-}
-
-bool memcg1_wait_acct_move(struct mem_cgroup *memcg)
-{
- if (mc.moving_task && current != mc.moving_task) {
- if (mem_cgroup_under_move(memcg)) {
- DEFINE_WAIT(wait);
- prepare_to_wait(&mc.waitq, &wait, TASK_INTERRUPTIBLE);
- /* moving charge context might have finished. */
- if (mc.moving_task)
- schedule();
- finish_wait(&mc.waitq, &wait);
- return true;
- }
- }
- return false;
-}
-
/**
* folio_memcg_lock - Bind a folio to its memcg.
* @folio: The folio.
@@ -552,44 +482,6 @@ void folio_memcg_unlock(struct folio *folio)
__folio_memcg_unlock(folio_memcg(folio));
}
-#ifdef CONFIG_SWAP
-/**
- * mem_cgroup_move_swap_account - move swap charge and swap_cgroup's record.
- * @entry: swap entry to be moved
- * @from: mem_cgroup which the entry is moved from
- * @to: mem_cgroup which the entry is moved to
- *
- * It succeeds only when the swap_cgroup's record for this entry is the same
- * as the mem_cgroup's id of @from.
- *
- * Returns 0 on success, -EINVAL on failure.
- *
- * The caller must have charged to @to, IOW, called page_counter_charge() about
- * both res and memsw, and called css_get().
- */
-static int mem_cgroup_move_swap_account(swp_entry_t entry,
- struct mem_cgroup *from, struct mem_cgroup *to)
-{
- unsigned short old_id, new_id;
-
- old_id = mem_cgroup_id(from);
- new_id = mem_cgroup_id(to);
-
- if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
- mod_memcg_state(from, MEMCG_SWAP, -1);
- mod_memcg_state(to, MEMCG_SWAP, 1);
- return 0;
- }
- return -EINVAL;
-}
-#else
-static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
- struct mem_cgroup *from, struct mem_cgroup *to)
-{
- return -EINVAL;
-}
-#endif
-
static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
struct cftype *cft)
{
@@ -616,760 +508,6 @@ static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
}
#endif
-#ifdef CONFIG_MMU
-/* Handlers for move charge at task migration. */
-static int mem_cgroup_do_precharge(unsigned long count)
-{
- int ret;
-
- /* Try a single bulk charge without reclaim first, kswapd may wake */
- ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
- if (!ret) {
- mc.precharge += count;
- return ret;
- }
-
- /* Try charges one by one with reclaim, but do not retry */
- while (count--) {
- ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
- if (ret)
- return ret;
- mc.precharge++;
- cond_resched();
- }
- return 0;
-}
-
-union mc_target {
- struct folio *folio;
- swp_entry_t ent;
-};
-
-enum mc_target_type {
- MC_TARGET_NONE = 0,
- MC_TARGET_PAGE,
- MC_TARGET_SWAP,
- MC_TARGET_DEVICE,
-};
-
-static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
- unsigned long addr, pte_t ptent)
-{
- struct page *page = vm_normal_page(vma, addr, ptent);
-
- if (!page)
- return NULL;
- if (PageAnon(page)) {
- if (!(mc.flags & MOVE_ANON))
- return NULL;
- } else {
- if (!(mc.flags & MOVE_FILE))
- return NULL;
- }
- get_page(page);
-
- return page;
-}
-
-#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE)
-static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
- pte_t ptent, swp_entry_t *entry)
-{
- struct page *page = NULL;
- swp_entry_t ent = pte_to_swp_entry(ptent);
-
- if (!(mc.flags & MOVE_ANON))
- return NULL;
-
- /*
- * Handle device private pages that are not accessible by the CPU, but
- * stored as special swap entries in the page table.
- */
- if (is_device_private_entry(ent)) {
- page = pfn_swap_entry_to_page(ent);
- if (!get_page_unless_zero(page))
- return NULL;
- return page;
- }
-
- if (non_swap_entry(ent))
- return NULL;
-
- /*
- * Because swap_cache_get_folio() updates some statistics counter,
- * we call find_get_page() with swapper_space directly.
- */
- page = find_get_page(swap_address_space(ent), swap_cache_index(ent));
- entry->val = ent.val;
-
- return page;
-}
-#else
-static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
- pte_t ptent, swp_entry_t *entry)
-{
- return NULL;
-}
-#endif
-
-static struct page *mc_handle_file_pte(struct vm_area_struct *vma,
- unsigned long addr, pte_t ptent)
-{
- unsigned long index;
- struct folio *folio;
-
- if (!vma->vm_file) /* anonymous vma */
- return NULL;
- if (!(mc.flags & MOVE_FILE))
- return NULL;
-
- /* folio is moved even if it's not RSS of this task(page-faulted). */
- /* shmem/tmpfs may report page out on swap: account for that too. */
- index = linear_page_index(vma, addr);
- folio = filemap_get_incore_folio(vma->vm_file->f_mapping, index);
- if (IS_ERR(folio))
- return NULL;
- return folio_file_page(folio, index);
-}
-
-static void memcg1_check_events(struct mem_cgroup *memcg, int nid);
-static void memcg1_charge_statistics(struct mem_cgroup *memcg, int nr_pages);
-
-/**
- * mem_cgroup_move_account - move account of the folio
- * @folio: The folio.
- * @compound: charge the page as compound or small page
- * @from: mem_cgroup which the folio is moved from.
- * @to: mem_cgroup which the folio is moved to. @from != @to.
- *
- * The folio must be locked and not on the LRU.
- *
- * This function doesn't do "charge" to new cgroup and doesn't do "uncharge"
- * from old cgroup.
- */
-static int mem_cgroup_move_account(struct folio *folio,
- bool compound,
- struct mem_cgroup *from,
- struct mem_cgroup *to)
-{
- struct lruvec *from_vec, *to_vec;
- struct pglist_data *pgdat;
- unsigned int nr_pages = compound ? folio_nr_pages(folio) : 1;
- int nid, ret;
-
- VM_BUG_ON(from == to);
- VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
- VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
- VM_BUG_ON(compound && !folio_test_large(folio));
-
- ret = -EINVAL;
- if (folio_memcg(folio) != from)
- goto out;
-
- pgdat = folio_pgdat(folio);
- from_vec = mem_cgroup_lruvec(from, pgdat);
- to_vec = mem_cgroup_lruvec(to, pgdat);
-
- folio_memcg_lock(folio);
-
- if (folio_test_anon(folio)) {
- if (folio_mapped(folio)) {
- __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
- __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
- if (folio_test_pmd_mappable(folio)) {
- __mod_lruvec_state(from_vec, NR_ANON_THPS,
- -nr_pages);
- __mod_lruvec_state(to_vec, NR_ANON_THPS,
- nr_pages);
- }
- }
- } else {
- __mod_lruvec_state(from_vec, NR_FILE_PAGES, -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_PAGES, nr_pages);
-
- if (folio_test_swapbacked(folio)) {
- __mod_lruvec_state(from_vec, NR_SHMEM, -nr_pages);
- __mod_lruvec_state(to_vec, NR_SHMEM, nr_pages);
- }
-
- if (folio_mapped(folio)) {
- __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
- }
-
- if (folio_test_dirty(folio)) {
- struct address_space *mapping = folio_mapping(folio);
-
- if (mapping_can_writeback(mapping)) {
- __mod_lruvec_state(from_vec, NR_FILE_DIRTY,
- -nr_pages);
- __mod_lruvec_state(to_vec, NR_FILE_DIRTY,
- nr_pages);
- }
- }
- }
-
-#ifdef CONFIG_SWAP
- if (folio_test_swapcache(folio)) {
- __mod_lruvec_state(from_vec, NR_SWAPCACHE, -nr_pages);
- __mod_lruvec_state(to_vec, NR_SWAPCACHE, nr_pages);
- }
-#endif
- if (folio_test_writeback(folio)) {
- __mod_lruvec_state(from_vec, NR_WRITEBACK, -nr_pages);
- __mod_lruvec_state(to_vec, NR_WRITEBACK, nr_pages);
- }
-
- /*
- * All state has been migrated, let's switch to the new memcg.
- *
- * It is safe to change page's memcg here because the page
- * is referenced, charged, isolated, and locked: we can't race
- * with (un)charging, migration, LRU putback, or anything else
- * that would rely on a stable page's memory cgroup.
- *
- * Note that folio_memcg_lock is a memcg lock, not a page lock,
- * to save space. As soon as we switch page's memory cgroup to a
- * new memcg that isn't locked, the above state can change
- * concurrently again. Make sure we're truly done with it.
- */
- smp_mb();
-
- css_get(&to->css);
- css_put(&from->css);
-
- folio->memcg_data = (unsigned long)to;
-
- __folio_memcg_unlock(from);
-
- ret = 0;
- nid = folio_nid(folio);
-
- local_irq_disable();
- memcg1_charge_statistics(to, nr_pages);
- memcg1_check_events(to, nid);
- memcg1_charge_statistics(from, -nr_pages);
- memcg1_check_events(from, nid);
- local_irq_enable();
-out:
- return ret;
-}
-
-/**
- * get_mctgt_type - get target type of moving charge
- * @vma: the vma the pte to be checked belongs
- * @addr: the address corresponding to the pte to be checked
- * @ptent: the pte to be checked
- * @target: the pointer the target page or swap ent will be stored(can be NULL)
- *
- * Context: Called with pte lock held.
- * Return:
- * * MC_TARGET_NONE - If the pte is not a target for move charge.
- * * MC_TARGET_PAGE - If the page corresponding to this pte is a target for
- * move charge. If @target is not NULL, the folio is stored in target->folio
- * with extra refcnt taken (Caller should release it).
- * * MC_TARGET_SWAP - If the swap entry corresponding to this pte is a
- * target for charge migration. If @target is not NULL, the entry is
- * stored in target->ent.
- * * MC_TARGET_DEVICE - Like MC_TARGET_PAGE but page is device memory and
- * thus not on the lru. For now such page is charged like a regular page
- * would be as it is just special memory taking the place of a regular page.
- * See Documentations/vm/hmm.txt and include/linux/hmm.h
- */
-static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
- unsigned long addr, pte_t ptent, union mc_target *target)
-{
- struct page *page = NULL;
- struct folio *folio;
- enum mc_target_type ret = MC_TARGET_NONE;
- swp_entry_t ent = { .val = 0 };
-
- if (pte_present(ptent))
- page = mc_handle_present_pte(vma, addr, ptent);
- else if (pte_none_mostly(ptent))
- /*
- * PTE markers should be treated as a none pte here, separated
- * from other swap handling below.
- */
- page = mc_handle_file_pte(vma, addr, ptent);
- else if (is_swap_pte(ptent))
- page = mc_handle_swap_pte(vma, ptent, &ent);
-
- if (page)
- folio = page_folio(page);
- if (target && page) {
- if (!folio_trylock(folio)) {
- folio_put(folio);
- return ret;
- }
- /*
- * page_mapped() must be stable during the move. This
- * pte is locked, so if it's present, the page cannot
- * become unmapped. If it isn't, we have only partial
- * control over the mapped state: the page lock will
- * prevent new faults against pagecache and swapcache,
- * so an unmapped page cannot become mapped. However,
- * if the page is already mapped elsewhere, it can
- * unmap, and there is nothing we can do about it.
- * Alas, skip moving the page in this case.
- */
- if (!pte_present(ptent) && page_mapped(page)) {
- folio_unlock(folio);
- folio_put(folio);
- return ret;
- }
- }
-
- if (!page && !ent.val)
- return ret;
- if (page) {
- /*
- * Do only loose check w/o serialization.
- * mem_cgroup_move_account() checks the page is valid or
- * not under LRU exclusion.
- */
- if (folio_memcg(folio) == mc.from) {
- ret = MC_TARGET_PAGE;
- if (folio_is_device_private(folio) ||
- folio_is_device_coherent(folio))
- ret = MC_TARGET_DEVICE;
- if (target)
- target->folio = folio;
- }
- if (!ret || !target) {
- if (target)
- folio_unlock(folio);
- folio_put(folio);
- }
- }
- /*
- * There is a swap entry and a page doesn't exist or isn't charged.
- * But we cannot move a tail-page in a THP.
- */
- if (ent.val && !ret && (!page || !PageTransCompound(page)) &&
- mem_cgroup_id(mc.from) == lookup_swap_cgroup_id(ent)) {
- ret = MC_TARGET_SWAP;
- if (target)
- target->ent = ent;
- }
- return ret;
-}
-
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/*
- * We don't consider PMD mapped swapping or file mapped pages because THP does
- * not support them for now.
- * Caller should make sure that pmd_trans_huge(pmd) is true.
- */
-static enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
- unsigned long addr, pmd_t pmd, union mc_target *target)
-{
- struct page *page = NULL;
- struct folio *folio;
- enum mc_target_type ret = MC_TARGET_NONE;
-
- if (unlikely(is_swap_pmd(pmd))) {
- VM_BUG_ON(thp_migration_supported() &&
- !is_pmd_migration_entry(pmd));
- return ret;
- }
- page = pmd_page(pmd);
- VM_BUG_ON_PAGE(!page || !PageHead(page), page);
- folio = page_folio(page);
- if (!(mc.flags & MOVE_ANON))
- return ret;
- if (folio_memcg(folio) == mc.from) {
- ret = MC_TARGET_PAGE;
- if (target) {
- folio_get(folio);
- if (!folio_trylock(folio)) {
- folio_put(folio);
- return MC_TARGET_NONE;
- }
- target->folio = folio;
- }
- }
- return ret;
-}
-#else
-static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
- unsigned long addr, pmd_t pmd, union mc_target *target)
-{
- return MC_TARGET_NONE;
-}
-#endif
-
-static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
- unsigned long addr, unsigned long end,
- struct mm_walk *walk)
-{
- struct vm_area_struct *vma = walk->vma;
- pte_t *pte;
- spinlock_t *ptl;
-
- ptl = pmd_trans_huge_lock(pmd, vma);
- if (ptl) {
- /*
- * Note their can not be MC_TARGET_DEVICE for now as we do not
- * support transparent huge page with MEMORY_DEVICE_PRIVATE but
- * this might change.
- */
- if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
- mc.precharge += HPAGE_PMD_NR;
- spin_unlock(ptl);
- return 0;
- }
-
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- if (!pte)
- return 0;
- for (; addr != end; pte++, addr += PAGE_SIZE)
- if (get_mctgt_type(vma, addr, ptep_get(pte), NULL))
- mc.precharge++; /* increment precharge temporarily */
- pte_unmap_unlock(pte - 1, ptl);
- cond_resched();
-
- return 0;
-}
-
-static const struct mm_walk_ops precharge_walk_ops = {
- .pmd_entry = mem_cgroup_count_precharge_pte_range,
- .walk_lock = PGWALK_RDLOCK,
-};
-
-static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm)
-{
- unsigned long precharge;
-
- mmap_read_lock(mm);
- walk_page_range(mm, 0, ULONG_MAX, &precharge_walk_ops, NULL);
- mmap_read_unlock(mm);
-
- precharge = mc.precharge;
- mc.precharge = 0;
-
- return precharge;
-}
-
-static int mem_cgroup_precharge_mc(struct mm_struct *mm)
-{
- unsigned long precharge = mem_cgroup_count_precharge(mm);
-
- VM_BUG_ON(mc.moving_task);
- mc.moving_task = current;
- return mem_cgroup_do_precharge(precharge);
-}
-
-/* cancels all extra charges on mc.from and mc.to, and wakes up all waiters. */
-static void __mem_cgroup_clear_mc(void)
-{
- struct mem_cgroup *from = mc.from;
- struct mem_cgroup *to = mc.to;
-
- /* we must uncharge all the leftover precharges from mc.to */
- if (mc.precharge) {
- mem_cgroup_cancel_charge(mc.to, mc.precharge);
- mc.precharge = 0;
- }
- /*
- * we didn't uncharge from mc.from at mem_cgroup_move_account(), so
- * we must uncharge here.
- */
- if (mc.moved_charge) {
- mem_cgroup_cancel_charge(mc.from, mc.moved_charge);
- mc.moved_charge = 0;
- }
- /* we must fixup refcnts and charges */
- if (mc.moved_swap) {
- /* uncharge swap account from the old cgroup */
- if (!mem_cgroup_is_root(mc.from))
- page_counter_uncharge(&mc.from->memsw, mc.moved_swap);
-
- mem_cgroup_id_put_many(mc.from, mc.moved_swap);
-
- /*
- * we charged both to->memory and to->memsw, so we
- * should uncharge to->memory.
- */
- if (!mem_cgroup_is_root(mc.to))
- page_counter_uncharge(&mc.to->memory, mc.moved_swap);
-
- mc.moved_swap = 0;
- }
- memcg1_oom_recover(from);
- memcg1_oom_recover(to);
- wake_up_all(&mc.waitq);
-}
-
-static void mem_cgroup_clear_mc(void)
-{
- struct mm_struct *mm = mc.mm;
-
- /*
- * we must clear moving_task before waking up waiters at the end of
- * task migration.
- */
- mc.moving_task = NULL;
- __mem_cgroup_clear_mc();
- spin_lock(&mc.lock);
- mc.from = NULL;
- mc.to = NULL;
- mc.mm = NULL;
- spin_unlock(&mc.lock);
-
- mmput(mm);
-}
-
-int memcg1_can_attach(struct cgroup_taskset *tset)
-{
- struct cgroup_subsys_state *css;
- struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */
- struct mem_cgroup *from;
- struct task_struct *leader, *p;
- struct mm_struct *mm;
- unsigned long move_flags;
- int ret = 0;
-
- /* charge immigration isn't supported on the default hierarchy */
- if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
- return 0;
-
- /*
- * Multi-process migrations only happen on the default hierarchy
- * where charge immigration is not used. Perform charge
- * immigration if @tset contains a leader and whine if there are
- * multiple.
- */
- p = NULL;
- cgroup_taskset_for_each_leader(leader, css, tset) {
- WARN_ON_ONCE(p);
- p = leader;
- memcg = mem_cgroup_from_css(css);
- }
- if (!p)
- return 0;
-
- /*
- * We are now committed to this value whatever it is. Changes in this
- * tunable will only affect upcoming migrations, not the current one.
- * So we need to save it, and keep it going.
- */
- move_flags = READ_ONCE(memcg->move_charge_at_immigrate);
- if (!move_flags)
- return 0;
-
- from = mem_cgroup_from_task(p);
-
- VM_BUG_ON(from == memcg);
-
- mm = get_task_mm(p);
- if (!mm)
- return 0;
- /* We move charges only when we move a owner of the mm */
- if (mm->owner == p) {
- VM_BUG_ON(mc.from);
- VM_BUG_ON(mc.to);
- VM_BUG_ON(mc.precharge);
- VM_BUG_ON(mc.moved_charge);
- VM_BUG_ON(mc.moved_swap);
-
- spin_lock(&mc.lock);
- mc.mm = mm;
- mc.from = from;
- mc.to = memcg;
- mc.flags = move_flags;
- spin_unlock(&mc.lock);
- /* We set mc.moving_task later */
-
- ret = mem_cgroup_precharge_mc(mm);
- if (ret)
- mem_cgroup_clear_mc();
- } else {
- mmput(mm);
- }
- return ret;
-}
-
-void memcg1_cancel_attach(struct cgroup_taskset *tset)
-{
- if (mc.to)
- mem_cgroup_clear_mc();
-}
-
-static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
- unsigned long addr, unsigned long end,
- struct mm_walk *walk)
-{
- int ret = 0;
- struct vm_area_struct *vma = walk->vma;
- pte_t *pte;
- spinlock_t *ptl;
- enum mc_target_type target_type;
- union mc_target target;
- struct folio *folio;
-
- ptl = pmd_trans_huge_lock(pmd, vma);
- if (ptl) {
- if (mc.precharge < HPAGE_PMD_NR) {
- spin_unlock(ptl);
- return 0;
- }
- target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
- if (target_type == MC_TARGET_PAGE) {
- folio = target.folio;
- if (folio_isolate_lru(folio)) {
- if (!mem_cgroup_move_account(folio, true,
- mc.from, mc.to)) {
- mc.precharge -= HPAGE_PMD_NR;
- mc.moved_charge += HPAGE_PMD_NR;
- }
- folio_putback_lru(folio);
- }
- folio_unlock(folio);
- folio_put(folio);
- } else if (target_type == MC_TARGET_DEVICE) {
- folio = target.folio;
- if (!mem_cgroup_move_account(folio, true,
- mc.from, mc.to)) {
- mc.precharge -= HPAGE_PMD_NR;
- mc.moved_charge += HPAGE_PMD_NR;
- }
- folio_unlock(folio);
- folio_put(folio);
- }
- spin_unlock(ptl);
- return 0;
- }
-
-retry:
- pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
- if (!pte)
- return 0;
- for (; addr != end; addr += PAGE_SIZE) {
- pte_t ptent = ptep_get(pte++);
- bool device = false;
- swp_entry_t ent;
-
- if (!mc.precharge)
- break;
-
- switch (get_mctgt_type(vma, addr, ptent, &target)) {
- case MC_TARGET_DEVICE:
- device = true;
- fallthrough;
- case MC_TARGET_PAGE:
- folio = target.folio;
- /*
- * We can have a part of the split pmd here. Moving it
- * can be done but it would be too convoluted so simply
- * ignore such a partial THP and keep it in original
- * memcg. There should be somebody mapping the head.
- */
- if (folio_test_large(folio))
- goto put;
- if (!device && !folio_isolate_lru(folio))
- goto put;
- if (!mem_cgroup_move_account(folio, false,
- mc.from, mc.to)) {
- mc.precharge--;
- /* we uncharge from mc.from later. */
- mc.moved_charge++;
- }
- if (!device)
- folio_putback_lru(folio);
-put: /* get_mctgt_type() gets & locks the page */
- folio_unlock(folio);
- folio_put(folio);
- break;
- case MC_TARGET_SWAP:
- ent = target.ent;
- if (!mem_cgroup_move_swap_account(ent, mc.from, mc.to)) {
- mc.precharge--;
- mem_cgroup_id_get_many(mc.to, 1);
- /* we fixup other refcnts and charges later. */
- mc.moved_swap++;
- }
- break;
- default:
- break;
- }
- }
- pte_unmap_unlock(pte - 1, ptl);
- cond_resched();
-
- if (addr != end) {
- /*
- * We have consumed all precharges we got in can_attach().
- * We try charge one by one, but don't do any additional
- * charges to mc.to if we have failed in charge once in attach()
- * phase.
- */
- ret = mem_cgroup_do_precharge(1);
- if (!ret)
- goto retry;
- }
-
- return ret;
-}
-
-static const struct mm_walk_ops charge_walk_ops = {
- .pmd_entry = mem_cgroup_move_charge_pte_range,
- .walk_lock = PGWALK_RDLOCK,
-};
-
-static void mem_cgroup_move_charge(void)
-{
- lru_add_drain_all();
- /*
- * Signal folio_memcg_lock() to take the memcg's move_lock
- * while we're moving its pages to another memcg. Then wait
- * for already started RCU-only updates to finish.
- */
- atomic_inc(&mc.from->moving_account);
- synchronize_rcu();
-retry:
- if (unlikely(!mmap_read_trylock(mc.mm))) {
- /*
- * Someone who are holding the mmap_lock might be waiting in
- * waitq. So we cancel all extra charges, wake up all waiters,
- * and retry. Because we cancel precharges, we might not be able
- * to move enough charges, but moving charge is a best-effort
- * feature anyway, so it wouldn't be a big problem.
- */
- __mem_cgroup_clear_mc();
- cond_resched();
- goto retry;
- }
- /*
- * When we have consumed all precharges and failed in doing
- * additional charge, the page walk just aborts.
- */
- walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL);
- mmap_read_unlock(mc.mm);
- atomic_dec(&mc.from->moving_account);
-}
-
-void memcg1_move_task(void)
-{
- if (mc.to) {
- mem_cgroup_move_charge();
- mem_cgroup_clear_mc();
- }
-}
-
-#else /* !CONFIG_MMU */
-int memcg1_can_attach(struct cgroup_taskset *tset)
-{
- return 0;
-}
-void memcg1_cancel_attach(struct cgroup_taskset *tset)
-{
-}
-void memcg1_move_task(void)
-{
-}
-#endif
-
static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap)
{
struct mem_cgroup_threshold_ary *t;
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index c0672e25bcdb..0e3b82951d91 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -80,12 +80,7 @@ static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg)
WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
}
-bool memcg1_wait_acct_move(struct mem_cgroup *memcg);
-
struct cgroup_taskset;
-int memcg1_can_attach(struct cgroup_taskset *tset);
-void memcg1_cancel_attach(struct cgroup_taskset *tset);
-void memcg1_move_task(void);
void memcg1_css_offline(struct mem_cgroup *memcg);
/* for encoding cft->private value on file */
@@ -130,7 +125,6 @@ static inline void memcg1_free_events(struct mem_cgroup *memcg) {}
static inline void memcg1_memcg_init(struct mem_cgroup *memcg) {}
static inline void memcg1_remove_from_trees(struct mem_cgroup *memcg) {}
static inline void memcg1_soft_limit_reset(struct mem_cgroup *memcg) {}
-static inline bool memcg1_wait_acct_move(struct mem_cgroup *memcg) { return false; }
static inline void memcg1_css_offline(struct mem_cgroup *memcg) {}
static inline bool memcg1_oom_prepare(struct mem_cgroup *memcg, bool *locked) { return true; }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5c3a8629ef3e..94279b9c766a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2242,12 +2242,6 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
*/
if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
goto retry;
- /*
- * At task move, charge accounts can be doubly counted. So, it's
- * better to wait until the end of task_move if something is going on.
- */
- if (memcg1_wait_acct_move(mem_over_limit))
- goto retry;
if (nr_retries--)
goto retry;
@@ -4439,9 +4433,6 @@ struct cgroup_subsys memory_cgrp_subsys = {
.exit = mem_cgroup_exit,
.dfl_cftypes = memory_files,
#ifdef CONFIG_MEMCG_V1
- .can_attach = memcg1_can_attach,
- .cancel_attach = memcg1_cancel_attach,
- .post_attach = memcg1_move_task,
.legacy_cftypes = mem_cgroup_legacy_files,
#endif
.early_init = 0,
--
2.43.5
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking
2024-10-25 1:22 [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
` (2 preceding siblings ...)
2024-10-25 1:22 ` [PATCH v1 2/6] memcg-v1: remove charge move code Shakeel Butt
@ 2024-10-25 1:23 ` Shakeel Butt
2024-10-25 6:56 ` Michal Hocko
` (2 more replies)
2024-10-25 1:23 ` [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking Shakeel Butt
` (3 subsequent siblings)
7 siblings, 3 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
During the era of memcg charge migration, the kernel had to make sure
that dirty stat updates did not race with charge migration; otherwise
it might update the dirty stats of the wrong memcg. Now that memcg
charge migration is deprecated, there is no such race for dirty stat
updates and the previous locking can be removed.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
fs/buffer.c | 5 -----
mm/page-writeback.c | 16 +++-------------
2 files changed, 3 insertions(+), 18 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 1fc9a50def0b..88e765b0699f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -736,15 +736,12 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
* Lock out page's memcg migration to keep PageDirty
* synchronized with per-memcg dirty page counters.
*/
- folio_memcg_lock(folio);
newly_dirty = !folio_test_set_dirty(folio);
spin_unlock(&mapping->i_private_lock);
if (newly_dirty)
__folio_mark_dirty(folio, mapping, 1);
- folio_memcg_unlock(folio);
-
if (newly_dirty)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
@@ -1194,13 +1191,11 @@ void mark_buffer_dirty(struct buffer_head *bh)
struct folio *folio = bh->b_folio;
struct address_space *mapping = NULL;
- folio_memcg_lock(folio);
if (!folio_test_set_dirty(folio)) {
mapping = folio->mapping;
if (mapping)
__folio_mark_dirty(folio, mapping, 0);
}
- folio_memcg_unlock(folio);
if (mapping)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 1d7179aba8e3..a76a73529fd9 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2743,8 +2743,6 @@ EXPORT_SYMBOL(noop_dirty_folio);
/*
* Helper function for set_page_dirty family.
*
- * Caller must hold folio_memcg_lock().
- *
* NOTE: This relies on being atomic wrt interrupts.
*/
static void folio_account_dirtied(struct folio *folio,
@@ -2777,7 +2775,6 @@ static void folio_account_dirtied(struct folio *folio,
/*
* Helper function for deaccounting dirty page without writeback.
*
- * Caller must hold folio_memcg_lock().
*/
void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
{
@@ -2795,9 +2792,8 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
* If warn is true, then emit a warning if the folio is not uptodate and has
* not been truncated.
*
- * The caller must hold folio_memcg_lock(). It is the caller's
- * responsibility to prevent the folio from being truncated while
- * this function is in progress, although it may have been truncated
+ * It is the caller's responsibility to prevent the folio from being truncated
+ * while this function is in progress, although it may have been truncated
* before this function is called. Most callers have the folio locked.
* A few have the folio blocked from truncation through other means (e.g.
* zap_vma_pages() has it mapped and is holding the page table lock).
@@ -2841,14 +2837,10 @@ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
*/
bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio)
{
- folio_memcg_lock(folio);
- if (folio_test_set_dirty(folio)) {
- folio_memcg_unlock(folio);
+ if (folio_test_set_dirty(folio))
return false;
- }
__folio_mark_dirty(folio, mapping, !folio_test_private(folio));
- folio_memcg_unlock(folio);
if (mapping->host) {
/* !PageAnon && !swapper_space */
@@ -2975,14 +2967,12 @@ void __folio_cancel_dirty(struct folio *folio)
struct bdi_writeback *wb;
struct wb_lock_cookie cookie = {};
- folio_memcg_lock(folio);
wb = unlocked_inode_to_wb_begin(inode, &cookie);
if (folio_test_clear_dirty(folio))
folio_account_cleaned(folio, wb);
unlocked_inode_to_wb_end(inode, &cookie);
- folio_memcg_unlock(folio);
} else {
folio_clear_dirty(folio);
}
--
2.43.5
^ permalink raw reply related [flat|nested] 56+ messages in thread
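For context, a sketch of what this simplification means for per-memcg dirty
accounting. The helper name below is hypothetical and this is not code from
the patch; the calls mirror the helpers visible in the diff.

/*
 * Hypothetical sketch: with charge moving gone, a charged folio's memcg
 * association can only change at uncharge time, so a dirty stat update
 * no longer needs a folio_memcg_lock()/folio_memcg_unlock() pair around it.
 */
static void account_dirty_sketch(struct folio *folio)
{
        /* folio_memcg(folio) is stable here for a charged folio */
        lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, folio_nr_pages(folio));
}

The real accounting path keeps its existing callers and checks; only the
memcg locking around it goes away.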
* [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking
2024-10-25 1:22 [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
` (3 preceding siblings ...)
2024-10-25 1:23 ` [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking Shakeel Butt
@ 2024-10-25 1:23 ` Shakeel Butt
2024-10-25 6:57 ` Michal Hocko
` (2 more replies)
2024-10-25 1:23 ` [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU Shakeel Butt
` (2 subsequent siblings)
7 siblings, 3 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
During the era of memcg charge migration, the kernel had to make sure
that writeback stat updates did not race with charge migration;
otherwise it might update the writeback stats of the wrong memcg. Now
that memcg charge migration is deprecated, there is no such race for
writeback stat updates and the previous locking can be removed.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/page-writeback.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index a76a73529fd9..9c3317c3a615 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -3083,7 +3083,6 @@ bool __folio_end_writeback(struct folio *folio)
struct address_space *mapping = folio_mapping(folio);
bool ret;
- folio_memcg_lock(folio);
if (mapping && mapping_use_writeback_tags(mapping)) {
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = inode_to_bdi(inode);
@@ -3114,7 +3113,6 @@ bool __folio_end_writeback(struct folio *folio)
lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr);
zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
node_stat_mod_folio(folio, NR_WRITTEN, nr);
- folio_memcg_unlock(folio);
return ret;
}
@@ -3127,7 +3125,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
- folio_memcg_lock(folio);
if (mapping && mapping_use_writeback_tags(mapping)) {
XA_STATE(xas, &mapping->i_pages, folio_index(folio));
struct inode *inode = mapping->host;
@@ -3168,7 +3165,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr);
zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr);
- folio_memcg_unlock(folio);
access_ret = arch_make_folio_accessible(folio);
/*
--
2.43.5
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-10-25 1:22 [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
` (4 preceding siblings ...)
2024-10-25 1:23 ` [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking Shakeel Butt
@ 2024-10-25 1:23 ` Shakeel Butt
2024-10-25 17:41 ` Roman Gushchin
` (2 more replies)
2024-10-25 1:23 ` [PATCH v1 6/6] memcg-v1: remove memcg move locking code Shakeel Butt
2024-10-25 1:33 ` [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
7 siblings, 3 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
While updating the generation of the folios, MGLRU requires that the
folio's memcg association remains stable. With the charge migration
deprecated, there is no need for MGLRU to acquire locks to keep the
folio and memcg association stable.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/vmscan.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 29c098790b01..fd7171658b63 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3662,10 +3662,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
if (walk->seq != max_seq)
break;
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- break;
-
/* the caller might be holding the lock for write */
if (mmap_read_trylock(mm)) {
err = walk_page_range(mm, walk->next_addr, ULONG_MAX, &mm_walk_ops, walk);
@@ -3673,8 +3669,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
mmap_read_unlock(mm);
}
- mem_cgroup_unlock_pages();
-
if (walk->batched) {
spin_lock_irq(&lruvec->lru_lock);
reset_batch_size(walk);
@@ -4096,10 +4090,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
}
}
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- return true;
-
arch_enter_lazy_mmu_mode();
pte -= (addr - start) / PAGE_SIZE;
@@ -4144,7 +4134,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
}
arch_leave_lazy_mmu_mode();
- mem_cgroup_unlock_pages();
/* feedback from rmap walkers to page table walkers */
if (mm_state && suitable_to_scan(i, young))
--
2.43.5
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH v1 6/6] memcg-v1: remove memcg move locking code
2024-10-25 1:22 [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
` (5 preceding siblings ...)
2024-10-25 1:23 ` [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU Shakeel Butt
@ 2024-10-25 1:23 ` Shakeel Butt
2024-10-25 6:59 ` Michal Hocko
` (3 more replies)
2024-10-25 1:33 ` [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
7 siblings, 4 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
The memcg v1's charge move feature has been deprecated. All the places
that used the memcg move lock have stopped using it, as they no longer
need the protection. Let's proceed to remove all the locking code
related to charge moving.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
Changes since RFC:
- Remove the memcg move locking in separate patches.
include/linux/memcontrol.h | 54 -------------------------
mm/filemap.c | 1 -
mm/memcontrol-v1.c | 82 --------------------------------------
mm/memcontrol.c | 5 ---
mm/rmap.c | 1 -
5 files changed, 143 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 798db70b0a30..932534291ca2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -299,20 +299,10 @@ struct mem_cgroup {
/* For oom notifier event fd */
struct list_head oom_notify;
- /* taken only while moving_account > 0 */
- spinlock_t move_lock;
- unsigned long move_lock_flags;
-
/* Legacy tcp memory accounting */
bool tcpmem_active;
int tcpmem_pressure;
- /*
- * set > 0 if pages under this cgroup are moving to other cgroup.
- */
- atomic_t moving_account;
- struct task_struct *move_lock_task;
-
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
@@ -428,9 +418,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
*
* - the folio lock
* - LRU isolation
- * - folio_memcg_lock()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*
* For a kmem folio a caller should hold an rcu read lock to protect memcg
* associated with a kmem folio from being released.
@@ -499,9 +487,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
*
* - the folio lock
* - LRU isolation
- * - lock_folio_memcg()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*
* For a kmem folio a caller should hold an rcu read lock to protect memcg
* associated with a kmem folio from being released.
@@ -1873,26 +1859,6 @@ static inline bool task_in_memcg_oom(struct task_struct *p)
return p->memcg_in_oom;
}
-void folio_memcg_lock(struct folio *folio);
-void folio_memcg_unlock(struct folio *folio);
-
-/* try to stablize folio_memcg() for all the pages in a memcg */
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
- rcu_read_lock();
-
- if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account))
- return true;
-
- rcu_read_unlock();
- return false;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
- rcu_read_unlock();
-}
-
static inline void mem_cgroup_enter_user_fault(void)
{
WARN_ON(current->in_user_fault);
@@ -1914,26 +1880,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return 0;
}
-static inline void folio_memcg_lock(struct folio *folio)
-{
-}
-
-static inline void folio_memcg_unlock(struct folio *folio)
-{
-}
-
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
- /* to match folio_memcg_rcu() */
- rcu_read_lock();
- return true;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
- rcu_read_unlock();
-}
-
static inline bool task_in_memcg_oom(struct task_struct *p)
{
return false;
diff --git a/mm/filemap.c b/mm/filemap.c
index 630a1c431ea1..e582a1545d2a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -119,7 +119,6 @@
* ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
* bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
* ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
- * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock)
* bdi.wb->list_lock (zap_pte_range->set_page_dirty)
* ->inode->i_lock (zap_pte_range->set_page_dirty)
* ->private_lock (zap_pte_range->block_dirty_folio)
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 9c0fba8c8a83..539ceefa9d2d 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -401,87 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return nr_reclaimed;
}
-/**
- * folio_memcg_lock - Bind a folio to its memcg.
- * @folio: The folio.
- *
- * This function prevents unlocked LRU folios from being moved to
- * another cgroup.
- *
- * It ensures lifetime of the bound memcg. The caller is responsible
- * for the lifetime of the folio.
- */
-void folio_memcg_lock(struct folio *folio)
-{
- struct mem_cgroup *memcg;
- unsigned long flags;
-
- /*
- * The RCU lock is held throughout the transaction. The fast
- * path can get away without acquiring the memcg->move_lock
- * because page moving starts with an RCU grace period.
- */
- rcu_read_lock();
-
- if (mem_cgroup_disabled())
- return;
-again:
- memcg = folio_memcg(folio);
- if (unlikely(!memcg))
- return;
-
-#ifdef CONFIG_PROVE_LOCKING
- local_irq_save(flags);
- might_lock(&memcg->move_lock);
- local_irq_restore(flags);
-#endif
-
- if (atomic_read(&memcg->moving_account) <= 0)
- return;
-
- spin_lock_irqsave(&memcg->move_lock, flags);
- if (memcg != folio_memcg(folio)) {
- spin_unlock_irqrestore(&memcg->move_lock, flags);
- goto again;
- }
-
- /*
- * When charge migration first begins, we can have multiple
- * critical sections holding the fast-path RCU lock and one
- * holding the slowpath move_lock. Track the task who has the
- * move_lock for folio_memcg_unlock().
- */
- memcg->move_lock_task = current;
- memcg->move_lock_flags = flags;
-}
-
-static void __folio_memcg_unlock(struct mem_cgroup *memcg)
-{
- if (memcg && memcg->move_lock_task == current) {
- unsigned long flags = memcg->move_lock_flags;
-
- memcg->move_lock_task = NULL;
- memcg->move_lock_flags = 0;
-
- spin_unlock_irqrestore(&memcg->move_lock, flags);
- }
-
- rcu_read_unlock();
-}
-
-/**
- * folio_memcg_unlock - Release the binding between a folio and its memcg.
- * @folio: The folio.
- *
- * This releases the binding created by folio_memcg_lock(). This does
- * not change the accounting of this folio to its memcg, but it does
- * permit others to change it.
- */
-void folio_memcg_unlock(struct folio *folio)
-{
- __folio_memcg_unlock(folio_memcg(folio));
-}
-
static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
struct cftype *cft)
{
@@ -1189,7 +1108,6 @@ void memcg1_memcg_init(struct mem_cgroup *memcg)
{
INIT_LIST_HEAD(&memcg->oom_notify);
mutex_init(&memcg->thresholds_lock);
- spin_lock_init(&memcg->move_lock);
INIT_LIST_HEAD(&memcg->event_list);
spin_lock_init(&memcg->event_list_lock);
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 94279b9c766a..3c223aaeb6af 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1189,7 +1189,6 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held.
@@ -1211,7 +1210,6 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held and interrupts
@@ -1235,7 +1233,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held and interrupts
@@ -2375,9 +2372,7 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
*
* - the page lock
* - LRU isolation
- * - folio_memcg_lock()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*/
folio->memcg_data = (unsigned long)memcg;
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 4785a693857a..c6c4d4ea29a7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -32,7 +32,6 @@
* swap_lock (in swap_duplicate, swap_info_get)
* mmlist_lock (in mmput, drain_mmlist and others)
* mapping->private_lock (in block_dirty_folio)
- * folio_lock_memcg move_lock (in block_dirty_folio)
* i_pages lock (widely used)
* lruvec->lru_lock (in folio_lruvec_lock_irq)
* inode->i_lock (in set_page_dirty's __mark_inode_dirty)
--
2.43.5
^ permalink raw reply related [flat|nested] 56+ messages in thread
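To summarize the effect for readers of folio_memcg(): where code previously
had to bracket accesses with mem_cgroup_trylock_pages() and
mem_cgroup_unlock_pages() to exclude a concurrent move, a plain RCU read
section is now sufficient to pin the memcg's lifetime. A hypothetical sketch
(reader_sketch() is not a real kernel function):

static void reader_sketch(struct folio *folio)
{
        struct mem_cgroup *memcg;

        rcu_read_lock();
        memcg = folio_memcg_rcu(folio);
        if (memcg) {
                /*
                 * The folio->memcg binding can no longer change due to
                 * charge moving; RCU only guards the memcg's lifetime.
                 */
        }
        rcu_read_unlock();
}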
* [RFC PATCH 3/3] memcg-v1: remove memcg move locking code
2024-10-24 6:57 ` [RFC PATCH 3/3] memcg-v1: remove memcg move locking code Shakeel Butt
2024-10-24 9:16 ` Michal Hocko
2024-10-24 16:50 ` Roman Gushchin
@ 2024-10-25 1:23 ` Shakeel Butt
2 siblings, 0 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:23 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
The memcg v1's charge move feature has been deprecated. There is no
longer any need for locking or protection against charge moving. Let's
proceed to remove all the locking code related to charge moving.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
fs/buffer.c | 5 ---
include/linux/memcontrol.h | 54 -------------------------
mm/filemap.c | 1 -
mm/memcontrol-v1.c | 82 --------------------------------------
mm/memcontrol.c | 5 ---
mm/page-writeback.c | 21 ++--------
mm/rmap.c | 1 -
mm/vmscan.c | 11 -----
8 files changed, 3 insertions(+), 177 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 1fc9a50def0b..88e765b0699f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -736,15 +736,12 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
* Lock out page's memcg migration to keep PageDirty
* synchronized with per-memcg dirty page counters.
*/
- folio_memcg_lock(folio);
newly_dirty = !folio_test_set_dirty(folio);
spin_unlock(&mapping->i_private_lock);
if (newly_dirty)
__folio_mark_dirty(folio, mapping, 1);
- folio_memcg_unlock(folio);
-
if (newly_dirty)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
@@ -1194,13 +1191,11 @@ void mark_buffer_dirty(struct buffer_head *bh)
struct folio *folio = bh->b_folio;
struct address_space *mapping = NULL;
- folio_memcg_lock(folio);
if (!folio_test_set_dirty(folio)) {
mapping = folio->mapping;
if (mapping)
__folio_mark_dirty(folio, mapping, 0);
}
- folio_memcg_unlock(folio);
if (mapping)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
}
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 798db70b0a30..932534291ca2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -299,20 +299,10 @@ struct mem_cgroup {
/* For oom notifier event fd */
struct list_head oom_notify;
- /* taken only while moving_account > 0 */
- spinlock_t move_lock;
- unsigned long move_lock_flags;
-
/* Legacy tcp memory accounting */
bool tcpmem_active;
int tcpmem_pressure;
- /*
- * set > 0 if pages under this cgroup are moving to other cgroup.
- */
- atomic_t moving_account;
- struct task_struct *move_lock_task;
-
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
@@ -428,9 +418,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
*
* - the folio lock
* - LRU isolation
- * - folio_memcg_lock()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*
* For a kmem folio a caller should hold an rcu read lock to protect memcg
* associated with a kmem folio from being released.
@@ -499,9 +487,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
*
* - the folio lock
* - LRU isolation
- * - lock_folio_memcg()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*
* For a kmem folio a caller should hold an rcu read lock to protect memcg
* associated with a kmem folio from being released.
@@ -1873,26 +1859,6 @@ static inline bool task_in_memcg_oom(struct task_struct *p)
return p->memcg_in_oom;
}
-void folio_memcg_lock(struct folio *folio);
-void folio_memcg_unlock(struct folio *folio);
-
-/* try to stablize folio_memcg() for all the pages in a memcg */
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
- rcu_read_lock();
-
- if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account))
- return true;
-
- rcu_read_unlock();
- return false;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
- rcu_read_unlock();
-}
-
static inline void mem_cgroup_enter_user_fault(void)
{
WARN_ON(current->in_user_fault);
@@ -1914,26 +1880,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return 0;
}
-static inline void folio_memcg_lock(struct folio *folio)
-{
-}
-
-static inline void folio_memcg_unlock(struct folio *folio)
-{
-}
-
-static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
-{
- /* to match folio_memcg_rcu() */
- rcu_read_lock();
- return true;
-}
-
-static inline void mem_cgroup_unlock_pages(void)
-{
- rcu_read_unlock();
-}
-
static inline bool task_in_memcg_oom(struct task_struct *p)
{
return false;
diff --git a/mm/filemap.c b/mm/filemap.c
index 630a1c431ea1..e582a1545d2a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -119,7 +119,6 @@
* ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
* bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
* ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
- * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock)
* bdi.wb->list_lock (zap_pte_range->set_page_dirty)
* ->inode->i_lock (zap_pte_range->set_page_dirty)
* ->private_lock (zap_pte_range->block_dirty_folio)
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 79339cb65b9d..b14f01a93d0c 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -401,87 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
return nr_reclaimed;
}
-/**
- * folio_memcg_lock - Bind a folio to its memcg.
- * @folio: The folio.
- *
- * This function prevents unlocked LRU folios from being moved to
- * another cgroup.
- *
- * It ensures lifetime of the bound memcg. The caller is responsible
- * for the lifetime of the folio.
- */
-void folio_memcg_lock(struct folio *folio)
-{
- struct mem_cgroup *memcg;
- unsigned long flags;
-
- /*
- * The RCU lock is held throughout the transaction. The fast
- * path can get away without acquiring the memcg->move_lock
- * because page moving starts with an RCU grace period.
- */
- rcu_read_lock();
-
- if (mem_cgroup_disabled())
- return;
-again:
- memcg = folio_memcg(folio);
- if (unlikely(!memcg))
- return;
-
-#ifdef CONFIG_PROVE_LOCKING
- local_irq_save(flags);
- might_lock(&memcg->move_lock);
- local_irq_restore(flags);
-#endif
-
- if (atomic_read(&memcg->moving_account) <= 0)
- return;
-
- spin_lock_irqsave(&memcg->move_lock, flags);
- if (memcg != folio_memcg(folio)) {
- spin_unlock_irqrestore(&memcg->move_lock, flags);
- goto again;
- }
-
- /*
- * When charge migration first begins, we can have multiple
- * critical sections holding the fast-path RCU lock and one
- * holding the slowpath move_lock. Track the task who has the
- * move_lock for folio_memcg_unlock().
- */
- memcg->move_lock_task = current;
- memcg->move_lock_flags = flags;
-}
-
-static void __folio_memcg_unlock(struct mem_cgroup *memcg)
-{
- if (memcg && memcg->move_lock_task == current) {
- unsigned long flags = memcg->move_lock_flags;
-
- memcg->move_lock_task = NULL;
- memcg->move_lock_flags = 0;
-
- spin_unlock_irqrestore(&memcg->move_lock, flags);
- }
-
- rcu_read_unlock();
-}
-
-/**
- * folio_memcg_unlock - Release the binding between a folio and its memcg.
- * @folio: The folio.
- *
- * This releases the binding created by folio_memcg_lock(). This does
- * not change the accounting of this folio to its memcg, but it does
- * permit others to change it.
- */
-void folio_memcg_unlock(struct folio *folio)
-{
- __folio_memcg_unlock(folio_memcg(folio));
-}
-
static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
struct cftype *cft)
{
@@ -1187,7 +1106,6 @@ void memcg1_memcg_init(struct mem_cgroup *memcg)
{
INIT_LIST_HEAD(&memcg->oom_notify);
mutex_init(&memcg->thresholds_lock);
- spin_lock_init(&memcg->move_lock);
INIT_LIST_HEAD(&memcg->event_list);
spin_lock_init(&memcg->event_list_lock);
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 94279b9c766a..3c223aaeb6af 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1189,7 +1189,6 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held.
@@ -1211,7 +1210,6 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held and interrupts
@@ -1235,7 +1233,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
* These functions are safe to use under any of the following conditions:
* - folio locked
* - folio_test_lru false
- * - folio_memcg_lock()
* - folio frozen (refcount of 0)
*
* Return: The lruvec this folio is on with its lock held and interrupts
@@ -2375,9 +2372,7 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
*
* - the page lock
* - LRU isolation
- * - folio_memcg_lock()
* - exclusive reference
- * - mem_cgroup_trylock_pages()
*/
folio->memcg_data = (unsigned long)memcg;
}
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 1d7179aba8e3..e33727dd6b47 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2743,8 +2743,6 @@ EXPORT_SYMBOL(noop_dirty_folio);
/*
* Helper function for set_page_dirty family.
*
- * Caller must hold folio_memcg_lock().
- *
* NOTE: This relies on being atomic wrt interrupts.
*/
static void folio_account_dirtied(struct folio *folio,
@@ -2776,8 +2774,6 @@ static void folio_account_dirtied(struct folio *folio,
/*
* Helper function for deaccounting dirty page without writeback.
- *
- * Caller must hold folio_memcg_lock().
*/
void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
{
@@ -2795,9 +2791,8 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
* If warn is true, then emit a warning if the folio is not uptodate and has
* not been truncated.
*
- * The caller must hold folio_memcg_lock(). It is the caller's
- * responsibility to prevent the folio from being truncated while
- * this function is in progress, although it may have been truncated
+ * It is the caller's responsibility to prevent the folio from being truncated
+ * while this function is in progress, although it may have been truncated
* before this function is called. Most callers have the folio locked.
* A few have the folio blocked from truncation through other means (e.g.
* zap_vma_pages() has it mapped and is holding the page table lock).
@@ -2841,14 +2836,10 @@ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
*/
bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio)
{
- folio_memcg_lock(folio);
- if (folio_test_set_dirty(folio)) {
- folio_memcg_unlock(folio);
+ if (folio_test_set_dirty(folio))
return false;
- }
__folio_mark_dirty(folio, mapping, !folio_test_private(folio));
- folio_memcg_unlock(folio);
if (mapping->host) {
/* !PageAnon && !swapper_space */
@@ -2975,14 +2966,12 @@ void __folio_cancel_dirty(struct folio *folio)
struct bdi_writeback *wb;
struct wb_lock_cookie cookie = {};
- folio_memcg_lock(folio);
wb = unlocked_inode_to_wb_begin(inode, &cookie);
if (folio_test_clear_dirty(folio))
folio_account_cleaned(folio, wb);
unlocked_inode_to_wb_end(inode, &cookie);
- folio_memcg_unlock(folio);
} else {
folio_clear_dirty(folio);
}
@@ -3093,7 +3082,6 @@ bool __folio_end_writeback(struct folio *folio)
struct address_space *mapping = folio_mapping(folio);
bool ret;
- folio_memcg_lock(folio);
if (mapping && mapping_use_writeback_tags(mapping)) {
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = inode_to_bdi(inode);
@@ -3124,7 +3112,6 @@ bool __folio_end_writeback(struct folio *folio)
lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr);
zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
node_stat_mod_folio(folio, NR_WRITTEN, nr);
- folio_memcg_unlock(folio);
return ret;
}
@@ -3137,7 +3124,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
- folio_memcg_lock(folio);
if (mapping && mapping_use_writeback_tags(mapping)) {
XA_STATE(xas, &mapping->i_pages, folio_index(folio));
struct inode *inode = mapping->host;
@@ -3178,7 +3164,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr);
zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr);
- folio_memcg_unlock(folio);
access_ret = arch_make_folio_accessible(folio);
/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 4785a693857a..c6c4d4ea29a7 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -32,7 +32,6 @@
* swap_lock (in swap_duplicate, swap_info_get)
* mmlist_lock (in mmput, drain_mmlist and others)
* mapping->private_lock (in block_dirty_folio)
- * folio_lock_memcg move_lock (in block_dirty_folio)
* i_pages lock (widely used)
* lruvec->lru_lock (in folio_lruvec_lock_irq)
* inode->i_lock (in set_page_dirty's __mark_inode_dirty)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 29c098790b01..fd7171658b63 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3662,10 +3662,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
if (walk->seq != max_seq)
break;
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- break;
-
/* the caller might be holding the lock for write */
if (mmap_read_trylock(mm)) {
err = walk_page_range(mm, walk->next_addr, ULONG_MAX, &mm_walk_ops, walk);
@@ -3673,8 +3669,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
mmap_read_unlock(mm);
}
- mem_cgroup_unlock_pages();
-
if (walk->batched) {
spin_lock_irq(&lruvec->lru_lock);
reset_batch_size(walk);
@@ -4096,10 +4090,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
}
}
- /* folio_update_gen() requires stable folio_memcg() */
- if (!mem_cgroup_trylock_pages(memcg))
- return true;
-
arch_enter_lazy_mmu_mode();
pte -= (addr - start) / PAGE_SIZE;
@@ -4144,7 +4134,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
}
arch_leave_lazy_mmu_mode();
- mem_cgroup_unlock_pages();
/* feedback from rmap walkers to page table walkers */
if (mm_state && suitable_to_scan(i, young))
--
2.43.5
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH v1 0/6] memcg-v1: fully deprecate charge moving
2024-10-25 1:22 [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
` (6 preceding siblings ...)
2024-10-25 1:23 ` [PATCH v1 6/6] memcg-v1: remove memcg move locking code Shakeel Butt
@ 2024-10-25 1:33 ` Shakeel Butt
7 siblings, 0 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 1:33 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:22:57PM GMT, Shakeel Butt wrote:
> The memcg v1's charge moving feature has been deprecated for almost 2
> years and the kernel warns if someone tries to use it. This warning has
> been backported to all stable kernels and there have not been any reports
> of the warning or requests to support this feature anymore. Let's
> proceed to fully deprecate this feature.
>
> Changes since RFC:
> - Writing 0 to memory.move_charge_at_immigrate is allowed.
> - Remove the memcg move locking in separate patches.
>
> Shakeel Butt (6):
> memcg-v1: fully deprecate move_charge_at_immigrate
> memcg-v1: remove charge move code
> memcg-v1: no need for memcg locking for dirty tracking
> memcg-v1: no need for memcg locking for writeback tracking
> memcg-v1: no need for memcg locking for MGLRU
> memcg-v1: remove memcg move locking code
>
Please ignore the patch at [1] as it was resent by mistake.
[1] https://lore.kernel.org/linux-mm/20241024065712.1274481-4-shakeel.butt@linux.dev
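
For reference, the "writing 0 is allowed" change above makes the knob behave
roughly as follows after patch 1 (a sketch based on the posted hunks; the
cgroup path and the exact shell error text are illustrative only):

  # cd /sys/fs/cgroup/memory/<some-cgroup>       # hypothetical v1 mount point
  # cat memory.move_charge_at_immigrate
  0                                              # reads always return 0 now
  # echo 1 > memory.move_charge_at_immigrate     # any non-zero value
  write error: Invalid argument                  # -EINVAL, plus a one-time
                                                 # deprecation warning in dmesg
  # echo 0 > memory.move_charge_at_immigrate     # still accepted as a no-op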
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 1/6] memcg-v1: fully deprecate move_charge_at_immigrate
2024-10-25 1:22 ` [PATCH v1 1/6] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
@ 2024-10-25 6:54 ` Michal Hocko
2024-10-28 13:53 ` Johannes Weiner
1 sibling, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2024-10-25 6:54 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu 24-10-24 18:22:58, Shakeel Butt wrote:
> Proceed with the complete deprecation of memcg v1's charge moving
> feature. The deprecation warning has been in the kernel for almost two
> years and has been ported to all stable kernels since. Now is the time to
> fully deprecate this feature.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> ---
>
> Changes since RFC:
> - Writing 0 to memory.move_charge_at_immigrate is allowed.
>
> .../admin-guide/cgroup-v1/memory.rst | 82 +------------------
> mm/memcontrol-v1.c | 14 +---
> 2 files changed, 5 insertions(+), 91 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> index 270501db9f4e..286d16fc22eb 100644
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -90,9 +90,7 @@ Brief summary of control files.
> used.
> memory.swappiness set/show swappiness parameter of vmscan
> (See sysctl's vm.swappiness)
> - memory.move_charge_at_immigrate set/show controls of moving charges
> - This knob is deprecated and shouldn't be
> - used.
> + memory.move_charge_at_immigrate This knob is deprecated.
> memory.oom_control set/show oom controls.
> This knob is deprecated and shouldn't be
> used.
> @@ -243,10 +241,6 @@ behind this approach is that a cgroup that aggressively uses a shared
> page will eventually get charged for it (once it is uncharged from
> the cgroup that brought it in -- this will happen on memory pressure).
>
> -But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
> -task to another cgroup, its pages may be recharged to the new cgroup, if
> -move_charge_at_immigrate has been chosen.
> -
> 2.4 Swap Extension
> --------------------------------------
>
> @@ -756,78 +750,8 @@ If we want to change this to 1G, we can at any time use::
>
> THIS IS DEPRECATED!
>
> -It's expensive and unreliable! It's better practice to launch workload
> -tasks directly from inside their target cgroup. Use dedicated workload
> -cgroups to allow fine-grained policy adjustments without having to
> -move physical pages between control domains.
> -
> -Users can move charges associated with a task along with task migration, that
> -is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
> -This feature is not supported in !CONFIG_MMU environments because of lack of
> -page tables.
> -
> -8.1 Interface
> --------------
> -
> -This feature is disabled by default. It can be enabled (and disabled again) by
> -writing to memory.move_charge_at_immigrate of the destination cgroup.
> -
> -If you want to enable it::
> -
> - # echo (some positive value) > memory.move_charge_at_immigrate
> -
> -.. note::
> - Each bits of move_charge_at_immigrate has its own meaning about what type
> - of charges should be moved. See :ref:`section 8.2
> - <cgroup-v1-memory-movable-charges>` for details.
> -
> -.. note::
> - Charges are moved only when you move mm->owner, in other words,
> - a leader of a thread group.
> -
> -.. note::
> - If we cannot find enough space for the task in the destination cgroup, we
> - try to make space by reclaiming memory. Task migration may fail if we
> - cannot make enough space.
> -
> -.. note::
> - It can take several seconds if you move charges much.
> -
> -And if you want disable it again::
> -
> - # echo 0 > memory.move_charge_at_immigrate
> -
> -.. _cgroup-v1-memory-movable-charges:
> -
> -8.2 Type of charges which can be moved
> ---------------------------------------
> -
> -Each bit in move_charge_at_immigrate has its own meaning about what type of
> -charges should be moved. But in any case, it must be noted that an account of
> -a page or a swap can be moved only when it is charged to the task's current
> -(old) memory cgroup.
> -
> -+---+--------------------------------------------------------------------------+
> -|bit| what type of charges would be moved ? |
> -+===+==========================================================================+
> -| 0 | A charge of an anonymous page (or swap of it) used by the target task. |
> -| | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
> -+---+--------------------------------------------------------------------------+
> -| 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
> -| | and swaps of tmpfs file) mmapped by the target task. Unlike the case of |
> -| | anonymous pages, file pages (and swaps) in the range mmapped by the task |
> -| | will be moved even if the task hasn't done page fault, i.e. they might |
> -| | not be the task's "RSS", but other task's "RSS" that maps the same file. |
> -| | The mapcount of the page is ignored (the page can be moved independent |
> -| | of the mapcount). You must enable Swap Extension (see 2.4) to |
> -| | enable move of swap charges. |
> -+---+--------------------------------------------------------------------------+
> -
> -8.3 TODO
> ---------
> -
> -- All of moving charge operations are done under cgroup_mutex. It's not good
> - behavior to hold the mutex too long, so we may need some trick.
> +Reading memory.move_charge_at_immigrate will always return 0 and writing
> +to it will always return -EINVAL.
>
> 9. Memory thresholds
> ====================
> diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
> index 81d8819f13cd..9b3b1a446c65 100644
> --- a/mm/memcontrol-v1.c
> +++ b/mm/memcontrol-v1.c
> @@ -593,29 +593,19 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
> static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
> struct cftype *cft)
> {
> - return mem_cgroup_from_css(css)->move_charge_at_immigrate;
> + return 0;
> }
>
> #ifdef CONFIG_MMU
> static int mem_cgroup_move_charge_write(struct cgroup_subsys_state *css,
> struct cftype *cft, u64 val)
> {
> - struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> -
> pr_warn_once("Cgroup memory moving (move_charge_at_immigrate) is deprecated. "
> "Please report your usecase to linux-mm@kvack.org if you "
> "depend on this functionality.\n");
>
> - if (val & ~MOVE_MASK)
> + if (val != 0)
> return -EINVAL;
> -
> - /*
> - * No kind of locking is needed in here, because ->can_attach() will
> - * check this value once in the beginning of the process, and then carry
> - * on with stale data. This means that changes to this value will only
> - * affect task migrations starting after the change.
> - */
> - memcg->move_charge_at_immigrate = val;
> return 0;
> }
> #else
> --
> 2.43.5
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking
2024-10-25 1:23 ` [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking Shakeel Butt
@ 2024-10-25 6:56 ` Michal Hocko
2024-10-25 16:22 ` Shakeel Butt
2024-10-25 17:40 ` Roman Gushchin
2024-10-28 14:00 ` Johannes Weiner
2 siblings, 1 reply; 56+ messages in thread
From: Michal Hocko @ 2024-10-25 6:56 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu 24-10-24 18:23:00, Shakeel Butt wrote:
> During the era of memcg charge migration, the kernel has to make sure
> that the dirty stat updates do not race with the charge migration.
> Otherwise it might update the dirty stats of the wrong memcg. Now with
> the memcg charge migration deprecated, there is no more race for dirty
s@deprecated@gone@
> stat updates and the previous locking can be removed.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
LGTM otherwise
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> fs/buffer.c | 5 -----
> mm/page-writeback.c | 16 +++-------------
> 2 files changed, 3 insertions(+), 18 deletions(-)
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 1fc9a50def0b..88e765b0699f 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -736,15 +736,12 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
> * Lock out page's memcg migration to keep PageDirty
> * synchronized with per-memcg dirty page counters.
> */
> - folio_memcg_lock(folio);
> newly_dirty = !folio_test_set_dirty(folio);
> spin_unlock(&mapping->i_private_lock);
>
> if (newly_dirty)
> __folio_mark_dirty(folio, mapping, 1);
>
> - folio_memcg_unlock(folio);
> -
> if (newly_dirty)
> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
>
> @@ -1194,13 +1191,11 @@ void mark_buffer_dirty(struct buffer_head *bh)
> struct folio *folio = bh->b_folio;
> struct address_space *mapping = NULL;
>
> - folio_memcg_lock(folio);
> if (!folio_test_set_dirty(folio)) {
> mapping = folio->mapping;
> if (mapping)
> __folio_mark_dirty(folio, mapping, 0);
> }
> - folio_memcg_unlock(folio);
> if (mapping)
> __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> }
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 1d7179aba8e3..a76a73529fd9 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -2743,8 +2743,6 @@ EXPORT_SYMBOL(noop_dirty_folio);
> /*
> * Helper function for set_page_dirty family.
> *
> - * Caller must hold folio_memcg_lock().
> - *
> * NOTE: This relies on being atomic wrt interrupts.
> */
> static void folio_account_dirtied(struct folio *folio,
> @@ -2777,7 +2775,6 @@ static void folio_account_dirtied(struct folio *folio,
> /*
> * Helper function for deaccounting dirty page without writeback.
> *
> - * Caller must hold folio_memcg_lock().
> */
> void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
> {
> @@ -2795,9 +2792,8 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
> * If warn is true, then emit a warning if the folio is not uptodate and has
> * not been truncated.
> *
> - * The caller must hold folio_memcg_lock(). It is the caller's
> - * responsibility to prevent the folio from being truncated while
> - * this function is in progress, although it may have been truncated
> + * It is the caller's responsibility to prevent the folio from being truncated
> + * while this function is in progress, although it may have been truncated
> * before this function is called. Most callers have the folio locked.
> * A few have the folio blocked from truncation through other means (e.g.
> * zap_vma_pages() has it mapped and is holding the page table lock).
> @@ -2841,14 +2837,10 @@ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
> */
> bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio)
> {
> - folio_memcg_lock(folio);
> - if (folio_test_set_dirty(folio)) {
> - folio_memcg_unlock(folio);
> + if (folio_test_set_dirty(folio))
> return false;
> - }
>
> __folio_mark_dirty(folio, mapping, !folio_test_private(folio));
> - folio_memcg_unlock(folio);
>
> if (mapping->host) {
> /* !PageAnon && !swapper_space */
> @@ -2975,14 +2967,12 @@ void __folio_cancel_dirty(struct folio *folio)
> struct bdi_writeback *wb;
> struct wb_lock_cookie cookie = {};
>
> - folio_memcg_lock(folio);
> wb = unlocked_inode_to_wb_begin(inode, &cookie);
>
> if (folio_test_clear_dirty(folio))
> folio_account_cleaned(folio, wb);
>
> unlocked_inode_to_wb_end(inode, &cookie);
> - folio_memcg_unlock(folio);
> } else {
> folio_clear_dirty(folio);
> }
> --
> 2.43.5
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking
2024-10-25 1:23 ` [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking Shakeel Butt
@ 2024-10-25 6:57 ` Michal Hocko
2024-10-25 17:40 ` Roman Gushchin
2024-10-28 14:00 ` Johannes Weiner
2 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2024-10-25 6:57 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu 24-10-24 18:23:01, Shakeel Butt wrote:
> During the era of memcg charge migration, the kernel has to make sure
> that the writeback stat updates do not race with the charge migration.
> Otherwise it might update the writeback stats of the wrong memcg. Now
> with the memcg charge migration deprecated, there is no more race for
s@deprecated@gone@
> writeback stat updates and the previous locking can be removed.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> ---
> mm/page-writeback.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index a76a73529fd9..9c3317c3a615 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -3083,7 +3083,6 @@ bool __folio_end_writeback(struct folio *folio)
> struct address_space *mapping = folio_mapping(folio);
> bool ret;
>
> - folio_memcg_lock(folio);
> if (mapping && mapping_use_writeback_tags(mapping)) {
> struct inode *inode = mapping->host;
> struct backing_dev_info *bdi = inode_to_bdi(inode);
> @@ -3114,7 +3113,6 @@ bool __folio_end_writeback(struct folio *folio)
> lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr);
> zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
> node_stat_mod_folio(folio, NR_WRITTEN, nr);
> - folio_memcg_unlock(folio);
>
> return ret;
> }
> @@ -3127,7 +3125,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
>
> VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
>
> - folio_memcg_lock(folio);
> if (mapping && mapping_use_writeback_tags(mapping)) {
> XA_STATE(xas, &mapping->i_pages, folio_index(folio));
> struct inode *inode = mapping->host;
> @@ -3168,7 +3165,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
>
> lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr);
> zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr);
> - folio_memcg_unlock(folio);
>
> access_ret = arch_make_folio_accessible(folio);
> /*
> --
> 2.43.5
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 6/6] memcg-v1: remove memcg move locking code
2024-10-25 1:23 ` [PATCH v1 6/6] memcg-v1: remove memcg move locking code Shakeel Butt
@ 2024-10-25 6:59 ` Michal Hocko
2024-10-25 17:42 ` Roman Gushchin
` (2 subsequent siblings)
3 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2024-10-25 6:59 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu 24-10-24 18:23:03, Shakeel Butt wrote:
> The memcg v1's charge move feature has been deprecated. All the places
> using the memcg move lock have stopped using it as they don't need the
> protection any more. Let's proceed to remove all the locking code
> related to charge moving.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Thank you for restructuring this. Having all callers gone by now
certainly makes this much safer and easier to review.
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>
> Changes since RFC:
> - Remove the memcg move locking in separate patches.
>
> include/linux/memcontrol.h | 54 -------------------------
> mm/filemap.c | 1 -
> mm/memcontrol-v1.c | 82 --------------------------------------
> mm/memcontrol.c | 5 ---
> mm/rmap.c | 1 -
> 5 files changed, 143 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 798db70b0a30..932534291ca2 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -299,20 +299,10 @@ struct mem_cgroup {
> /* For oom notifier event fd */
> struct list_head oom_notify;
>
> - /* taken only while moving_account > 0 */
> - spinlock_t move_lock;
> - unsigned long move_lock_flags;
> -
> /* Legacy tcp memory accounting */
> bool tcpmem_active;
> int tcpmem_pressure;
>
> - /*
> - * set > 0 if pages under this cgroup are moving to other cgroup.
> - */
> - atomic_t moving_account;
> - struct task_struct *move_lock_task;
> -
> /* List of events which userspace want to receive */
> struct list_head event_list;
> spinlock_t event_list_lock;
> @@ -428,9 +418,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
> *
> * - the folio lock
> * - LRU isolation
> - * - folio_memcg_lock()
> * - exclusive reference
> - * - mem_cgroup_trylock_pages()
> *
> * For a kmem folio a caller should hold an rcu read lock to protect memcg
> * associated with a kmem folio from being released.
> @@ -499,9 +487,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
> *
> * - the folio lock
> * - LRU isolation
> - * - lock_folio_memcg()
> * - exclusive reference
> - * - mem_cgroup_trylock_pages()
> *
> * For a kmem folio a caller should hold an rcu read lock to protect memcg
> * associated with a kmem folio from being released.
> @@ -1873,26 +1859,6 @@ static inline bool task_in_memcg_oom(struct task_struct *p)
> return p->memcg_in_oom;
> }
>
> -void folio_memcg_lock(struct folio *folio);
> -void folio_memcg_unlock(struct folio *folio);
> -
> -/* try to stablize folio_memcg() for all the pages in a memcg */
> -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
> -{
> - rcu_read_lock();
> -
> - if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account))
> - return true;
> -
> - rcu_read_unlock();
> - return false;
> -}
> -
> -static inline void mem_cgroup_unlock_pages(void)
> -{
> - rcu_read_unlock();
> -}
> -
> static inline void mem_cgroup_enter_user_fault(void)
> {
> WARN_ON(current->in_user_fault);
> @@ -1914,26 +1880,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return 0;
> }
>
> -static inline void folio_memcg_lock(struct folio *folio)
> -{
> -}
> -
> -static inline void folio_memcg_unlock(struct folio *folio)
> -{
> -}
> -
> -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
> -{
> - /* to match folio_memcg_rcu() */
> - rcu_read_lock();
> - return true;
> -}
> -
> -static inline void mem_cgroup_unlock_pages(void)
> -{
> - rcu_read_unlock();
> -}
> -
> static inline bool task_in_memcg_oom(struct task_struct *p)
> {
> return false;
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 630a1c431ea1..e582a1545d2a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -119,7 +119,6 @@
> * ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
> * bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
> * ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
> - * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock)
> * bdi.wb->list_lock (zap_pte_range->set_page_dirty)
> * ->inode->i_lock (zap_pte_range->set_page_dirty)
> * ->private_lock (zap_pte_range->block_dirty_folio)
> diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
> index 9c0fba8c8a83..539ceefa9d2d 100644
> --- a/mm/memcontrol-v1.c
> +++ b/mm/memcontrol-v1.c
> @@ -401,87 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return nr_reclaimed;
> }
>
> -/**
> - * folio_memcg_lock - Bind a folio to its memcg.
> - * @folio: The folio.
> - *
> - * This function prevents unlocked LRU folios from being moved to
> - * another cgroup.
> - *
> - * It ensures lifetime of the bound memcg. The caller is responsible
> - * for the lifetime of the folio.
> - */
> -void folio_memcg_lock(struct folio *folio)
> -{
> - struct mem_cgroup *memcg;
> - unsigned long flags;
> -
> - /*
> - * The RCU lock is held throughout the transaction. The fast
> - * path can get away without acquiring the memcg->move_lock
> - * because page moving starts with an RCU grace period.
> - */
> - rcu_read_lock();
> -
> - if (mem_cgroup_disabled())
> - return;
> -again:
> - memcg = folio_memcg(folio);
> - if (unlikely(!memcg))
> - return;
> -
> -#ifdef CONFIG_PROVE_LOCKING
> - local_irq_save(flags);
> - might_lock(&memcg->move_lock);
> - local_irq_restore(flags);
> -#endif
> -
> - if (atomic_read(&memcg->moving_account) <= 0)
> - return;
> -
> - spin_lock_irqsave(&memcg->move_lock, flags);
> - if (memcg != folio_memcg(folio)) {
> - spin_unlock_irqrestore(&memcg->move_lock, flags);
> - goto again;
> - }
> -
> - /*
> - * When charge migration first begins, we can have multiple
> - * critical sections holding the fast-path RCU lock and one
> - * holding the slowpath move_lock. Track the task who has the
> - * move_lock for folio_memcg_unlock().
> - */
> - memcg->move_lock_task = current;
> - memcg->move_lock_flags = flags;
> -}
> -
> -static void __folio_memcg_unlock(struct mem_cgroup *memcg)
> -{
> - if (memcg && memcg->move_lock_task == current) {
> - unsigned long flags = memcg->move_lock_flags;
> -
> - memcg->move_lock_task = NULL;
> - memcg->move_lock_flags = 0;
> -
> - spin_unlock_irqrestore(&memcg->move_lock, flags);
> - }
> -
> - rcu_read_unlock();
> -}
> -
> -/**
> - * folio_memcg_unlock - Release the binding between a folio and its memcg.
> - * @folio: The folio.
> - *
> - * This releases the binding created by folio_memcg_lock(). This does
> - * not change the accounting of this folio to its memcg, but it does
> - * permit others to change it.
> - */
> -void folio_memcg_unlock(struct folio *folio)
> -{
> - __folio_memcg_unlock(folio_memcg(folio));
> -}
> -
> static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
> struct cftype *cft)
> {
> @@ -1189,7 +1108,6 @@ void memcg1_memcg_init(struct mem_cgroup *memcg)
> {
> INIT_LIST_HEAD(&memcg->oom_notify);
> mutex_init(&memcg->thresholds_lock);
> - spin_lock_init(&memcg->move_lock);
> INIT_LIST_HEAD(&memcg->event_list);
> spin_lock_init(&memcg->event_list_lock);
> }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 94279b9c766a..3c223aaeb6af 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1189,7 +1189,6 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
> * These functions are safe to use under any of the following conditions:
> * - folio locked
> * - folio_test_lru false
> - * - folio_memcg_lock()
> * - folio frozen (refcount of 0)
> *
> * Return: The lruvec this folio is on with its lock held.
> @@ -1211,7 +1210,6 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
> * These functions are safe to use under any of the following conditions:
> * - folio locked
> * - folio_test_lru false
> - * - folio_memcg_lock()
> * - folio frozen (refcount of 0)
> *
> * Return: The lruvec this folio is on with its lock held and interrupts
> @@ -1235,7 +1233,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
> * These functions are safe to use under any of the following conditions:
> * - folio locked
> * - folio_test_lru false
> - * - folio_memcg_lock()
> * - folio frozen (refcount of 0)
> *
> * Return: The lruvec this folio is on with its lock held and interrupts
> @@ -2375,9 +2372,7 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
> *
> * - the page lock
> * - LRU isolation
> - * - folio_memcg_lock()
> * - exclusive reference
> - * - mem_cgroup_trylock_pages()
> */
> folio->memcg_data = (unsigned long)memcg;
> }
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4785a693857a..c6c4d4ea29a7 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -32,7 +32,6 @@
> * swap_lock (in swap_duplicate, swap_info_get)
> * mmlist_lock (in mmput, drain_mmlist and others)
> * mapping->private_lock (in block_dirty_folio)
> - * folio_lock_memcg move_lock (in block_dirty_folio)
> * i_pages lock (widely used)
> * lruvec->lru_lock (in folio_lruvec_lock_irq)
> * inode->i_lock (in set_page_dirty's __mark_inode_dirty)
> --
> 2.43.5
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking
2024-10-25 6:56 ` Michal Hocko
@ 2024-10-25 16:22 ` Shakeel Butt
0 siblings, 0 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-25 16:22 UTC (permalink / raw)
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Fri, Oct 25, 2024 at 08:56:14AM GMT, Michal Hocko wrote:
> On Thu 24-10-24 18:23:00, Shakeel Butt wrote:
> > During the era of memcg charge migration, the kernel has to make sure
> > that the dirty stat updates do not race with the charge migration.
> > Otherwise it might update the dirty stats of the wrong memcg. Now with
> > the memcg charge migration deprecated, there is no more race for dirty
>
> s@deprecated@gone@
>
> > stat updates and the previous locking can be removed.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>
> LGTM otherwise
> Acked-by: Michal Hocko <mhocko@suse.com>
>
Thanks Michal for the review.
Andrew, please fix up the commit message as suggested by Michal when you
pick this series.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking
2024-10-25 1:23 ` [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking Shakeel Butt
2024-10-25 6:56 ` Michal Hocko
@ 2024-10-25 17:40 ` Roman Gushchin
2024-10-28 14:00 ` Johannes Weiner
2 siblings, 0 replies; 56+ messages in thread
From: Roman Gushchin @ 2024-10-25 17:40 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:23:00PM -0700, Shakeel Butt wrote:
> During the era of memcg charge migration, the kernel has to make sure
> that the dirty stat updates do not race with the charge migration.
> Otherwise it might update the dirty stats of the wrong memcg. Now with
> the memcg charge migration deprecated, there is no more race for dirty
> stat updates and the previous locking can be removed.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking
2024-10-25 1:23 ` [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking Shakeel Butt
2024-10-25 6:57 ` Michal Hocko
@ 2024-10-25 17:40 ` Roman Gushchin
2024-10-28 14:00 ` Johannes Weiner
2 siblings, 0 replies; 56+ messages in thread
From: Roman Gushchin @ 2024-10-25 17:40 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:23:01PM -0700, Shakeel Butt wrote:
> During the era of memcg charge migration, the kernel has to make sure
> that the writeback stat updates do not race with the charge migration.
> Otherwise it might update the writeback stats of the wrong memcg. Now
> with the memcg charge migration deprecated, there is no more race for
> writeback stat updates and the previous locking can be removed.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-10-25 1:23 ` [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU Shakeel Butt
@ 2024-10-25 17:41 ` Roman Gushchin
2024-10-26 3:55 ` Yu Zhao
2024-10-26 6:34 ` Shakeel Butt
2 siblings, 0 replies; 56+ messages in thread
From: Roman Gushchin @ 2024-10-25 17:41 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:23:02PM -0700, Shakeel Butt wrote:
> While updating the generation of the folios, MGLRU requires that the
> folio's memcg association remains stable. With the charge migration
> deprecated, there is no need for MGLRU to acquire locks to keep the
> folio and memcg association stable.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 6/6] memcg-v1: remove memcg move locking code
2024-10-25 1:23 ` [PATCH v1 6/6] memcg-v1: remove memcg move locking code Shakeel Butt
2024-10-25 6:59 ` Michal Hocko
@ 2024-10-25 17:42 ` Roman Gushchin
2024-10-26 3:58 ` Yu Zhao
2024-10-28 14:02 ` Johannes Weiner
3 siblings, 0 replies; 56+ messages in thread
From: Roman Gushchin @ 2024-10-25 17:42 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:23:03PM -0700, Shakeel Butt wrote:
> The memcg v1's charge move feature has been deprecated. All the places
> using the memcg move lock have stopped using it as they don't need the
> protection any more. Let's proceed to remove all the locking code
> related to charge moving.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-10-25 1:23 ` [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU Shakeel Butt
2024-10-25 17:41 ` Roman Gushchin
@ 2024-10-26 3:55 ` Yu Zhao
2024-10-26 6:20 ` Shakeel Butt
2024-10-26 6:34 ` Shakeel Butt
2 siblings, 1 reply; 56+ messages in thread
From: Yu Zhao @ 2024-10-26 3:55 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 7:23 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> While updating the generation of the folios, MGLRU requires that the
> folio's memcg association remains stable. With the charge migration
> deprecated, there is no need for MGLRU to acquire locks to keep the
> folio and memcg association stable.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
> mm/vmscan.c | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 29c098790b01..fd7171658b63 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3662,10 +3662,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
> if (walk->seq != max_seq)
> break;
Please remove the lingering `struct mem_cgroup *memcg` as well as
folio_memcg_rcu(). Otherwise it causes both build and lockdep
warnings.
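
(For readers following along: the lockdep part comes from folio_memcg_rcu()
asserting the RCU read lock, which the removed mem_cgroup_trylock_pages()
section used to provide. A paraphrased sketch only, not the exact upstream
definition:)

  /* paraphrased sketch -- the real code decodes folio->memcg_data itself */
  static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
  {
          /* fires once the surrounding trylock/RCU section is removed */
          WARN_ON_ONCE(!rcu_read_lock_held());

          return folio_memcg(folio);
  }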
> - /* folio_update_gen() requires stable folio_memcg() */
> - if (!mem_cgroup_trylock_pages(memcg))
> - break;
> -
> /* the caller might be holding the lock for write */
> if (mmap_read_trylock(mm)) {
> err = walk_page_range(mm, walk->next_addr, ULONG_MAX, &mm_walk_ops, walk);
> @@ -3673,8 +3669,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
> mmap_read_unlock(mm);
> }
>
> - mem_cgroup_unlock_pages();
> -
> if (walk->batched) {
> spin_lock_irq(&lruvec->lru_lock);
> reset_batch_size(walk);
> @@ -4096,10 +4090,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
> }
> }
>
> - /* folio_update_gen() requires stable folio_memcg() */
> - if (!mem_cgroup_trylock_pages(memcg))
> - return true;
> -
> arch_enter_lazy_mmu_mode();
>
> pte -= (addr - start) / PAGE_SIZE;
> @@ -4144,7 +4134,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
> }
>
> arch_leave_lazy_mmu_mode();
> - mem_cgroup_unlock_pages();
>
> /* feedback from rmap walkers to page table walkers */
> if (mm_state && suitable_to_scan(i, young))
> --
> 2.43.5
>
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 6/6] memcg-v1: remove memcg move locking code
2024-10-25 1:23 ` [PATCH v1 6/6] memcg-v1: remove memcg move locking code Shakeel Butt
2024-10-25 6:59 ` Michal Hocko
2024-10-25 17:42 ` Roman Gushchin
@ 2024-10-26 3:58 ` Yu Zhao
2024-10-26 6:26 ` Shakeel Butt
2024-10-28 14:02 ` Johannes Weiner
3 siblings, 1 reply; 56+ messages in thread
From: Yu Zhao @ 2024-10-26 3:58 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 7:23 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> The memcg v1's charge move feature has been deprecated. All the places
> using the memcg move lock have stopped using it as they don't need the
> protection any more. Let's proceed to remove all the locking code
> related to charge moving.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>
> Changes since RFC:
> - Remove the memcg move locking in separate patches.
>
> include/linux/memcontrol.h | 54 -------------------------
> mm/filemap.c | 1 -
> mm/memcontrol-v1.c | 82 --------------------------------------
> mm/memcontrol.c | 5 ---
> mm/rmap.c | 1 -
> 5 files changed, 143 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 798db70b0a30..932534291ca2 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -299,20 +299,10 @@ struct mem_cgroup {
> /* For oom notifier event fd */
> struct list_head oom_notify;
>
> - /* taken only while moving_account > 0 */
> - spinlock_t move_lock;
> - unsigned long move_lock_flags;
> -
> /* Legacy tcp memory accounting */
> bool tcpmem_active;
> int tcpmem_pressure;
>
> - /*
> - * set > 0 if pages under this cgroup are moving to other cgroup.
> - */
> - atomic_t moving_account;
> - struct task_struct *move_lock_task;
> -
> /* List of events which userspace want to receive */
> struct list_head event_list;
> spinlock_t event_list_lock;
> @@ -428,9 +418,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
> *
> * - the folio lock
> * - LRU isolation
> - * - folio_memcg_lock()
> * - exclusive reference
> - * - mem_cgroup_trylock_pages()
> *
> * For a kmem folio a caller should hold an rcu read lock to protect memcg
> * associated with a kmem folio from being released.
> @@ -499,9 +487,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
I think you missed folio_memcg_rcu().
(I don't think workingset_activation() needs it, since its only caller
must hold a refcnt on the folio.)
> *
> * - the folio lock
> * - LRU isolation
> - * - lock_folio_memcg()
> * - exclusive reference
> - * - mem_cgroup_trylock_pages()
> *
> * For a kmem folio a caller should hold an rcu read lock to protect memcg
> * associated with a kmem folio from being released.
> @@ -1873,26 +1859,6 @@ static inline bool task_in_memcg_oom(struct task_struct *p)
> return p->memcg_in_oom;
> }
>
> -void folio_memcg_lock(struct folio *folio);
> -void folio_memcg_unlock(struct folio *folio);
> -
> -/* try to stablize folio_memcg() for all the pages in a memcg */
> -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
> -{
> - rcu_read_lock();
> -
> - if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account))
> - return true;
> -
> - rcu_read_unlock();
> - return false;
> -}
> -
> -static inline void mem_cgroup_unlock_pages(void)
> -{
> - rcu_read_unlock();
> -}
> -
> static inline void mem_cgroup_enter_user_fault(void)
> {
> WARN_ON(current->in_user_fault);
> @@ -1914,26 +1880,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return 0;
> }
>
> -static inline void folio_memcg_lock(struct folio *folio)
> -{
> -}
> -
> -static inline void folio_memcg_unlock(struct folio *folio)
> -{
> -}
> -
> -static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg)
> -{
> - /* to match folio_memcg_rcu() */
> - rcu_read_lock();
> - return true;
> -}
> -
> -static inline void mem_cgroup_unlock_pages(void)
> -{
> - rcu_read_unlock();
> -}
> -
> static inline bool task_in_memcg_oom(struct task_struct *p)
> {
> return false;
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 630a1c431ea1..e582a1545d2a 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -119,7 +119,6 @@
> * ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
> * bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
> * ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
> - * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock)
> * bdi.wb->list_lock (zap_pte_range->set_page_dirty)
> * ->inode->i_lock (zap_pte_range->set_page_dirty)
> * ->private_lock (zap_pte_range->block_dirty_folio)
> diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
> index 9c0fba8c8a83..539ceefa9d2d 100644
> --- a/mm/memcontrol-v1.c
> +++ b/mm/memcontrol-v1.c
> @@ -401,87 +401,6 @@ unsigned long memcg1_soft_limit_reclaim(pg_data_t *pgdat, int order,
> return nr_reclaimed;
> }
>
> -/**
> - * folio_memcg_lock - Bind a folio to its memcg.
> - * @folio: The folio.
> - *
> - * This function prevents unlocked LRU folios from being moved to
> - * another cgroup.
> - *
> - * It ensures lifetime of the bound memcg. The caller is responsible
> - * for the lifetime of the folio.
> - */
> -void folio_memcg_lock(struct folio *folio)
> -{
> - struct mem_cgroup *memcg;
> - unsigned long flags;
> -
> - /*
> - * The RCU lock is held throughout the transaction. The fast
> - * path can get away without acquiring the memcg->move_lock
> - * because page moving starts with an RCU grace period.
> - */
> - rcu_read_lock();
> -
> - if (mem_cgroup_disabled())
> - return;
> -again:
> - memcg = folio_memcg(folio);
> - if (unlikely(!memcg))
> - return;
> -
> -#ifdef CONFIG_PROVE_LOCKING
> - local_irq_save(flags);
> - might_lock(&memcg->move_lock);
> - local_irq_restore(flags);
> -#endif
> -
> - if (atomic_read(&memcg->moving_account) <= 0)
> - return;
> -
> - spin_lock_irqsave(&memcg->move_lock, flags);
> - if (memcg != folio_memcg(folio)) {
> - spin_unlock_irqrestore(&memcg->move_lock, flags);
> - goto again;
> - }
> -
> - /*
> - * When charge migration first begins, we can have multiple
> - * critical sections holding the fast-path RCU lock and one
> - * holding the slowpath move_lock. Track the task who has the
> - * move_lock for folio_memcg_unlock().
> - */
> - memcg->move_lock_task = current;
> - memcg->move_lock_flags = flags;
> -}
> -
> -static void __folio_memcg_unlock(struct mem_cgroup *memcg)
> -{
> - if (memcg && memcg->move_lock_task == current) {
> - unsigned long flags = memcg->move_lock_flags;
> -
> - memcg->move_lock_task = NULL;
> - memcg->move_lock_flags = 0;
> -
> - spin_unlock_irqrestore(&memcg->move_lock, flags);
> - }
> -
> - rcu_read_unlock();
> -}
> -
> -/**
> - * folio_memcg_unlock - Release the binding between a folio and its memcg.
> - * @folio: The folio.
> - *
> - * This releases the binding created by folio_memcg_lock(). This does
> - * not change the accounting of this folio to its memcg, but it does
> - * permit others to change it.
> - */
> -void folio_memcg_unlock(struct folio *folio)
> -{
> - __folio_memcg_unlock(folio_memcg(folio));
> -}
> -
> static u64 mem_cgroup_move_charge_read(struct cgroup_subsys_state *css,
> struct cftype *cft)
> {
> @@ -1189,7 +1108,6 @@ void memcg1_memcg_init(struct mem_cgroup *memcg)
> {
> INIT_LIST_HEAD(&memcg->oom_notify);
> mutex_init(&memcg->thresholds_lock);
> - spin_lock_init(&memcg->move_lock);
> INIT_LIST_HEAD(&memcg->event_list);
> spin_lock_init(&memcg->event_list_lock);
> }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 94279b9c766a..3c223aaeb6af 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1189,7 +1189,6 @@ void lruvec_memcg_debug(struct lruvec *lruvec, struct folio *folio)
> * These functions are safe to use under any of the following conditions:
> * - folio locked
> * - folio_test_lru false
> - * - folio_memcg_lock()
> * - folio frozen (refcount of 0)
> *
> * Return: The lruvec this folio is on with its lock held.
> @@ -1211,7 +1210,6 @@ struct lruvec *folio_lruvec_lock(struct folio *folio)
> * These functions are safe to use under any of the following conditions:
> * - folio locked
> * - folio_test_lru false
> - * - folio_memcg_lock()
> * - folio frozen (refcount of 0)
> *
> * Return: The lruvec this folio is on with its lock held and interrupts
> @@ -1235,7 +1233,6 @@ struct lruvec *folio_lruvec_lock_irq(struct folio *folio)
> * These functions are safe to use under any of the following conditions:
> * - folio locked
> * - folio_test_lru false
> - * - folio_memcg_lock()
> * - folio frozen (refcount of 0)
> *
> * Return: The lruvec this folio is on with its lock held and interrupts
> @@ -2375,9 +2372,7 @@ static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
> *
> * - the page lock
> * - LRU isolation
> - * - folio_memcg_lock()
> * - exclusive reference
> - * - mem_cgroup_trylock_pages()
> */
> folio->memcg_data = (unsigned long)memcg;
> }
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4785a693857a..c6c4d4ea29a7 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -32,7 +32,6 @@
> * swap_lock (in swap_duplicate, swap_info_get)
> * mmlist_lock (in mmput, drain_mmlist and others)
> * mapping->private_lock (in block_dirty_folio)
> - * folio_lock_memcg move_lock (in block_dirty_folio)
> * i_pages lock (widely used)
> * lruvec->lru_lock (in folio_lruvec_lock_irq)
> * inode->i_lock (in set_page_dirty's __mark_inode_dirty)
> --
> 2.43.5
>
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-10-26 3:55 ` Yu Zhao
@ 2024-10-26 6:20 ` Shakeel Butt
0 siblings, 0 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-26 6:20 UTC (permalink / raw)
To: Yu Zhao
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Fri, Oct 25, 2024 at 09:55:38PM GMT, Yu Zhao wrote:
> On Thu, Oct 24, 2024 at 7:23 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > While updating the generation of the folios, MGLRU requires that the
> > folio's memcg association remains stable. With the charge migration
> > deprecated, there is no need for MGLRU to acquire locks to keep the
> > folio and memcg association stable.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > ---
> > mm/vmscan.c | 11 -----------
> > 1 file changed, 11 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 29c098790b01..fd7171658b63 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -3662,10 +3662,6 @@ static void walk_mm(struct mm_struct *mm, struct lru_gen_mm_walk *walk)
> > if (walk->seq != max_seq)
> > break;
>
> Please remove the lingering `struct mem_cgroup *memcg` as well as
> folio_memcg_rcu(). Otherwise it causes both build and lockdep
> warnings.
>
Thanks for catching this. The unused-variable warning is already fixed by Andrew;
I will fix the folio_memcg_rcu() usage.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 6/6] memcg-v1: remove memcg move locking code
2024-10-26 3:58 ` Yu Zhao
@ 2024-10-26 6:26 ` Shakeel Butt
0 siblings, 0 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-10-26 6:26 UTC (permalink / raw)
To: Yu Zhao
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Fri, Oct 25, 2024 at 09:58:45PM GMT, Yu Zhao wrote:
> On Thu, Oct 24, 2024 at 7:23 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > The memcg v1's charge move feature has been deprecated. All the places
> > using the memcg move lock have stopped using it as they don't need the
> > protection any more. Let's proceed to remove all the locking code
> > related to charge moving.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > ---
> >
> > Changes since RFC:
> > - Remove the memcg move locking in separate patches.
> >
> > include/linux/memcontrol.h | 54 -------------------------
> > mm/filemap.c | 1 -
> > mm/memcontrol-v1.c | 82 --------------------------------------
> > mm/memcontrol.c | 5 ---
> > mm/rmap.c | 1 -
> > 5 files changed, 143 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 798db70b0a30..932534291ca2 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -299,20 +299,10 @@ struct mem_cgroup {
> > /* For oom notifier event fd */
> > struct list_head oom_notify;
> >
> > - /* taken only while moving_account > 0 */
> > - spinlock_t move_lock;
> > - unsigned long move_lock_flags;
> > -
> > /* Legacy tcp memory accounting */
> > bool tcpmem_active;
> > int tcpmem_pressure;
> >
> > - /*
> > - * set > 0 if pages under this cgroup are moving to other cgroup.
> > - */
> > - atomic_t moving_account;
> > - struct task_struct *move_lock_task;
> > -
> > /* List of events which userspace want to receive */
> > struct list_head event_list;
> > spinlock_t event_list_lock;
> > @@ -428,9 +418,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
> > *
> > * - the folio lock
> > * - LRU isolation
> > - * - folio_memcg_lock()
> > * - exclusive reference
> > - * - mem_cgroup_trylock_pages()
> > *
> > * For a kmem folio a caller should hold an rcu read lock to protect memcg
> > * associated with a kmem folio from being released.
> > @@ -499,9 +487,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
>
> I think you missed folio_memcg_rcu().
>
> (I don't think workingset_activation() needs it, since its only caller
> must hold a refcnt on the folio.)
>
Yes, I think so too, but I will send a separate follow-up patch for that.
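
A rough sketch of what such a follow-up could look like -- hypothetical and
paraphrased, not the actual patch:

  /*
   * Hypothetical sketch: with charge moving gone, the folio reference held
   * by workingset_activation()'s caller keeps folio_memcg() stable, so the
   * _rcu variant (and its rcu_read_lock_held() assertion) is not needed.
   */
  void workingset_activation(struct folio *folio)
  {
          struct mem_cgroup *memcg;

          rcu_read_lock();
          /* filter non-memcg folios; association is stable via the folio ref */
          memcg = folio_memcg(folio);
          if (!mem_cgroup_disabled() && !memcg)
                  goto out;
          workingset_age_nonresident(folio_lruvec(folio), folio_nr_pages(folio));
  out:
          rcu_read_unlock();
  }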
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-10-25 1:23 ` [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU Shakeel Butt
2024-10-25 17:41 ` Roman Gushchin
2024-10-26 3:55 ` Yu Zhao
@ 2024-10-26 6:34 ` Shakeel Butt
2024-10-26 15:26 ` Yu Zhao
2 siblings, 1 reply; 56+ messages in thread
From: Shakeel Butt @ 2024-10-26 6:34 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:23:02PM GMT, Shakeel Butt wrote:
> While updating the generation of the folios, MGLRU requires that the
> folio's memcg association remains stable. With the charge migration
> deprecated, there is no need for MGLRU to acquire locks to keep the
> folio and memcg association stable.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Andrew, can you please apply the following fix to this patch after your
unused-variable fixup?
index fd7171658b63..b8b0e8fa1332 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3353,7 +3353,7 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
if (folio_nid(folio) != pgdat->node_id)
return NULL;
- if (folio_memcg_rcu(folio) != memcg)
+ if (folio_memcg(folio) != memcg)
return NULL;
/* file VMAs can contain anon pages from COW */
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-10-26 6:34 ` Shakeel Butt
@ 2024-10-26 15:26 ` Yu Zhao
2024-11-04 17:30 ` Yu Zhao
0 siblings, 1 reply; 56+ messages in thread
From: Yu Zhao @ 2024-10-26 15:26 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Sat, Oct 26, 2024 at 12:34 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Thu, Oct 24, 2024 at 06:23:02PM GMT, Shakeel Butt wrote:
> > While updating the generation of the folios, MGLRU requires that the
> > folio's memcg association remains stable. With the charge migration
> > deprecated, there is no need for MGLRU to acquire locks to keep the
> > folio and memcg association stable.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>
> Andrew, can you please apply the following fix to this patch after your
> unused-variable fixup?
Thanks!
> index fd7171658b63..b8b0e8fa1332 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3353,7 +3353,7 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
> if (folio_nid(folio) != pgdat->node_id)
> return NULL;
>
> - if (folio_memcg_rcu(folio) != memcg)
> + if (folio_memcg(folio) != memcg)
> return NULL;
>
> /* file VMAs can contain anon pages from COW */
>
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 2/6] memcg-v1: remove charge move code
2024-10-25 1:22 ` [PATCH v1 2/6] memcg-v1: remove charge move code Shakeel Butt
@ 2024-10-28 10:22 ` David Hildenbrand
2024-10-28 10:40 ` David Hildenbrand
2024-10-28 13:54 ` Johannes Weiner
1 sibling, 1 reply; 56+ messages in thread
From: David Hildenbrand @ 2024-10-28 10:22 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team, Michal Hocko
> -
> - pgdat = folio_pgdat(folio);
> - from_vec = mem_cgroup_lruvec(from, pgdat);
> - to_vec = mem_cgroup_lruvec(to, pgdat);
> -
> - folio_memcg_lock(folio);
> -
> - if (folio_test_anon(folio)) {
> - if (folio_mapped(folio)) {
> - __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
> - __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
Good, because this code was likely wrong :) (-> partially mapped anon
folios)
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 2/6] memcg-v1: remove charge move code
2024-10-28 10:22 ` David Hildenbrand
@ 2024-10-28 10:40 ` David Hildenbrand
0 siblings, 0 replies; 56+ messages in thread
From: David Hildenbrand @ 2024-10-28 10:40 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team, Michal Hocko
On 28.10.24 11:22, David Hildenbrand wrote:
>> -
>> - pgdat = folio_pgdat(folio);
>> - from_vec = mem_cgroup_lruvec(from, pgdat);
>> - to_vec = mem_cgroup_lruvec(to, pgdat);
>> -
>> - folio_memcg_lock(folio);
>> -
>> - if (folio_test_anon(folio)) {
>> - if (folio_mapped(folio)) {
>> - __mod_lruvec_state(from_vec, NR_ANON_MAPPED, -nr_pages);
>> - __mod_lruvec_state(to_vec, NR_ANON_MAPPED, nr_pages);
>
> Good, because this code was likely wrong :) (-> partially mapped anon
> folios)
Staring at the code some more, mem_cgroup_move_charge_pte_range() refuses
PTE-mapped large folios, so that might have done the right thing.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 1/6] memcg-v1: fully deprecate move_charge_at_immigrate
2024-10-25 1:22 ` [PATCH v1 1/6] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
2024-10-25 6:54 ` Michal Hocko
@ 2024-10-28 13:53 ` Johannes Weiner
1 sibling, 0 replies; 56+ messages in thread
From: Johannes Weiner @ 2024-10-28 13:53 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:22:58PM -0700, Shakeel Butt wrote:
> Proceed with the complete deprecation of memcg v1's charge moving
> feature. The deprecation warning has been in the kernel for almost two
> years and has been ported to all stable kernels since. Now is the time to
> fully deprecate this feature.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 2/6] memcg-v1: remove charge move code
2024-10-25 1:22 ` [PATCH v1 2/6] memcg-v1: remove charge move code Shakeel Butt
2024-10-28 10:22 ` David Hildenbrand
@ 2024-10-28 13:54 ` Johannes Weiner
1 sibling, 0 replies; 56+ messages in thread
From: Johannes Weiner @ 2024-10-28 13:54 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team, Michal Hocko
On Thu, Oct 24, 2024 at 06:22:59PM -0700, Shakeel Butt wrote:
> The memcg-v1 charge move feature has been deprecated completely, so
> let's remove the relevant code as well.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking
2024-10-25 1:23 ` [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking Shakeel Butt
2024-10-25 6:56 ` Michal Hocko
2024-10-25 17:40 ` Roman Gushchin
@ 2024-10-28 14:00 ` Johannes Weiner
2 siblings, 0 replies; 56+ messages in thread
From: Johannes Weiner @ 2024-10-28 14:00 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:23:00PM -0700, Shakeel Butt wrote:
> During the era of memcg charge migration, the kernel had to make sure
> that the dirty stat updates did not race with the charge migration.
> Otherwise it might update the dirty stats of the wrong memcg. Now with
> the memcg charge migration deprecated, there is no longer a race for
> dirty stat updates and the previous locking can be removed.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
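As a sketch of what the quoted changelog describes (illustrative only: the
two functions below are hypothetical stand-ins, while folio_memcg_lock() /
folio_memcg_unlock() and __lruvec_stat_mod_folio() are the real interfaces
involved):

	#include <linux/memcontrol.h>
	#include <linux/vmstat.h>

	/* Old shape: bracket the stat update so charge migration cannot
	 * re-bind folio->memcg between looking up the memcg and updating
	 * its counter. */
	static void account_dirty_old(struct folio *folio)
	{
		folio_memcg_lock(folio);
		__lruvec_stat_mod_folio(folio, NR_FILE_DIRTY,
					folio_nr_pages(folio));
		folio_memcg_unlock(folio);
	}

	/* New shape: with charge migration gone, the folio<->memcg binding
	 * is stable for the folio's lifetime, so update directly. */
	static void account_dirty_new(struct folio *folio)
	{
		__lruvec_stat_mod_folio(folio, NR_FILE_DIRTY,
					folio_nr_pages(folio));
	}

Patch 4/6 applies the same simplification to the writeback counters
(NR_WRITEBACK).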
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking
2024-10-25 1:23 ` [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking Shakeel Butt
2024-10-25 6:57 ` Michal Hocko
2024-10-25 17:40 ` Roman Gushchin
@ 2024-10-28 14:00 ` Johannes Weiner
2 siblings, 0 replies; 56+ messages in thread
From: Johannes Weiner @ 2024-10-28 14:00 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:23:01PM -0700, Shakeel Butt wrote:
> During the era of memcg charge migration, the kernel had to make sure
> that the writeback stat updates did not race with the charge migration.
> Otherwise it might update the writeback stats of the wrong memcg. Now
> with the memcg charge migration deprecated, there is no longer a race
> for writeback stat updates and the previous locking can be removed.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 6/6] memcg-v1: remove memcg move locking code
2024-10-25 1:23 ` [PATCH v1 6/6] memcg-v1: remove memcg move locking code Shakeel Butt
` (2 preceding siblings ...)
2024-10-26 3:58 ` Yu Zhao
@ 2024-10-28 14:02 ` Johannes Weiner
3 siblings, 0 replies; 56+ messages in thread
From: Johannes Weiner @ 2024-10-28 14:02 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Michal Hocko, Roman Gushchin, Muchun Song,
Hugh Dickins, Yosry Ahmed, linux-mm, cgroups, linux-kernel,
linux-fsdevel, linux-doc, Meta kernel team
On Thu, Oct 24, 2024 at 06:23:03PM -0700, Shakeel Butt wrote:
> The memcg v1's charge move feature has been deprecated. All the places
> using the memcg move lock have stopped using it, as they don't need the
> protection any more. Let's proceed to remove all the locking code
> related to charge moving.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-10-26 15:26 ` Yu Zhao
@ 2024-11-04 17:30 ` Yu Zhao
2024-11-04 21:38 ` Andrew Morton
0 siblings, 1 reply; 56+ messages in thread
From: Yu Zhao @ 2024-11-04 17:30 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Sat, Oct 26, 2024 at 09:26:04AM -0600, Yu Zhao wrote:
> On Sat, Oct 26, 2024 at 12:34 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Thu, Oct 24, 2024 at 06:23:02PM GMT, Shakeel Butt wrote:
> > > While updating the generation of the folios, MGLRU requires that the
> > > folio's memcg association remains stable. With the charge migration
> > > deprecated, there is no need for MGLRU to acquire locks to keep the
> > > folio and memcg association stable.
> > >
> > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> >
> > Andrew, can you please apply the following fix to this patch after your
> > unused fixup?
>
> Thanks!
syzbot caught the following:
WARNING: CPU: 0 PID: 85 at mm/vmscan.c:3140 folio_update_gen+0x23d/0x250 mm/vmscan.c:3140
...
Andrew, can you please fix this in place? Thank you.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ddaaff67642e..9a610dbff384 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3138,7 +3138,6 @@ static int folio_update_gen(struct folio *folio, int gen)
unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
VM_WARN_ON_ONCE(gen >= MAX_NR_GENS);
- VM_WARN_ON_ONCE(!rcu_read_lock_held());
do {
/* lru_gen_del_folio() has isolated this page? */
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-11-04 17:30 ` Yu Zhao
@ 2024-11-04 21:38 ` Andrew Morton
2024-11-04 22:04 ` Shakeel Butt
2024-11-04 22:04 ` Yu Zhao
0 siblings, 2 replies; 56+ messages in thread
From: Andrew Morton @ 2024-11-04 21:38 UTC (permalink / raw)
To: Yu Zhao
Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Mon, 4 Nov 2024 10:30:29 -0700 Yu Zhao <yuzhao@google.com> wrote:
> On Sat, Oct 26, 2024 at 09:26:04AM -0600, Yu Zhao wrote:
> > On Sat, Oct 26, 2024 at 12:34 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > On Thu, Oct 24, 2024 at 06:23:02PM GMT, Shakeel Butt wrote:
> > > > While updating the generation of the folios, MGLRU requires that the
> > > > folio's memcg association remains stable. With the charge migration
> > > > deprecated, there is no need for MGLRU to acquire locks to keep the
> > > > folio and memcg association stable.
> > > >
> > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > >
> > > Andrew, can you please apply the following fix to this patch after your
> > > unused fixup?
> >
> > Thanks!
>
> syzbot caught the following:
>
> WARNING: CPU: 0 PID: 85 at mm/vmscan.c:3140 folio_update_gen+0x23d/0x250 mm/vmscan.c:3140
> ...
>
> Andrew, can you please fix this in place?
OK, but...
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3138,7 +3138,6 @@ static int folio_update_gen(struct folio *folio, int gen)
> unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
>
> VM_WARN_ON_ONCE(gen >= MAX_NR_GENS);
> - VM_WARN_ON_ONCE(!rcu_read_lock_held());
>
> do {
> /* lru_gen_del_folio() has isolated this page? */
it would be good to know why this assertion is considered incorrect?
And a link to the syzbot report?
Thanks.
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-11-04 21:38 ` Andrew Morton
@ 2024-11-04 22:04 ` Shakeel Butt
2024-11-04 22:04 ` Yu Zhao
1 sibling, 0 replies; 56+ messages in thread
From: Shakeel Butt @ 2024-11-04 22:04 UTC (permalink / raw)
To: Andrew Morton
Cc: Yu Zhao, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Mon, Nov 04, 2024 at 01:38:34PM -0800, Andrew Morton wrote:
> On Mon, 4 Nov 2024 10:30:29 -0700 Yu Zhao <yuzhao@google.com> wrote:
>
> > On Sat, Oct 26, 2024 at 09:26:04AM -0600, Yu Zhao wrote:
> > > On Sat, Oct 26, 2024 at 12:34 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > >
> > > > On Thu, Oct 24, 2024 at 06:23:02PM GMT, Shakeel Butt wrote:
> > > > > While updating the generation of the folios, MGLRU requires that the
> > > > > folio's memcg association remains stable. With the charge migration
> > > > > deprecated, there is no need for MGLRU to acquire locks to keep the
> > > > > folio and memcg association stable.
> > > > >
> > > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > >
> > > > Andrew, can you please apply the following fix to this patch after your
> > > > unused fixup?
> > >
> > > Thanks!
> >
> > syzbot caught the following:
> >
> > WARNING: CPU: 0 PID: 85 at mm/vmscan.c:3140 folio_update_gen+0x23d/0x250 mm/vmscan.c:3140
> > ...
> >
> > Andrew, can you please fix this in place?
>
> OK, but...
>
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -3138,7 +3138,6 @@ static int folio_update_gen(struct folio *folio, int gen)
> > unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
> >
> > VM_WARN_ON_ONCE(gen >= MAX_NR_GENS);
> > - VM_WARN_ON_ONCE(!rcu_read_lock_held());
> >
> > do {
> > /* lru_gen_del_folio() has isolated this page? */
>
> it would be good to know why this assertion is considered incorrect?
> And a link to the syzbot report?
So, this assertion is incorrect after this patch series, which has removed
the charge migration and removed mem_cgroup_trylock_pages() /
mem_cgroup_unlock_pages() from the callers of this function
(folio_update_gen()).
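Roughly, the caller-side change looks like this (a simplified sketch, not
the exact walk code):

	/* Before patch 5/6: the MGLRU caller pinned the folio<->memcg
	 * binding, and mem_cgroup_trylock_pages() took rcu_read_lock()
	 * internally -- which is what the assertion in folio_update_gen()
	 * was effectively checking for. */
	if (!mem_cgroup_trylock_pages(memcg))
		return;
	old_gen = folio_update_gen(folio, new_gen);
	mem_cgroup_unlock_pages();		/* rcu_read_unlock() */

	/* After: no move lock is needed and no rcu_read_lock() is taken
	 * just for this, so VM_WARN_ON_ONCE(!rcu_read_lock_held()) can
	 * fire spuriously and has to be dropped as well. */
	old_gen = folio_update_gen(folio, new_gen);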
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-11-04 21:38 ` Andrew Morton
2024-11-04 22:04 ` Shakeel Butt
@ 2024-11-04 22:04 ` Yu Zhao
2024-11-04 22:08 ` Yu Zhao
1 sibling, 1 reply; 56+ messages in thread
From: Yu Zhao @ 2024-11-04 22:04 UTC (permalink / raw)
To: Andrew Morton
Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Mon, Nov 4, 2024 at 2:38 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 4 Nov 2024 10:30:29 -0700 Yu Zhao <yuzhao@google.com> wrote:
>
> > On Sat, Oct 26, 2024 at 09:26:04AM -0600, Yu Zhao wrote:
> > > On Sat, Oct 26, 2024 at 12:34 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > >
> > > > On Thu, Oct 24, 2024 at 06:23:02PM GMT, Shakeel Butt wrote:
> > > > > While updating the generation of the folios, MGLRU requires that the
> > > > > folio's memcg association remains stable. With the charge migration
> > > > > deprecated, there is no need for MGLRU to acquire locks to keep the
> > > > > folio and memcg association stable.
> > > > >
> > > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > >
> > > > Andrew, can you please apply the following fix to this patch after your
> > > > unused fixup?
> > >
> > > Thanks!
> >
> > syzbot caught the following:
> >
> > WARNING: CPU: 0 PID: 85 at mm/vmscan.c:3140 folio_update_gen+0x23d/0x250 mm/vmscan.c:3140
> > ...
> >
> > Andrew, can you please fix this in place?
>
> OK, but...
>
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -3138,7 +3138,6 @@ static int folio_update_gen(struct folio *folio, int gen)
> > unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
> >
> > VM_WARN_ON_ONCE(gen >= MAX_NR_GENS);
> > - VM_WARN_ON_ONCE(!rcu_read_lock_held());
> >
> > do {
> > /* lru_gen_del_folio() has isolated this page? */
>
> it would be good to know why this assertion is considered incorrect?
The assertion failure was triggered by the patch in this thread. The check
used to assert that a folio must be protected from charge migration. Charge
migration is removed by this series, and as part of that effort, this
patch removes the RCU lock.
> And a link to the syzbot report?
https://syzkaller.appspot.com/bug?extid=24f45b8beab9788e467e
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-11-04 22:04 ` Yu Zhao
@ 2024-11-04 22:08 ` Yu Zhao
2024-11-04 22:18 ` Andrew Morton
0 siblings, 1 reply; 56+ messages in thread
From: Yu Zhao @ 2024-11-04 22:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Mon, Nov 4, 2024 at 3:04 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Mon, Nov 4, 2024 at 2:38 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Mon, 4 Nov 2024 10:30:29 -0700 Yu Zhao <yuzhao@google.com> wrote:
> >
> > > On Sat, Oct 26, 2024 at 09:26:04AM -0600, Yu Zhao wrote:
> > > > On Sat, Oct 26, 2024 at 12:34 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > > >
> > > > > On Thu, Oct 24, 2024 at 06:23:02PM GMT, Shakeel Butt wrote:
> > > > > > While updating the generation of the folios, MGLRU requires that the
> > > > > > folio's memcg association remains stable. With the charge migration
> > > > > > deprecated, there is no need for MGLRU to acquire locks to keep the
> > > > > > folio and memcg association stable.
> > > > > >
> > > > > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > > >
> > > > > Andrew, can you please apply the following fix to this patch after your
> > > > > unused fixup?
> > > >
> > > > Thanks!
> > >
> > > syzbot caught the following:
> > >
> > > WARNING: CPU: 0 PID: 85 at mm/vmscan.c:3140 folio_update_gen+0x23d/0x250 mm/vmscan.c:3140
> > > ...
> > >
> > > Andrew, can you please fix this in place?
> >
> > OK, but...
> >
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -3138,7 +3138,6 @@ static int folio_update_gen(struct folio *folio, int gen)
> > > unsigned long new_flags, old_flags = READ_ONCE(folio->flags);
> > >
> > > VM_WARN_ON_ONCE(gen >= MAX_NR_GENS);
> > > - VM_WARN_ON_ONCE(!rcu_read_lock_held());
> > >
> > > do {
> > > /* lru_gen_del_folio() has isolated this page? */
> >
> > it would be good to know why this assertion is considered incorrect?
>
> The assertion was caused by the patch in this thread. It used to
> assert that a folio must be protected from charge migration. Charge
> migration is removed by this series, and as part of the effort, this
> patch removes the RCU lock.
>
> > And a link to the syzbot report?
>
> https://syzkaller.appspot.com/bug?extid=24f45b8beab9788e467e
Or this link would work better:
https://lore.kernel.org/lkml/67294349.050a0220.701a.0010.GAE@google.com/
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU
2024-11-04 22:08 ` Yu Zhao
@ 2024-11-04 22:18 ` Andrew Morton
0 siblings, 0 replies; 56+ messages in thread
From: Andrew Morton @ 2024-11-04 22:18 UTC (permalink / raw)
To: Yu Zhao
Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Hugh Dickins, Yosry Ahmed, linux-mm, cgroups,
linux-kernel, linux-fsdevel, linux-doc, Meta kernel team
On Mon, 4 Nov 2024 15:08:09 -0700 Yu Zhao <yuzhao@google.com> wrote:
> > The assertion failure was triggered by the patch in this thread. The check
> > used to assert that a folio must be protected from charge migration. Charge
> > migration is removed by this series, and as part of that effort, this
> > patch removes the RCU lock.
> >
> > > And a link to the syzbot report?
> >
> > https://syzkaller.appspot.com/bug?extid=24f45b8beab9788e467e
>
> Or this link would work better:
>
> https://lore.kernel.org/lkml/67294349.050a0220.701a.0010.GAE@google.com/
Thanks, I pasted everyone's everything in there, so it will all be
accessible by the sufficiently patient.
^ permalink raw reply [flat|nested] 56+ messages in thread
end of thread, other threads:[~2024-11-04 22:18 UTC | newest]
Thread overview: 56+ messages
-- links below jump to the message on this page --
2024-10-25 1:22 [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 0/3] " Shakeel Butt
2024-10-24 6:57 ` [RFC PATCH 1/3] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
2024-10-24 9:14 ` Michal Hocko
2024-10-24 16:51 ` Roman Gushchin
2024-10-24 17:16 ` Shakeel Butt
2024-10-24 16:49 ` Roman Gushchin
2024-10-24 6:57 ` [RFC PATCH 2/3] memcg-v1: remove charge move code Shakeel Butt
2024-10-24 9:14 ` Michal Hocko
2024-10-24 16:50 ` Roman Gushchin
2024-10-24 6:57 ` [RFC PATCH 3/3] memcg-v1: remove memcg move locking code Shakeel Butt
2024-10-24 9:16 ` Michal Hocko
2024-10-24 17:23 ` Shakeel Butt
2024-10-24 18:54 ` Roman Gushchin
2024-10-24 19:38 ` Shakeel Butt
2024-10-24 16:50 ` Roman Gushchin
2024-10-24 17:26 ` Shakeel Butt
2024-10-24 19:45 ` Michal Hocko
2024-10-24 20:32 ` Yosry Ahmed
2024-10-24 21:08 ` Michal Hocko
2024-10-25 1:23 ` Shakeel Butt
2024-10-25 1:22 ` [PATCH v1 1/6] memcg-v1: fully deprecate move_charge_at_immigrate Shakeel Butt
2024-10-25 6:54 ` Michal Hocko
2024-10-28 13:53 ` Johannes Weiner
2024-10-25 1:22 ` [PATCH v1 2/6] memcg-v1: remove charge move code Shakeel Butt
2024-10-28 10:22 ` David Hildenbrand
2024-10-28 10:40 ` David Hildenbrand
2024-10-28 13:54 ` Johannes Weiner
2024-10-25 1:23 ` [PATCH v1 3/6] memcg-v1: no need for memcg locking for dirty tracking Shakeel Butt
2024-10-25 6:56 ` Michal Hocko
2024-10-25 16:22 ` Shakeel Butt
2024-10-25 17:40 ` Roman Gushchin
2024-10-28 14:00 ` Johannes Weiner
2024-10-25 1:23 ` [PATCH v1 4/6] memcg-v1: no need for memcg locking for writeback tracking Shakeel Butt
2024-10-25 6:57 ` Michal Hocko
2024-10-25 17:40 ` Roman Gushchin
2024-10-28 14:00 ` Johannes Weiner
2024-10-25 1:23 ` [PATCH v1 5/6] memcg-v1: no need for memcg locking for MGLRU Shakeel Butt
2024-10-25 17:41 ` Roman Gushchin
2024-10-26 3:55 ` Yu Zhao
2024-10-26 6:20 ` Shakeel Butt
2024-10-26 6:34 ` Shakeel Butt
2024-10-26 15:26 ` Yu Zhao
2024-11-04 17:30 ` Yu Zhao
2024-11-04 21:38 ` Andrew Morton
2024-11-04 22:04 ` Shakeel Butt
2024-11-04 22:04 ` Yu Zhao
2024-11-04 22:08 ` Yu Zhao
2024-11-04 22:18 ` Andrew Morton
2024-10-25 1:23 ` [PATCH v1 6/6] memcg-v1: remove memcg move locking code Shakeel Butt
2024-10-25 6:59 ` Michal Hocko
2024-10-25 17:42 ` Roman Gushchin
2024-10-26 3:58 ` Yu Zhao
2024-10-26 6:26 ` Shakeel Butt
2024-10-28 14:02 ` Johannes Weiner
2024-10-25 1:33 ` [PATCH v1 0/6] memcg-v1: fully deprecate charge moving Shakeel Butt