* [PATCH v2 00/10] Backport MEMCG changes from v5.17
@ 2022-08-08 12:55 David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 01/10] mm/memcg: Revert ("mm/memcg: optimize user context object stock access") David Oberhollenzer
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:55 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, David Oberhollenzer

This is a backport of Sebastian's MEMCG changes to v5.15. With these
patches applied, it is possible to use memory cgroups together
with PREEMPT_RT on v5.15, just like on v5.17.

The patch set was tested on my end by building an x86_64 kernel
with PREEMPT_RT, cgroup debug and lockdep enabled. The kernel was
run in Qemu, where a small number of child cgroups were created,
with memory controller enabled for each and an instance of a small
test program running.
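
For illustration, a minimal user-space sketch of such a test setup could
look like the following; the cgroup paths, the 64 MiB limit and the
allocation loop are assumptions made up for this sketch, not the actual
test program that was used:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* write a string to a cgroup control file, warn on failure */
static void write_file(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, val, strlen(val)) < 0)
		perror(path);
	if (fd >= 0)
		close(fd);
}

int main(void)
{
	char buf[32];

	/* assumes cgroup2 is mounted at /sys/fs/cgroup */
	write_file("/sys/fs/cgroup/cgroup.subtree_control", "+memory");
	mkdir("/sys/fs/cgroup/rt-test", 0755);
	write_file("/sys/fs/cgroup/rt-test/memory.max", "67108864");

	/* move ourselves into the child group */
	snprintf(buf, sizeof(buf), "%d", getpid());
	write_file("/sys/fs/cgroup/rt-test/cgroup.procs", buf);

	/* keep the memcg charge/uncharge paths busy */
	for (;;) {
		char *p = malloc(1 << 20);

		if (p)
			memset(p, 0xaa, 1 << 20);
		free(p);
		usleep(1000);
	}
	return 0;
}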


Changes in v2:
  - Use the exact patch set merged upstream

David Oberhollenzer (1):
  Allow MEMCG on PREEMPT_RT

Johannes Weiner (1):
  mm/memcg: Opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock()

Michal Hocko (1):
  mm/memcg: Revert ("mm/memcg: optimize user context object stock access")

Sebastian Andrzej Siewior (7):
  mm/memcg: Disable threshold event handlers on PREEMPT_RT
  mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.
  mm/memcg: Protect memcg_stock with a local_lock_t
  mm/memcg: Disable migration instead of preemption in drain_all_stock().
  mm/memcg: Add missing counter index which are not update in interrupt.
  mm/memcg: Add a comment regarding the release `obj'.
  mm/memcg: Only perform the debug checks on !PREEMPT_RT

 .../admin-guide/cgroup-v1/memory.rst          |   2 +
 init/Kconfig                                  |   1 -
 mm/memcontrol.c                               | 232 +++++++++++-------
 3 files changed, 146 insertions(+), 89 deletions(-)

-- 
2.37.1



* [PATCH v2 01/10] mm/memcg: Revert ("mm/memcg: optimize user context object stock access")
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
@ 2022-08-08 12:55 ` David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 02/10] mm/memcg: Disable threshold event handlers on PREEMPT_RT David Oberhollenzer
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:55 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, Michal Hocko,
	Roman Gushchin, Johannes Weiner, Shakeel Butt,
	David Oberhollenzer

From: Michal Hocko <mhocko@suse.com>

The optimisation is based on a micro benchmark where local_irq_save() is
more expensive than a preempt_disable(). There is no evidence that it is
visible in a real-world workload and there are CPUs where the opposite is
true (local_irq_save() is cheaper than preempt_disable()).

Based on micro benchmarks, the optimisation makes sense on PREEMPT_NONE
where preempt_disable() is optimized away. There is no improvement with
PREEMPT_DYNAMIC since the preemption counter is always available.

The optimization also makes the PREEMPT_RT integration more complicated
since most of the assumptions are not true on PREEMPT_RT.

Revert the optimisation since it complicates the PREEMPT_RT integration
and the improvement is hardly visible.

[ bigeasy: Patch body around Michal's diff ]

Link: https://lore.kernel.org/all/YgOGkXXCrD%2F1k+p4@dhcp22.suse.cz
Link: https://lkml.kernel.org/r/YdX+INO9gQje6d0S@linutronix.de
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 mm/memcontrol.c | 94 ++++++++++++++-----------------------------------
 1 file changed, 27 insertions(+), 67 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 971546bb99e0..a07e068989b4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2102,23 +2102,17 @@ void unlock_page_memcg(struct page *page)
 }
 EXPORT_SYMBOL(unlock_page_memcg);
 
-struct obj_stock {
+struct memcg_stock_pcp {
+	struct mem_cgroup *cached; /* this never be root cgroup */
+	unsigned int nr_pages;
+
 #ifdef CONFIG_MEMCG_KMEM
 	struct obj_cgroup *cached_objcg;
 	struct pglist_data *cached_pgdat;
 	unsigned int nr_bytes;
 	int nr_slab_reclaimable_b;
 	int nr_slab_unreclaimable_b;
-#else
-	int dummy[0];
 #endif
-};
-
-struct memcg_stock_pcp {
-	struct mem_cgroup *cached; /* this never be root cgroup */
-	unsigned int nr_pages;
-	struct obj_stock task_obj;
-	struct obj_stock irq_obj;
 
 	struct work_struct work;
 	unsigned long flags;
@@ -2128,12 +2122,12 @@ static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
 static DEFINE_MUTEX(percpu_charge_mutex);
 
 #ifdef CONFIG_MEMCG_KMEM
-static void drain_obj_stock(struct obj_stock *stock);
+static void drain_obj_stock(struct memcg_stock_pcp *stock);
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg);
 
 #else
-static inline void drain_obj_stock(struct obj_stock *stock)
+static inline void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 }
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -2144,41 +2138,6 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 #endif
 
 /*
- * Most kmem_cache_alloc() calls are from user context. The irq disable/enable
- * sequence used in this case to access content from object stock is slow.
- * To optimize for user context access, there are now two object stocks for
- * task context and interrupt context access respectively.
- *
- * The task context object stock can be accessed by disabling preemption only
- * which is cheap in non-preempt kernel. The interrupt context object stock
- * can only be accessed after disabling interrupt. User context code can
- * access interrupt object stock, but not vice versa.
- */
-static inline struct obj_stock *get_obj_stock(unsigned long *pflags)
-{
-	struct memcg_stock_pcp *stock;
-
-	if (likely(in_task())) {
-		*pflags = 0UL;
-		preempt_disable();
-		stock = this_cpu_ptr(&memcg_stock);
-		return &stock->task_obj;
-	}
-
-	local_irq_save(*pflags);
-	stock = this_cpu_ptr(&memcg_stock);
-	return &stock->irq_obj;
-}
-
-static inline void put_obj_stock(unsigned long flags)
-{
-	if (likely(in_task()))
-		preempt_enable();
-	else
-		local_irq_restore(flags);
-}
-
-/**
  * consume_stock: Try to consume stocked charge on this cpu.
  * @memcg: memcg to consume from.
  * @nr_pages: how many pages to charge.
@@ -2245,9 +2204,7 @@ static void drain_local_stock(struct work_struct *dummy)
 	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	drain_obj_stock(&stock->irq_obj);
-	if (in_task())
-		drain_obj_stock(&stock->task_obj);
+	drain_obj_stock(stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
 
@@ -3084,10 +3041,13 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		     enum node_stat_item idx, int nr)
 {
+	struct memcg_stock_pcp *stock;
 	unsigned long flags;
-	struct obj_stock *stock = get_obj_stock(&flags);
 	int *bytes;
 
+	local_irq_save(flags);
+	stock = this_cpu_ptr(&memcg_stock);
+
 	/*
 	 * Save vmstat data in stock and skip vmstat array update unless
 	 * accumulating over a page of vmstat data or when pgdat or idx
@@ -3138,26 +3098,29 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	if (nr)
 		mod_objcg_mlstate(objcg, pgdat, idx, nr);
 
-	put_obj_stock(flags);
+	local_irq_restore(flags);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 {
+	struct memcg_stock_pcp *stock;
 	unsigned long flags;
-	struct obj_stock *stock = get_obj_stock(&flags);
 	bool ret = false;
 
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
 	if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
 		stock->nr_bytes -= nr_bytes;
 		ret = true;
 	}
 
-	put_obj_stock(flags);
+	local_irq_restore(flags);
 
 	return ret;
 }
 
-static void drain_obj_stock(struct obj_stock *stock)
+static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = stock->cached_objcg;
 
@@ -3213,13 +3176,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 {
 	struct mem_cgroup *memcg;
 
-	if (in_task() && stock->task_obj.cached_objcg) {
-		memcg = obj_cgroup_memcg(stock->task_obj.cached_objcg);
-		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
-			return true;
-	}
-	if (stock->irq_obj.cached_objcg) {
-		memcg = obj_cgroup_memcg(stock->irq_obj.cached_objcg);
+	if (stock->cached_objcg) {
+		memcg = obj_cgroup_memcg(stock->cached_objcg);
 		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
 			return true;
 	}
@@ -3230,10 +3188,13 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 			     bool allow_uncharge)
 {
+	struct memcg_stock_pcp *stock;
 	unsigned long flags;
-	struct obj_stock *stock = get_obj_stock(&flags);
 	unsigned int nr_pages = 0;
 
+	local_irq_save(flags);
+
+	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached_objcg != objcg) { /* reset if necessary */
 		drain_obj_stock(stock);
 		obj_cgroup_get(objcg);
@@ -3249,7 +3210,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		stock->nr_bytes &= (PAGE_SIZE - 1);
 	}
 
-	put_obj_stock(flags);
+	local_irq_restore(flags);
 
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
@@ -6874,7 +6835,6 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	unsigned long nr_pages;
 	struct mem_cgroup *memcg;
 	struct obj_cgroup *objcg;
-	bool use_objcg = PageMemcgKmem(page);
 
 	VM_BUG_ON_PAGE(PageLRU(page), page);
 
@@ -6883,7 +6843,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	 * page memcg or objcg at this point, we have fully
 	 * exclusive access to the page.
 	 */
-	if (use_objcg) {
+	if (PageMemcgKmem(page)) {
 		objcg = __page_objcg(page);
 		/*
 		 * This get matches the put at the end of the function and
@@ -6911,7 +6871,7 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 
 	nr_pages = compound_nr(page);
 
-	if (use_objcg) {
+	if (PageMemcgKmem(page)) {
 		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
 
-- 
2.37.1



* [PATCH v2 02/10] mm/memcg: Disable threshold event handlers on PREEMPT_RT
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 01/10] mm/memcg: Revert ("mm/memcg: optimize user context object stock access") David Oberhollenzer
@ 2022-08-08 12:55 ` David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 03/10] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT where needed David Oberhollenzer
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:55 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, Michal Hocko,
	Michal Koutný, Roman Gushchin, Johannes Weiner, Shakeel Butt,
	Michal Hocko, David Oberhollenzer

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

During the integration of PREEMPT_RT support, the code flow around
memcg_check_events() resulted in `twisted code'. Moving the code around
to avoid that would then lead to an additional local-irq-save section
within memcg_check_events(). While looking better, it adds a
local-irq-save section to a code flow which is usually within a
local-irq-off block on non-PREEMPT_RT configurations.

The threshold event handler is a deprecated memcg v1 feature. Instead of
trying to get it to work under PREEMPT_RT, just disable it. There should
be no users on PREEMPT_RT. From that perspective it makes even less
sense to get it to work under PREEMPT_RT while having zero users.

Make memory.soft_limit_in_bytes and cgroup.event_control return
-EOPNOTSUPP on PREEMPT_RT. Turn memcg_check_events() into a no-op and
have memcg_write_event_control() return -EOPNOTSUPP on PREEMPT_RT.
Document that the two knobs are disabled on PREEMPT_RT.
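
From user space the change is visible as a plain write error. A small
probe like the sketch below (the cgroup v1 mount point and group name are
assumptions, adjust them to the local setup) is expected to report
EOPNOTSUPP on a PREEMPT_RT kernel and to succeed elsewhere:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* illustrative path, assumes the v1 memory controller is mounted here */
	const char *path =
		"/sys/fs/cgroup/memory/test/memory.soft_limit_in_bytes";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, "67108864", 8) < 0 && errno == EOPNOTSUPP)
		puts("soft limit rejected (PREEMPT_RT)");
	else
		puts("soft limit accepted");
	close(fd);
	return 0;
}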

Suggested-by: Michal Hocko <mhocko@kernel.org>
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 Documentation/admin-guide/cgroup-v1/memory.rst |  2 ++
 mm/memcontrol.c                                | 14 ++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 41191b5fb69d..c45291ac9ffb 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -64,6 +64,7 @@ Brief summary of control files.
 				     threads
  cgroup.procs			     show list of processes
  cgroup.event_control		     an interface for event_fd()
+				     This knob is not available on CONFIG_PREEMPT_RT systems.
  memory.usage_in_bytes		     show current usage for memory
 				     (See 5.5 for details)
  memory.memsw.usage_in_bytes	     show current usage for memory+Swap
@@ -75,6 +76,7 @@ Brief summary of control files.
  memory.max_usage_in_bytes	     show max memory usage recorded
  memory.memsw.max_usage_in_bytes     show max memory+Swap usage recorded
  memory.soft_limit_in_bytes	     set/show soft limit of memory usage
+				     This knob is not available on CONFIG_PREEMPT_RT systems.
  memory.stat			     show various statistics
  memory.use_hierarchy		     set/show hierarchical account enabled
                                      This knob is deprecated and shouldn't be
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a07e068989b4..6078d57aee0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -909,6 +909,9 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
  */
 static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 {
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		return;
+
 	/* threshold event is triggered in finer grain than soft limit */
 	if (unlikely(mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_THRESH))) {
@@ -3777,8 +3780,12 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
 		}
 		break;
 	case RES_SOFT_LIMIT:
-		memcg->soft_limit = nr_pages;
-		ret = 0;
+		if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+			ret = -EOPNOTSUPP;
+		} else {
+			memcg->soft_limit = nr_pages;
+			ret = 0;
+		}
 		break;
 	}
 	return ret ?: nbytes;
@@ -4754,6 +4761,9 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 	char *endp;
 	int ret;
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		return -EOPNOTSUPP;
+
 	buf = strstrip(buf);
 
 	efd = simple_strtoul(buf, &endp, 10);
-- 
2.37.1



* [PATCH v2 03/10] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT where needed.
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 01/10] mm/memcg: Revert ("mm/memcg: optimize user context object stock access") David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 02/10] mm/memcg: Disable threshold event handlers on PREEMPT_RT David Oberhollenzer
@ 2022-08-08 12:55 ` David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 04/10] mm/memcg: Opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock() David Oberhollenzer
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:55 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, Roman Gushchin,
	David Oberhollenzer

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The per-CPU counters are modified with non-atomic operations. Their
consistency is ensured by disabling interrupts for the update.
On non-PREEMPT_RT configurations this works because acquiring a
spinlock_t typed lock with the _irq() suffix disables interrupts. On
PREEMPT_RT configurations the RMW operation can be interrupted.

Another problem is that mem_cgroup_swapout() expects to be invoked with
disabled interrupts because the caller has to acquire a spinlock_t which
is acquired with disabled interrupts. Since spinlock_t never disables
interrupts on PREEMPT_RT the interrupts are never disabled at this
point.

The code is never called from in_irq() context on PREEMPT_RT, therefore
disabling preemption during the update is sufficient on PREEMPT_RT.
The sections which explicitly disable interrupts can remain on
PREEMPT_RT because they remain short and don't involve sleeping locks
(memcg_check_events() is doing nothing on PREEMPT_RT).

Disable preemption during update of the per-CPU variables which do not
explicitly disable interrupts.
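
As a minimal illustration of the race being closed (the counter and the
helper below are made up for this sketch and assume the context of
mm/memcontrol.c; only memcg_stats_lock()/memcg_stats_unlock() come from
the patch): __this_cpu_add() is a plain read-modify-write, so being
preempted between the load and the store would lose a concurrent update
on the same CPU. Disabling preemption closes that window on PREEMPT_RT,
while on !PREEMPT_RT the callers already run with interrupts disabled.

static DEFINE_PER_CPU(long, example_stat);

static void example_stat_add(long delta)
{
	memcg_stats_lock();	/* preempt_disable() on PREEMPT_RT */
	__this_cpu_add(example_stat, delta);
	memcg_stats_unlock();
}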

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Roman Gushchin <guro@fb.com>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 mm/memcontrol.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6078d57aee0f..e566c34466a1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -654,6 +654,35 @@ static u64 flush_next_time;
 
 #define FLUSH_TIME (2UL*HZ)
 
+/*
+ * Accessors to ensure that preemption is disabled on PREEMPT_RT because it can
+ * not rely on this as part of an acquired spinlock_t lock. These functions are
+ * never used in hardirq context on PREEMPT_RT and therefore disabling preemption
+ * is sufficient.
+ */
+static void memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+      preempt_disable();
+#else
+      VM_BUG_ON(!irqs_disabled());
+#endif
+}
+
+static void __memcg_stats_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+      preempt_disable();
+#endif
+}
+
+static void memcg_stats_unlock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+      preempt_enable();
+#endif
+}
+
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
 	unsigned int x;
@@ -737,6 +766,20 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
+	/*
+	 * The caller from rmap relay on disabled preemption becase they never
+	 * update their counter from in-interrupt context. For these two
+	 * counters we check that the update is never performed from an
+	 * interrupt context while other caller need to have disabled interrupt.
+	 */
+	__memcg_stats_lock();
+	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
+		if (idx == NR_ANON_MAPPED || idx == NR_FILE_MAPPED)
+			WARN_ON_ONCE(!in_task());
+		else
+			WARN_ON_ONCE(!irqs_disabled());
+	}
+
 	/* Update memcg */
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
 
@@ -744,6 +787,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
 
 	memcg_rstat_updated(memcg, val);
+	memcg_stats_unlock();
 }
 
 /**
@@ -844,8 +888,10 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
+	memcg_stats_lock();
 	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
 	memcg_rstat_updated(memcg, count);
+	memcg_stats_unlock();
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
@@ -7211,8 +7257,9 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	 * important here to have the interrupts disabled because it is the
 	 * only synchronisation we have for updating the per-CPU variables.
 	 */
-	VM_BUG_ON(!irqs_disabled());
+	memcg_stats_lock();
 	mem_cgroup_charge_statistics(memcg, page, -nr_entries);
+	memcg_stats_unlock();
 	memcg_check_events(memcg, page);
 
 	css_put(&memcg->css);
-- 
2.37.1



* [PATCH v2 04/10] mm/memcg: Opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock()
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
                   ` (2 preceding siblings ...)
  2022-08-08 12:55 ` [PATCH v2 03/10] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT where needed David Oberhollenzer
@ 2022-08-08 12:55 ` David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 05/10] mm/memcg: Protect memcg_stock with a local_lock_t David Oberhollenzer
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:55 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, Johannes Weiner,
	Shakeel Butt, Roman Gushchin, Michal Hocko, David Oberhollenzer

From: Johannes Weiner <hannes@cmpxchg.org>

Provide the inner part of refill_stock() as __refill_stock() without
disabling interrupts. This eases the integration of local_lock_t where
recursive locking must be avoided.
Open code obj_cgroup_uncharge_pages() in drain_obj_stock() and use
__refill_stock(). The caller of drain_obj_stock() already disables
interrupts.

[bigeasy: Patch body around Johannes' diff ]

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Michal Hocko <mhocko@suse.com>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 mm/memcontrol.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e566c34466a1..2cf55b3ab4b0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2264,12 +2264,9 @@ static void drain_local_stock(struct work_struct *dummy)
  * Cache charges(val) to local per_cpu area.
  * This will be consumed by consume_stock() function, later.
  */
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
-	unsigned long flags;
-
-	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached != memcg) { /* reset if necessary */
@@ -2281,7 +2278,14 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 	if (stock->nr_pages > MEMCG_CHARGE_BATCH)
 		drain_stock(stock);
+}
 
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__refill_stock(memcg, nr_pages);
 	local_irq_restore(flags);
 }
 
@@ -3180,8 +3184,18 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
 
-		if (nr_pages)
-			obj_cgroup_uncharge_pages(old, nr_pages);
+		if (nr_pages) {
+			struct mem_cgroup *memcg;
+
+			memcg = get_mem_cgroup_from_objcg(old);
+
+			if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+				page_counter_uncharge(&memcg->kmem, nr_pages);
+
+			__refill_stock(memcg, nr_pages);
+
+			css_put(&memcg->css);
+		}
 
 		/*
 		 * The leftover is flushed to the centralized per-memcg value.
-- 
2.37.1



* [PATCH v2 05/10] mm/memcg: Protect memcg_stock with a local_lock_t
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
                   ` (3 preceding siblings ...)
  2022-08-08 12:55 ` [PATCH v2 04/10] mm/memcg: Opencode the inner part of obj_cgroup_uncharge_pages() in drain_obj_stock() David Oberhollenzer
@ 2022-08-08 12:55 ` David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 06/10] mm/memcg: Disable migration instead of preemption in drain_all_stock() David Oberhollenzer
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:55 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, kernel test robot,
	David Oberhollenzer

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The members of the per-CPU structure memcg_stock_pcp are protected by
disabling interrupts. This does not work on PREEMPT_RT because it
creates an atomic context in which actions are performed that require a
preemptible context. One example is obj_cgroup_release().

The IRQ-disable sections can be replaced with local_lock_t which
preserves the explicit disabling of interrupts while keeping the code
preemptible on PREEMPT_RT.

drain_obj_stock() drops a reference on obj_cgroup which leads to an invocation
of obj_cgroup_release() if it is the last object. This in turn leads to
recursive locking of the local_lock_t. To avoid this, obj_cgroup_release() is
invoked outside of the locked section.

obj_cgroup_uncharge_pages() can be invoked with the local_lock_t acquired and
without it. This will later lead to a recursion in refill_stock(). To
avoid the locking recursion, provide obj_cgroup_uncharge_pages_locked()
which uses the locked version of refill_stock().

- Replace disabling interrupts for memcg_stock with a local_lock_t.

- Let drain_obj_stock() return the old struct obj_cgroup which is passed
  to obj_cgroup_put() outside of the locked section.

- Provide obj_cgroup_uncharge_pages_locked() which uses the locked
  version of refill_stock() to avoid recursive locking in
  drain_obj_stock().
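
The resulting pattern can be summarised by the sketch below; the names
are hypothetical, it assumes the context of mm/memcontrol.c and is not
part of the patch itself:

struct example_stock {
	local_lock_t lock;
	struct obj_cgroup *cached_objcg;
};

static DEFINE_PER_CPU(struct example_stock, example_stock) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void example_drain(void)
{
	struct obj_cgroup *old;
	unsigned long flags;

	local_lock_irqsave(&example_stock.lock, flags);
	old = this_cpu_ptr(&example_stock)->cached_objcg;
	this_cpu_ptr(&example_stock)->cached_objcg = NULL;
	local_unlock_irqrestore(&example_stock.lock, flags);

	/* may end in obj_cgroup_release(), needs preemptible context on RT */
	if (old)
		obj_cgroup_put(old);
}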

Link: https://lkml.kernel.org/r/20220209014709.GA26885@xsang-OptiPlex-9020
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 mm/memcontrol.c | 55 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2cf55b3ab4b0..20bce9682e8a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2152,6 +2152,7 @@ void unlock_page_memcg(struct page *page)
 EXPORT_SYMBOL(unlock_page_memcg);
 
 struct memcg_stock_pcp {
+	local_lock_t stock_lock;
 	struct mem_cgroup *cached; /* this never be root cgroup */
 	unsigned int nr_pages;
 
@@ -2167,17 +2168,20 @@ struct memcg_stock_pcp {
 	unsigned long flags;
 #define FLUSHING_CACHED_CHARGE	0
 };
-static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
+static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
+	.stock_lock = INIT_LOCAL_LOCK(stock_lock),
+};
 static DEFINE_MUTEX(percpu_charge_mutex);
 
 #ifdef CONFIG_MEMCG_KMEM
-static void drain_obj_stock(struct memcg_stock_pcp *stock);
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock);
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg);
 
 #else
-static inline void drain_obj_stock(struct memcg_stock_pcp *stock)
+static inline struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 {
+	return NULL;
 }
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg)
@@ -2206,7 +2210,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
@@ -2214,7 +2218,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		ret = true;
 	}
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 
 	return ret;
 }
@@ -2243,6 +2247,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 static void drain_local_stock(struct work_struct *dummy)
 {
 	struct memcg_stock_pcp *stock;
+	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 
 	/*
@@ -2250,14 +2255,16 @@ static void drain_local_stock(struct work_struct *dummy)
 	 * drain_stock races is that we always operate on local CPU stock
 	 * here with IRQ disabled
 	 */
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	drain_obj_stock(stock);
+	old = drain_obj_stock(stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	if (old)
+		obj_cgroup_put(old);
 }
 
 /*
@@ -2284,9 +2291,9 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	unsigned long flags;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 	__refill_stock(memcg, nr_pages);
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }
 
 /*
@@ -3095,10 +3102,11 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		     enum node_stat_item idx, int nr)
 {
 	struct memcg_stock_pcp *stock;
+	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 	int *bytes;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 	stock = this_cpu_ptr(&memcg_stock);
 
 	/*
@@ -3107,7 +3115,7 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	 * changes.
 	 */
 	if (stock->cached_objcg != objcg) {
-		drain_obj_stock(stock);
+		old = drain_obj_stock(stock);
 		obj_cgroup_get(objcg);
 		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
 				? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
@@ -3151,7 +3159,9 @@ void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	if (nr)
 		mod_objcg_mlstate(objcg, pgdat, idx, nr);
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	if (old)
+		obj_cgroup_put(old);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -3160,7 +3170,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	unsigned long flags;
 	bool ret = false;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
@@ -3168,17 +3178,17 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 		ret = true;
 	}
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 
 	return ret;
 }
 
-static void drain_obj_stock(struct memcg_stock_pcp *stock)
+static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = stock->cached_objcg;
 
 	if (!old)
-		return;
+		return NULL;
 
 	if (stock->nr_bytes) {
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
@@ -3230,8 +3240,8 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		stock->cached_pgdat = NULL;
 	}
 
-	obj_cgroup_put(old);
 	stock->cached_objcg = NULL;
+	return old;
 }
 
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -3252,14 +3262,15 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 			     bool allow_uncharge)
 {
 	struct memcg_stock_pcp *stock;
+	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 	unsigned int nr_pages = 0;
 
-	local_irq_save(flags);
+	local_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (stock->cached_objcg != objcg) { /* reset if necessary */
-		drain_obj_stock(stock);
+		old = drain_obj_stock(stock);
 		obj_cgroup_get(objcg);
 		stock->cached_objcg = objcg;
 		stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
@@ -3273,7 +3284,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		stock->nr_bytes &= (PAGE_SIZE - 1);
 	}
 
-	local_irq_restore(flags);
+	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	if (old)
+		obj_cgroup_put(old);
 
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
-- 
2.37.1



* [PATCH v2 06/10] mm/memcg: Disable migration instead of preemption in drain_all_stock().
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
                   ` (4 preceding siblings ...)
  2022-08-08 12:55 ` [PATCH v2 05/10] mm/memcg: Protect memcg_stock with a local_lock_t David Oberhollenzer
@ 2022-08-08 12:55 ` David Oberhollenzer
  2022-08-08 12:55 ` [PATCH v2 07/10] mm/memcg: Add missing counter index which are not update in interrupt David Oberhollenzer
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:55 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, Michal Hocko,
	David Oberhollenzer

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Before the for-each-CPU loop, preemption is disabled so that
drain_local_stock() can be invoked directly instead of scheduling a
worker. Ensuring that drain_local_stock() completed on the local CPU is
not a correctness problem. It _could_ be that the charging path will be
forced to reclaim memory because cached charges are still waiting for
their draining.

Disabling preemption before invoking drain_local_stock() is problematic
on PREEMPT_RT due to the sleeping locks involved. To ensure that no CPU
migration happens across for_each_online_cpu() it is enough to use
migrate_disable(), which disables migration but keeps the context
preemptible so that a sleeping lock can be acquired.
A race with CPU hotplug is not a problem because pcp data is not going away.
In the worst case we just schedule draining of an empty stock.

Use migrate_disable() instead of get_cpu() around the
for_each_online_cpu() loop.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 mm/memcontrol.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 20bce9682e8a..4b5888bf0a60 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2313,7 +2313,8 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 	 * as well as workers from this path always operate on the local
 	 * per-cpu data. CPU up doesn't touch memcg_stock at all.
 	 */
-	curcpu = get_cpu();
+	migrate_disable();
+	curcpu = smp_processor_id();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *memcg;
@@ -2336,7 +2337,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 				schedule_work_on(cpu, &stock->work);
 		}
 	}
-	put_cpu();
+	migrate_enable();
 	mutex_unlock(&percpu_charge_mutex);
 }
 
-- 
2.37.1



* [PATCH v2 07/10] mm/memcg: Add missing counter index which are not update in interrupt.
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
                   ` (5 preceding siblings ...)
  2022-08-08 12:55 ` [PATCH v2 06/10] mm/memcg: Disable migration instead of preemption in drain_all_stock() David Oberhollenzer
@ 2022-08-08 12:55 ` David Oberhollenzer
  2022-08-08 12:56 ` [PATCH v2 08/10] mm/memcg: Add a comment regarding the release `obj' David Oberhollenzer
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:55 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, David Oberhollenzer

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Shakeel Butt reported that I missed a few counters which are not updated
in interrupt context and for which disabling preemption is therefore fine.

Please fold into:
     "Protect per-CPU counter by disabling preemption on PREEMPT_RT"

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 mm/memcontrol.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4b5888bf0a60..50d2d5432018 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -774,10 +774,17 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	 */
 	__memcg_stats_lock();
 	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
-		if (idx == NR_ANON_MAPPED || idx == NR_FILE_MAPPED)
+		switch (idx) {
+		case NR_ANON_MAPPED:
+		case NR_FILE_MAPPED:
+		case NR_ANON_THPS:
+		case NR_SHMEM_PMDMAPPED:
+		case NR_FILE_PMDMAPPED:
 			WARN_ON_ONCE(!in_task());
-		else
+			break;
+		default:
 			WARN_ON_ONCE(!irqs_disabled());
+		}
 	}
 
 	/* Update memcg */
-- 
2.37.1



* [PATCH v2 08/10] mm/memcg: Add a comment regarding the release `obj'.
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
                   ` (6 preceding siblings ...)
  2022-08-08 12:55 ` [PATCH v2 07/10] mm/memcg: Add missing counter index which are not update in interrupt David Oberhollenzer
@ 2022-08-08 12:56 ` David Oberhollenzer
  2022-08-08 12:56 ` [PATCH v2 09/10] mm/memcg: Only perform the debug checks on !PREEMPT_RT David Oberhollenzer
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:56 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, David Oberhollenzer

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Please fold into
    mm/memcg: Protect memcg_stock with a local_lock_t

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 mm/memcontrol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50d2d5432018..70c374761bb8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3249,6 +3249,10 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 	}
 
 	stock->cached_objcg = NULL;
+	/*
+	 * The `old' object needs to be released by the caller via
+	 * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock.
+	 */
 	return old;
 }
 
-- 
2.37.1



* [PATCH v2 09/10] mm/memcg: Only perform the debug checks on !PREEMPT_RT
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
                   ` (7 preceding siblings ...)
  2022-08-08 12:56 ` [PATCH v2 08/10] mm/memcg: Add a comment regarding the release `obj' David Oberhollenzer
@ 2022-08-08 12:56 ` David Oberhollenzer
  2022-08-08 12:56 ` [PATCH v2 10/10] Allow MEMCG on PREEMPT_RT David Oberhollenzer
  2022-08-18 15:55 ` [PATCH v2 00/10] Backport MEMCG changes from v5.17 Sebastian Andrzej Siewior
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:56 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, David Oberhollenzer

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

On PREEMPT_RT interrupts and preemption are always enabled. The locking
function __memcg_stats_lock() always disables preemption there. The
recently added checks need to be performed only on !PREEMPT_RT, where
disabled preemption and disabled interrupts are used.

Please fold into:
	"Protect per-CPU counter by disabling preemption on PREEMPT_RT"

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[do: backported to v5.15]
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 70c374761bb8..74025c2e2bcc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -773,7 +773,7 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	 * interrupt context while other caller need to have disabled interrupt.
 	 */
 	__memcg_stats_lock();
-	if (IS_ENABLED(CONFIG_DEBUG_VM)) {
+	if (IS_ENABLED(CONFIG_DEBUG_VM) && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
 		switch (idx) {
 		case NR_ANON_MAPPED:
 		case NR_FILE_MAPPED:
-- 
2.37.1



* [PATCH v2 10/10] Allow MEMCG on PREEMPT_RT
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
                   ` (8 preceding siblings ...)
  2022-08-08 12:56 ` [PATCH v2 09/10] mm/memcg: Only perform the debug checks on !PREEMPT_RT David Oberhollenzer
@ 2022-08-08 12:56 ` David Oberhollenzer
  2022-08-18 15:55 ` [PATCH v2 00/10] Backport MEMCG changes from v5.17 Sebastian Andrzej Siewior
  10 siblings, 0 replies; 12+ messages in thread
From: David Oberhollenzer @ 2022-08-08 12:56 UTC (permalink / raw)
  To: linux-rt-users
  Cc: williams, bigeasy, richard, joseph.salisbury, David Oberhollenzer

With these backported, it is safe to use MEMCG again:
0001-mm-memcg-Revert-mm-memcg-optimize-user-context-objec.patch
0002-mm-memcg-Disable-threshold-event-handlers-on-PREEMPT.patch
0003-mm-memcg-Protect-per-CPU-counter-by-disabling-preemp.patch
0004-mm-memcg-Opencode-the-inner-part-of-obj_cgroup_uncha.patch
0005-mm-memcg-Protect-memcg_stock-with-a-local_lock_t.patch
0006-mm-memcg-Disable-migration-instead-of-preemption-in-.patch
0007-mm-memcg-Add-missing-counter-index-which-are-not-upd.patch
0008-mm-memcg-Add-a-comment-regarding-the-release-obj.patch
0009-mm-memcg-Only-perform-the-debug-checks-on-PREEMPT_RT.patch

Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
---
 init/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 160f836f81c7..1646c25f6ec5 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -943,7 +943,6 @@ config PAGE_COUNTER
 
 config MEMCG
 	bool "Memory controller"
-	depends on !PREEMPT_RT
 	select PAGE_COUNTER
 	select EVENTFD
 	help
-- 
2.37.1



* Re: [PATCH v2 00/10] Backport MEMCG changes from v5.17
  2022-08-08 12:55 [PATCH v2 00/10] Backport MEMCG changes from v5.17 David Oberhollenzer
                   ` (9 preceding siblings ...)
  2022-08-08 12:56 ` [PATCH v2 10/10] Allow MEMCG on PREEMPT_RT David Oberhollenzer
@ 2022-08-18 15:55 ` Sebastian Andrzej Siewior
  10 siblings, 0 replies; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-08-18 15:55 UTC (permalink / raw)
  To: David Oberhollenzer; +Cc: linux-rt-users, williams, richard, joseph.salisbury

On 2022-08-08 14:55:52 [+0200], David Oberhollenzer wrote:
> This is a backport of Sebastian's MEMCG changes to v5.15. With these
> patches applied, it is possible to use memory cgroups together
> with PREEMPT_RT on v5.15, just like on v5.17.
> 
> The patch set was tested on my end by building an x86_64 kernel
> with PREEMPT_RT, cgroup debug and lockdep enabled. The kernel was
> run in Qemu, where a small number of child cgroups were created,
> with memory controller enabled for each and an instance of a small
> test program running.

This looks good, thank you.

Sebastian


