linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] debugobjects: Allow object pool refill mostly in non-atomic context
@ 2025-06-17  5:35 Waiman Long
  2025-06-17  5:35 ` [PATCH v3 1/2] debugobjects: Show the state of debug_objects_enabled Waiman Long
  2025-06-17  5:35 ` [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context Waiman Long
  0 siblings, 2 replies; 5+ messages in thread
From: Waiman Long @ 2025-06-17  5:35 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton; +Cc: linux-kernel, Waiman Long

 v3:
  - Use deferred printk for printing the debug object disable message.
  - Drop the last timer specific memory allocation disablement patch &
    limit memory allocation mostly to non-atomic context instead.

A circular locking dependency was reported by lockdep involving printk()
called from within the memory allocator, console driver with timeout
capability and timer code calling debug_object_activate() doing object
pool refill by allocating memory.

The original approach of disabling memory allocation for timer debug
objects didn't work. Now this series try to limit memory allocation
mostly to non-atomic contexts instead. This will ensure that new object
allocation won't happen in contexts where a spinlock (or a raw spinlock)
has been held or in a non-task context except for the init call, though
a sleeping lock may still be held. This should reduce the chance of
deadlock due to this kind of circular lock dependency.

Waiman Long (2):
  debugobjects: Show the state of debug_objects_enabled
  debugobjects: Allow object pool refill mostly in non-atomic context

 lib/Kconfig.debug  |  1 +
 lib/debugobjects.c | 34 +++++++++++++++++++++++-----------
 2 files changed, 24 insertions(+), 11 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v3 1/2] debugobjects: Show the state of debug_objects_enabled
  2025-06-17  5:35 [PATCH v3 0/2] debugobjects: Allow object pool refill mostly in non-atomic context Waiman Long
@ 2025-06-17  5:35 ` Waiman Long
  2025-06-17  5:35 ` [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context Waiman Long
  1 sibling, 0 replies; 5+ messages in thread
From: Waiman Long @ 2025-06-17  5:35 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton; +Cc: linux-kernel, Waiman Long

With a PREEMPT_RT kernel, there is a fair chance that debug_objects
could get disabled because we are running out of free debug objects as
debug object allocation is disabled in non-preemptible context. With
!PREEMPT_RT kernels, the chance of this should be minimal. When we
consider imposing restrictions on where debug object allocation can be
done, the chance of running out of them increases.

Currently, it is not easy to figure if debug objects tracking is
disabled.  Fix that by showing the state of "debug_objects_enabled"
in the stats debugfs file as well as always printing a message in the
console log.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 lib/debugobjects.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 7f50c4480a4e..5598105ecf0d 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -125,6 +125,12 @@ static int __init disable_object_debug(char *str)
 }
 early_param("no_debug_objects", disable_object_debug);
 
+static void debug_objects_disable(const char *msg)
+{
+	debug_objects_enabled = false;
+	printk_deferred(KERN_WARNING "debug_objects disabled: %s\n", msg);
+}
+
 static const char *obj_states[ODEBUG_STATE_MAX] = {
 	[ODEBUG_STATE_NONE]		= "none",
 	[ODEBUG_STATE_INIT]		= "initialized",
@@ -690,7 +696,7 @@ static struct debug_obj *lookup_object_or_alloc(void *addr, struct debug_bucket
 	}
 
 	/* Out of memory. Do the cleanup outside of the locked region */
-	debug_objects_enabled = false;
+	debug_objects_disable("out of memory");
 	return NULL;
 }
 
@@ -1161,6 +1167,8 @@ static int debug_stats_show(struct seq_file *m, void *v)
 	seq_printf(m, "on_free_list  : %u\n", pool_count(&pool_to_free));
 	seq_printf(m, "objs_allocated: %d\n", debug_objects_allocated);
 	seq_printf(m, "objs_freed    : %d\n", debug_objects_freed);
+	seq_printf(m, "debug_objects : %s\n", debug_objects_enabled ? "enabled"
+								    : "disabled");
 	return 0;
 }
 DEFINE_SHOW_ATTRIBUTE(debug_stats);
@@ -1314,7 +1322,7 @@ check_results(void *addr, enum debug_obj_state state, int fixups, int warnings)
 out:
 	raw_spin_unlock_irqrestore(&db->lock, flags);
 	if (res)
-		debug_objects_enabled = false;
+		debug_objects_disable("selftest");
 	return res;
 }
 
@@ -1486,11 +1494,8 @@ void __init debug_objects_mem_init(void)
 	cache = kmem_cache_create("debug_objects_cache", sizeof (struct debug_obj), 0,
 				  SLAB_DEBUG_OBJECTS | SLAB_NOLEAKTRACE, NULL);
 
-	if (!cache || !debug_objects_replace_static_objects(cache)) {
-		debug_objects_enabled = false;
-		pr_warn("Out of memory.\n");
-		return;
-	}
+	if (!cache || !debug_objects_replace_static_objects(cache))
+		debug_objects_disable("out of memory");
 
 	/*
 	 * Adjust the thresholds for allocating and freeing objects
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context
  2025-06-17  5:35 [PATCH v3 0/2] debugobjects: Allow object pool refill mostly in non-atomic context Waiman Long
  2025-06-17  5:35 ` [PATCH v3 1/2] debugobjects: Show the state of debug_objects_enabled Waiman Long
@ 2025-06-17  5:35 ` Waiman Long
  2025-06-24  8:28   ` kernel test robot
  2025-06-26 15:22   ` Thomas Gleixner
  1 sibling, 2 replies; 5+ messages in thread
From: Waiman Long @ 2025-06-17  5:35 UTC (permalink / raw)
  To: Thomas Gleixner, Andrew Morton; +Cc: linux-kernel, Waiman Long

With PREEMPT_RT kernel, object pool refill can only happen in preemptible
context. For other !PREEMPT_RT kernels, pool refill can happen in any
context. This can sometimes lead to problem like the following circular
locking dependency shown below.

  -> #3 (&zone->lock){-.-.}-{2:2}:
  -> #2 (&base->lock){-.-.}-{2:2}:
  -> #1 (&console_sch_key){-.-.}-{2:2}:
  -> #0 (console_owner){..-.}-{0:0}:

The "console_owner" is from calling printk() from the following call
chain:

  rmqueue_bulk() => expand() => __warn_printk() => _printk()

This is due to the invocation of the VM_WARN_ONCE() macro in
__add_to_free_list().

The "base->lock" is from lock_timer_base() and "zone->lock" is due to
calling add_timer_on() leading to debug_object_activate() doing actual
memory allocation in pool refill acquiring the zone lock.

The "console_sch_key" comes from a s390 console driver in
driver/s390/cio.  The console_sch_key -> timer dependency happens
because the console driver is setting a timeout value while holding
its lock. Apparently it is pretty common for a console driver to use
timer for timeout or other timing purposes. So this may happen to other
console drivers as well.

There are three debug objects functions that will invoke
debug_objects_fill_pool() for pool refill - __debug_object_init(),
debug_object_activate() & debug_object_assert_init().  Thomas suggested
that we may enforce the pool refill only in the init function and
remove the debug_objects_fill_pool() call from the other two to avoid
the kind of circular locking problem shown above. It is because the init
function can be called in a cluster with many debug objects initialized
consecutively which can lead to exhaustion of the global object pool
if we disable the init function from doing pool refill. See [1] for
such an example. The call patterns of the other two are typically more
spread out.  Of the three, the activate function is called at least an
order of magnitude more than the other two.

Removing the pool refill call from the other two may make pool
exhaustion happen more easily leading to the disabling of the debug
object tracking. As a middle ground, we will allow pool refill from the
activate and assert_init functions if they are called from a non-atomic
context which is roughly half of the times depending on the workloads.

As in_atomic() may not know preemption has been disabled, when
a spinlock has been acquired for example, if CONFIG_PREEMPT_COUNT
hasn't been set. So make DEBUG_OBJECTS select PREEMPT_COUNT to make
sure that the preemption state is properly captured. The overhead of
adding PREEMPT_COUNT should be insignificant compared with the overhead
imposed by enabling the debug object tracking code itself.

[1] https://lore.kernel.org/lkml/202506121115.b69b8c2-lkp@intel.com/

Signed-off-by: Waiman Long <longman@redhat.com>
---
 lib/Kconfig.debug  |  1 +
 lib/debugobjects.c | 15 +++++++++++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ebe33181b6e6..854a2f12a64b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -723,6 +723,7 @@ source "mm/Kconfig.debug"
 config DEBUG_OBJECTS
 	bool "Debug object operations"
 	depends on DEBUG_KERNEL
+	select PREEMPT_COUNT
 	help
 	  If you say Y here, additional code will be inserted into the
 	  kernel to track the life time of various objects and validate
diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 5598105ecf0d..d85f87f359d2 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -700,11 +700,18 @@ static struct debug_obj *lookup_object_or_alloc(void *addr, struct debug_bucket
 	return NULL;
 }
 
-static void debug_objects_fill_pool(void)
+static void debug_objects_fill_pool(bool init)
 {
 	if (!static_branch_likely(&obj_cache_enabled))
 		return;
 
+	/*
+	 * Attempt to fill the pool only if called from debug_objects_init()
+	 * or not in atomic context.
+	 */
+	if (!init && in_atomic())
+		return;
+
 	if (likely(!pool_should_refill(&pool_global)))
 		return;
 
@@ -740,7 +747,7 @@ __debug_object_init(void *addr, const struct debug_obj_descr *descr, int onstack
 	struct debug_bucket *db;
 	unsigned long flags;
 
-	debug_objects_fill_pool();
+	debug_objects_fill_pool(true);
 
 	db = get_bucket((unsigned long) addr);
 
@@ -817,7 +824,7 @@ int debug_object_activate(void *addr, const struct debug_obj_descr *descr)
 	if (!debug_objects_enabled)
 		return 0;
 
-	debug_objects_fill_pool();
+	debug_objects_fill_pool(false);
 
 	db = get_bucket((unsigned long) addr);
 
@@ -1006,7 +1013,7 @@ void debug_object_assert_init(void *addr, const struct debug_obj_descr *descr)
 	if (!debug_objects_enabled)
 		return;
 
-	debug_objects_fill_pool();
+	debug_objects_fill_pool(false);
 
 	db = get_bucket((unsigned long) addr);
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context
  2025-06-17  5:35 ` [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context Waiman Long
@ 2025-06-24  8:28   ` kernel test robot
  2025-06-26 15:22   ` Thomas Gleixner
  1 sibling, 0 replies; 5+ messages in thread
From: kernel test robot @ 2025-06-24  8:28 UTC (permalink / raw)
  To: Waiman Long
  Cc: oe-lkp, lkp, linux-kernel, Thomas Gleixner, Andrew Morton,
	Waiman Long, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 3410 bytes --]



Hello,

we don't have enough knowledge to understand why this patch causes seemingly
auto-reboot in this trinity tests. but the issue is quite persistent.

94d15b49922495cd 81c49454d188c6eb34bb736f432
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :50         100%          50:50    last_state.is_incomplete_run
           :50          98%          50:50    last_state.running

we don't see hint from dmesg uploaded to [1], so we also attached one dmesg
from parent FYI in case we miss anything.


kernel test robot noticed "last_state.running" on:

commit: 81c49454d188c6eb34bb736f432e4b1ef623d70f ("[PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context")
url: https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/debugobjects-Show-the-state-of-debug_objects_enabled/20250617-133635
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-nonmm-unstable
patch link: https://lore.kernel.org/all/20250617053527.1223411-3-longman@redhat.com/
patch subject: [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context

in testcase: trinity
version: 
with following parameters:

	runtime: 300s
	group: group-01
	nr_groups: 5



config: i386-randconfig-004-20250618
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506201449.f3047c4-lkp@intel.com


LKP: ttyS0: 246: Kernel tests: Boot OK!
LKP: ttyS0: 246: HOSTNAME vm-snb, MAC 52:54:00:12:34:56, kernel 6.16.0-rc2-00044-g81c49454d188 1

Poky (Yocto Project Reference Distro) 2.7+snapshot vm-snb /dev/ttyS0

vm-snb login: LKP: ttyS0: 246:  /lkp/lkp/src/bin/run-lkp /lkp/jobs/scheduled/vm-meta-243/trinity-group-01-5-300s-yocto-i386-minimal-20190520.cgz-i386-randconfig-004-20250618-81c49454d188-20250619-2383740-f5ihl3-14.yaml
[   13.087284][  T290] mkdir: can't create directory '/sys/kernel/debug': Operation not permitted
[   13.087284][  T290] mount: mounting debug on /sys/kernel/debug failed: No such file or directory
[   34.877921][ T3450] UDPLite: UDP-Lite is deprecated and scheduled to be removed in 2025, please contact the netdev mailing list
[   34.879165][ T3450] trinity-main uses obsolete (PF_INET,SOCK_PACKET)
[   34.931165][ T3450] UDPLite6: UDP-Lite is deprecated and scheduled to be removed in 2025, please contact the netdev mailing list
[   35.302518][ T3582] VFS: Warning: trinity-c0 using old stat() call. Recompile your binary.
[   35.303816][ T3582] VFS: Warning: trinity-c0 using old stat() call. Recompile your binary.
[   35.307560][ T3585] VFS: Warning: trinity-c3 using old stat() call. Recompile your binary.
[   35.308370][ T3585] VFS: Warning: trinity-c3 using old stat() call. Recompile your binary.
[   35.314526][ T3582] VFS: Warning: trinity-c0 using old stat() call. Recompile your binary.



[1]
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250620/202506201449.f3047c4-lkp@intel.com


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


[-- Attachment #2: dmesg-94d15b4992.xz --]
[-- Type: application/x-xz, Size: 12976 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context
  2025-06-17  5:35 ` [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context Waiman Long
  2025-06-24  8:28   ` kernel test robot
@ 2025-06-26 15:22   ` Thomas Gleixner
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Gleixner @ 2025-06-26 15:22 UTC (permalink / raw)
  To: Waiman Long, Andrew Morton; +Cc: linux-kernel, Waiman Long

On Tue, Jun 17 2025 at 01:35, Waiman Long wrote:
> Removing the pool refill call from the other two may make pool
> exhaustion happen more easily leading to the disabling of the debug
> object tracking.

May is not a technical argument. Why would that happen?

As we established before the only reason why activate() or assert() can
consume an object from the pool is when the object was statically
initialized.

Are there really enough statically initialized objects to actually lead
to pool depletion without that activate/assert hackery?

> As a middle ground, we will allow pool refill from the
> activate and assert_init functions if they are called from a non-atomic
> context which is roughly half of the times depending on the workloads.

Which brings you back to the square one on RT enabled kernels because
the held locks in this circular lock chain scenario do _NOT_ disable
preemption, so in_atomic() evaluates to false ....

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-06-26 15:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-17  5:35 [PATCH v3 0/2] debugobjects: Allow object pool refill mostly in non-atomic context Waiman Long
2025-06-17  5:35 ` [PATCH v3 1/2] debugobjects: Show the state of debug_objects_enabled Waiman Long
2025-06-17  5:35 ` [PATCH v3 2/2] debugobjects: Allow object pool refill mostly in non-atomic context Waiman Long
2025-06-24  8:28   ` kernel test robot
2025-06-26 15:22   ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).