linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com,
	linux-mm@kvack.org, iommu@lists.linux-foundation.org,
	Petr Mladek <pmladek@suse.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections
Date: Thu, 22 Nov 2018 14:57:02 -0500	[thread overview]
Message-ID: <c8c29a58-f356-d379-2bf4-cea09b03dc3e@redhat.com> (raw)
In-Reply-To: <20181122020422.GA3441@jagdpanzerIV>

On 11/21/2018 09:04 PM, Sergey Senozhatsky wrote:
> On (11/21/18 11:49), Waiman Long wrote:
> [..]
>>>  	case ODEBUG_STATE_ACTIVE:
>>> -		debug_print_object(obj, "init");
>>>  		state = obj->state;
>>>  		raw_spin_unlock_irqrestore(&db->lock, flags);
>>> +		debug_print_object(obj, "init");
>>>  		debug_object_fixup(descr->fixup_init, addr, state);
>>>  		return;
>>>  
>>>  	case ODEBUG_STATE_DESTROYED:
>>> -		debug_print_object(obj, "init");
>>> +		debug_printobj = true;
>>>  		break;
>>>  	default:
>>>  		break;
>>>  	}
>>>  
>>>  	raw_spin_unlock_irqrestore(&db->lock, flags);
>>> +	if (debug_chkstack)
>>> +		debug_object_is_on_stack(addr, onstack);
>>> +	if (debug_printobj)
>>> +		debug_print_object(obj, "init");
>>>
> [..]
>> As a side note, one of the test systems that I used generated a
>> debugobjects splat in the bootup process and the system hanged
>> afterward. Applying this patch alone fix the hanging problem and the
>> system booted up successfully. So it is not really a good idea to call
>> printk() while holding a raw spinlock.
> Right, I like this patch.
> And I think that we, maybe, can go even further.
>
> Some serial consoles call mod_timer(). So what we could have with the
> debug objects enabled was
>
> 	mod_timer()
> 	 lock_timer_base()
> 	  debug_activate()
> 	   printk()
> 	    call_console_drivers()
> 	     foo_console()
> 	      mod_timer()
> 	       lock_timer_base()       << deadlock
>
> That's one possible scenario. The other one can involve console's
> IRQ handler, uart port spinlock, mod_timer, debug objects, printk,
> and an eventual deadlock on the uart port spinlock. This one can
> be mitigated with printk_safe. But mod_timer() deadlock will require
> a different fix.
>
> So maybe we need to switch debug objects print-outs to _always_
> printk_deferred(). Debug objects can be used in code which cannot
> do direct printk() - timekeeping is just one example.
>
> 	-ss

Actually, I don't think that was the cause of the hang. The debugobjects
splat was caused by debug_object_is_on_stack(), below was the output:

[    6.890048] ODEBUG: object (____ptrval____) is NOT on stack
(____ptrval____), but annotated.
[    6.891000] WARNING: CPU: 28 PID: 1 at lib/debugobjects.c:369
__debug_object_init.cold.11+0x51/0x2d6
[    6.891000] Modules linked in:
[    6.891000] CPU: 28 PID: 1 Comm: swapper/0 Not tainted
4.18.0-41.el8.bz1651764_cgroup_debug.x86_64+debug #1
[    6.891000] Hardware name: HPE ProLiant DL120 Gen10/ProLiant DL120
Gen10, BIOS U36 11/14/2017
[    6.891000] RIP: 0010:__debug_object_init.cold.11+0x51/0x2d6
[    6.891000] Code: ea 03 80 3c 02 00 0f 85 85 02 00 00 49 8b 54 24 18
48 89 de 4c 89 44 24 10 48 c7 c7 00 ce 22 94 e8 73 18 62 ff 4c 8b 44 24
10 <0f> 0b e9 60 db ff ff 41 83 c4 01 b8 ff ff 37 00 44 89 25 ce 46 f9
[    6.891000] RSP: 0000:ffff880104187960 EFLAGS: 00010086
[    6.891000] RAX: 0000000000000050 RBX: ffffffff9764c570 RCX:
0000000000000000
[    6.891000] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
ffff880104178ca8
[    6.891000] RBP: 1ffff10020830f34 R08: ffff8807ce68a1d0 R09:
fffffbfff2923554
[    6.891000] R10: fffffbfff2923554 R11: ffffffff9491aaa3 R12:
ffff880104178000
[    6.891000] R13: ffffffff96c809b8 R14: 000000000000a370 R15:
ffff8807ce68a1c0
[    6.891000] FS:  0000000000000000(0000) GS:ffff8807d4200000(0000)
knlGS:0000000000000000
[    6.891000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.891000] CR2: 0000000000000000 CR3: 000000028de16001 CR4:
00000000007606e0
[    6.891000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    6.891000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    6.891000] PKRU: 00000000
[    6.891000] Call Trace:
[    6.891000]  ? debug_object_fixup+0x30/0x30
[    6.891000]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
[    6.891000]  ? __lockdep_init_map+0x12f/0x510
[    6.891000]  ? __lockdep_init_map+0x12f/0x510
[    6.891000]  virt_efi_get_next_variable+0xa2/0x160
[    6.891000]  efivar_init+0x1c4/0x6d7
[    6.891000]  ? efivar_ssdt_setup+0x3b/0x3b
[    6.891000]  ? efivar_entry_iter+0x120/0x120
[    6.891000]  ? find_held_lock+0x3a/0x1c0
[    6.891000]  ? lock_downgrade+0x5e0/0x5e0
[    6.891000]  ? kmsg_dump_rewind_nolock+0xd9/0xd9
[    6.891000]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
[    6.891000]  ? trace_hardirqs_on_caller+0x381/0x570
[    6.891000]  ? efivar_ssdt_iter+0x1f4/0x1f4
[    6.891000]  efisubsys_init+0x1be/0x4ae
[    6.891000]  ? kernfs_get.part.8+0x4c/0x60
[    6.891000]  ? efivar_ssdt_iter+0x1f4/0x1f4
[    6.891000]  ? __kernfs_create_file+0x235/0x2e0
[    6.891000]  ? efivar_ssdt_iter+0x1f4/0x1f4
[    6.891000]  do_one_initcall+0xe9/0x5fd
[    6.891000]  ? perf_trace_initcall_level+0x450/0x450
[    6.891000]  ? __wake_up_common+0x5a0/0x5a0
[    6.891000]  ? lock_downgrade+0x5e0/0x5e0
[    6.891000]  kernel_init_freeable+0x51a/0x5f2
[    6.891000]  ? start_kernel+0x7b8/0x7b8
[    6.891000]  ? finish_task_switch+0x19a/0x690
[    6.891000]  ? __switch_to_asm+0x40/0x70
[    6.891000]  ? __switch_to_asm+0x34/0x70
[    6.891000]  ? rest_init+0xe9/0xe9
[    6.891000]  kernel_init+0xc/0x110
[    6.891000]  ? rest_init+0xe9/0xe9
[    6.891000]  ret_from_fork+0x24/0x50
[    6.891000] irq event stamp: 1081352
[    6.891000] hardirqs last  enabled at (1081351): [<ffffffff93af7dab>]
_raw_spin_unlock_irqrestore+0x4b/0x60
[    6.891000] hardirqs last disabled at (1081352): [<ffffffff93af85c2>]
_raw_spin_lock_irqsave+0x22/0x81
[    6.891000] softirqs last  enabled at (1081334): [<ffffffff93e006f9>]
__do_softirq+0x6f9/0xaa0
[    6.891000] softirqs last disabled at (1081325): [<ffffffff921b993f>]
irq_exit+0x27f/0x2d0
[    6.891000] ---[ end trace 15e1083fc009a526 ]---

All the messages above were printed while holding a raw spinlock with
IRQ disabled. Further down the bootup sequence, the system appeared to hang:

   11.270654] systemd[1]: systemd 239 running in system mode. (+PAM
+AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP
+GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN
+PCRE2 default-hierarchy=legacy)
[   11.311307] systemd[1]: Detected architecture x86-64.
[   11.316420] systemd[1]: Running in initial RAM disk.

Welcome to

The system is not responsive at this point.

I am not totally sure what caused this. Maybe it was caused by disabling
IRQ for too long leading to some kind of corruption. Anyway, moving
debug_object_is_on_stack() outside of the IRQ disabled lock critical
section seemed to fix the hang problem.

Cheers,
Longman

  parent reply	other threads:[~2018-11-22 19:57 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-19 18:55 [PATCH v2 00/17] locking/lockdep: Add a new class of terminal locks Waiman Long
2018-11-19 18:55 ` [PATCH v2 01/17] locking/lockdep: Remove version from lock_class structure Waiman Long
2018-11-19 18:55 ` [PATCH v2 02/17] locking/lockdep: Rework lockdep_set_novalidate_class() Waiman Long
2018-11-19 18:55 ` [PATCH v2 03/17] locking/lockdep: Add a new terminal lock type Waiman Long
2018-12-07  9:19   ` Peter Zijlstra
2018-11-19 18:55 ` [PATCH v2 04/17] locking/lockdep: Add DEFINE_TERMINAL_SPINLOCK() and related macros Waiman Long
2018-11-19 18:55 ` [PATCH v2 05/17] printk: Mark logbuf_lock & console_owner_lock as terminal locks Waiman Long
2018-11-19 18:55 ` [PATCH v2 06/17] debugobjects: Mark pool_lock as a terminal lock Waiman Long
2018-11-19 18:55 ` [PATCH v2 07/17] debugobjects: Move printk out of db lock critical sections Waiman Long
2018-11-21 16:49   ` Waiman Long
2018-11-22  2:04     ` Sergey Senozhatsky
2018-11-22 10:16       ` Peter Zijlstra
2018-11-23  2:40         ` Sergey Senozhatsky
2018-11-23 11:48           ` Petr Mladek
2018-11-26  4:57             ` Sergey Senozhatsky
2018-11-29 13:09               ` Petr Mladek
2018-11-22 16:02       ` Petr Mladek
2018-11-22 20:29         ` Waiman Long
2018-11-23 11:36           ` Petr Mladek
2018-11-22 19:57       ` Waiman Long [this message]
2018-11-23  2:35         ` Sergey Senozhatsky
2018-11-23 11:20         ` Petr Mladek
2018-12-07  9:21   ` Peter Zijlstra
2018-11-19 18:55 ` [PATCH v2 08/17] locking/lockdep: Add support for nestable terminal locks Waiman Long
2018-12-07  9:22   ` Peter Zijlstra
2018-12-07  9:52     ` Peter Zijlstra
2018-11-19 18:55 ` [PATCH v2 09/17] debugobjects: Make object hash locks " Waiman Long
2018-11-22 15:33   ` Petr Mladek
2018-11-22 20:17     ` Waiman Long
2018-11-23  9:29       ` Petr Mladek
2018-12-07  9:47   ` Peter Zijlstra
2018-11-19 18:55 ` [PATCH v2 10/17] lib/stackdepot: Make depot_lock a terminal spinlock Waiman Long
2018-11-19 18:55 ` [PATCH v2 11/17] locking/rwsem: Mark rwsem.wait_lock as a terminal lock Waiman Long
2018-11-19 18:55 ` [PATCH v2 12/17] cgroup: Mark the rstat percpu lock as terminal Waiman Long
2018-11-19 18:55 ` [PATCH v2 13/17] mm/kasan: Make quarantine_lock a terminal lock Waiman Long
2018-11-19 18:55 ` [PATCH v2 14/17] dma-debug: Mark free_entries_lock as terminal Waiman Long
2018-11-19 18:55 ` [PATCH v2 15/17] kernfs: Mark kernfs_open_node_lock as terminal lock Waiman Long
2018-11-19 18:55 ` [PATCH v2 16/17] delay_acct: Mark task's delays->lock as terminal spinlock Waiman Long
2018-11-19 18:55 ` [PATCH v2 17/17] locking/lockdep: Check raw/non-raw locking conflicts Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c8c29a58-f356-d379-2bf4-cea09b03dc3e@redhat.com \
    --to=longman@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aryabinin@virtuozzo.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).