Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic

public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
       [not found] ` <20220822114649.055452969@infradead.org>
@ 2022-10-21 17:22   ` Ville Syrjälä
  2022-10-25  4:52     ` Ville Syrjälä
  0 siblings, 1 reply; 15+ messages in thread
From: Ville Syrjälä @ 2022-10-21 17:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Mon, Aug 22, 2022 at 01:18:22PM +0200, Peter Zijlstra wrote:
> +#ifdef CONFIG_LOCKDEP
> +	/*
> +	 * It's dangerous to freeze with locks held; there be dragons there.
> +	 */
> +	if (!(state & __TASK_FREEZABLE_UNSAFE))
> +		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> +#endif

We now seem to be hitting this sporadically in the intel gfx CI.

I've spotted it on two machines so far:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12270/shard-tglb7/igt@gem_ctx_isolation@preservation-s3@vcs0.html
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109950v1/shard-snb5/igt@kms_flip@flip-vs-suspend-interruptible@a-vga1.html

Here's the full splat. Looks a bit funny since the
WARN()->printk()->console_lock() itself trips lockdep:

<6>[   59.998117] PM: suspend entry (s2idle)
<6>[   59.999878] Filesystems sync: 0.001 seconds
<6>[   60.000881] Freezing user space processes ... 
<4>[   60.001059] ------------[ cut here ]------------
<4>[   60.001071] ======================================================
<4>[   60.001071] WARNING: possible circular locking dependency detected
<4>[   60.001072] 6.1.0-rc1-CI_DRM_12270-ga9d18ead9885+ #1 Not tainted
<4>[   60.001073] ------------------------------------------------------
<4>[   60.001073] rtcwake/1152 is trying to acquire lock:
<4>[   60.001074] ffffffff82735198 ((console_sem).lock){..-.}-{2:2}, at: down_trylock+0xa/0x30
<4>[   60.001082] 
                  but task is already holding lock:
<4>[   60.001082] ffff888111a708e0 (&p->pi_lock){-.-.}-{2:2}, at: task_call_func+0x34/0xe0
<4>[   60.001088] 
                  which lock already depends on the new lock.

<4>[   60.001089] 
                  the existing dependency chain (in reverse order) is:
<4>[   60.001089] 
                  -> #1 (&p->pi_lock){-.-.}-{2:2}:
<4>[   60.001091]        lock_acquire+0xd3/0x310
<4>[   60.001094]        _raw_spin_lock_irqsave+0x33/0x50
<4>[   60.001097]        try_to_wake_up+0x6b/0x610
<4>[   60.001098]        up+0x3b/0x50
<4>[   60.001099]        __up_console_sem+0x5c/0x70
<4>[   60.001102]        console_unlock+0x1bc/0x1d0
<4>[   60.001104]        do_con_write+0x654/0xa20
<4>[   60.001108]        con_write+0xa/0x20
<4>[   60.001110]        do_output_char+0x119/0x1e0
<4>[   60.001113]        n_tty_write+0x20f/0x490
<4>[   60.001114]        file_tty_write.isra.29+0x17d/0x320
<4>[   60.001117]        do_iter_readv_writev+0xdb/0x140
<4>[   60.001120]        do_iter_write+0x6c/0x1a0
<4>[   60.001121]        vfs_writev+0x97/0x290
<4>[   60.001123]        do_writev+0x63/0x110
<4>[   60.001125]        do_syscall_64+0x37/0x90
<4>[   60.001128]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4>[   60.001130] 
                  -> #0 ((console_sem).lock){..-.}-{2:2}:
<4>[   60.001131]        validate_chain+0xb3d/0x2000
<4>[   60.001132]        __lock_acquire+0x5a4/0xb70
<4>[   60.001133]        lock_acquire+0xd3/0x310
<4>[   60.001134]        _raw_spin_lock_irqsave+0x33/0x50
<4>[   60.001136]        down_trylock+0xa/0x30
<4>[   60.001137]        __down_trylock_console_sem+0x25/0xb0
<4>[   60.001139]        console_trylock+0xe/0x70
<4>[   60.001140]        vprintk_emit+0x13c/0x380
<4>[   60.001142]        _printk+0x53/0x6e
<4>[   60.001145]        report_bug.cold.2+0x10/0x52
<4>[   60.001147]        handle_bug+0x3f/0x70
<4>[   60.001148]        exc_invalid_op+0x13/0x60
<4>[   60.001150]        asm_exc_invalid_op+0x16/0x20
<4>[   60.001152]        __set_task_frozen+0x58/0x80
<4>[   60.001156]        task_call_func+0xc2/0xe0
<4>[   60.001157]        freeze_task+0x84/0xe0
<4>[   60.001159]        try_to_freeze_tasks+0xac/0x260
<4>[   60.001160]        freeze_processes+0x56/0xb0
<4>[   60.001162]        pm_suspend.cold.7+0x1d9/0x31c
<4>[   60.001164]        state_store+0x7b/0xe0
<4>[   60.001165]        kernfs_fop_write_iter+0x121/0x1c0
<4>[   60.001169]        vfs_write+0x34c/0x4e0
<4>[   60.001170]        ksys_write+0x57/0xd0
<4>[   60.001172]        do_syscall_64+0x37/0x90
<4>[   60.001174]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4>[   60.001176] 
                  other info that might help us debug this:

<4>[   60.001176]  Possible unsafe locking scenario:

<4>[   60.001176]        CPU0                    CPU1
<4>[   60.001176]        ----                    ----
<4>[   60.001177]   lock(&p->pi_lock);
<4>[   60.001177]                                lock((console_sem).lock);
<4>[   60.001178]                                lock(&p->pi_lock);
<4>[   60.001179]   lock((console_sem).lock);
<4>[   60.001179] 
                   *** DEADLOCK ***

<4>[   60.001180] 7 locks held by rtcwake/1152:
<4>[   60.001180]  #0: ffff888105e99430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x57/0xd0
<4>[   60.001184]  #1: ffff88810a048288 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xee/0x1c0
<4>[   60.001191]  #2: ffff888100c58538 (kn->active#155){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf7/0x1c0
<4>[   60.001194]  #3: ffffffff8264db08 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.7+0xfa/0x31c
<4>[   60.001197]  #4: ffffffff82606098 (tasklist_lock){.+.+}-{2:2}, at: try_to_freeze_tasks+0x63/0x260
<4>[   60.001201]  #5: ffffffff8273aed8 (freezer_lock){....}-{2:2}, at: freeze_task+0x27/0xe0
<4>[   60.001204]  #6: ffff888111a708e0 (&p->pi_lock){-.-.}-{2:2}, at: task_call_func+0x34/0xe0
<4>[   60.001207] 
                  stack backtrace:
<4>[   60.001207] CPU: 2 PID: 1152 Comm: rtcwake Not tainted 6.1.0-rc1-CI_DRM_12270-ga9d18ead9885+ #1
<4>[   60.001210] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.3197.A00.2005110542 05/11/2020
<4>[   60.001211] Call Trace:
<4>[   60.001211]  <TASK>
<4>[   60.001212]  dump_stack_lvl+0x56/0x7f
<4>[   60.001215]  check_noncircular+0x132/0x150
<4>[   60.001217]  validate_chain+0xb3d/0x2000
<4>[   60.001220]  __lock_acquire+0x5a4/0xb70
<4>[   60.001222]  lock_acquire+0xd3/0x310
<4>[   60.001223]  ? down_trylock+0xa/0x30
<4>[   60.001226]  ? vprintk_emit+0x13c/0x380
<4>[   60.001228]  _raw_spin_lock_irqsave+0x33/0x50
<4>[   60.001230]  ? down_trylock+0xa/0x30
<4>[   60.001231]  down_trylock+0xa/0x30
<4>[   60.001233]  __down_trylock_console_sem+0x25/0xb0
<4>[   60.001234]  console_trylock+0xe/0x70
<4>[   60.001235]  vprintk_emit+0x13c/0x380
<4>[   60.001237]  _printk+0x53/0x6e
<4>[   60.001240]  ? __set_task_frozen+0x58/0x80
<4>[   60.001241]  report_bug.cold.2+0x10/0x52
<4>[   60.001244]  handle_bug+0x3f/0x70
<4>[   60.001245]  exc_invalid_op+0x13/0x60
<4>[   60.001247]  asm_exc_invalid_op+0x16/0x20
<4>[   60.001250] RIP: 0010:__set_task_frozen+0x58/0x80
<4>[   60.001252] Code: f7 c5 00 20 00 00 74 06 40 f6 c5 03 74 3a 81 e5 00 40 00 00 75 16 8b 15 a2 b9 71 01 85 d2 74 0c 8b 83 60 09 00 00 85 c0 74 02 <0f> 0b c7 43 18 00 80 00 00 b8 00 80 00 00 5b 5d c3 cc cc cc cc 31
<4>[   60.001254] RSP: 0018:ffffc9000335fcf0 EFLAGS: 00010002
<4>[   60.001255] RAX: 0000000000000001 RBX: ffff888111a70040 RCX: 0000000000000000
<4>[   60.001256] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888111a70040
<4>[   60.001257] RBP: 0000000000000000 R08: 0000000000000001 R09: 00000000fffffffe
<4>[   60.001258] R10: 0000000001e6f6b9 R11: 00000000934a4c67 R12: 0000000000000246
<4>[   60.001259] R13: ffffffff811653e0 R14: 0000000000000000 R15: ffff888111a70040
<4>[   60.001260]  ? __set_task_special+0x40/0x40
<4>[   60.001263]  task_call_func+0xc2/0xe0
<4>[   60.001265]  freeze_task+0x84/0xe0
<4>[   60.001267]  try_to_freeze_tasks+0xac/0x260
<4>[   60.001270]  freeze_processes+0x56/0xb0
<4>[   60.001272]  pm_suspend.cold.7+0x1d9/0x31c
<4>[   60.001274]  state_store+0x7b/0xe0
<4>[   60.001276]  kernfs_fop_write_iter+0x121/0x1c0
<4>[   60.001278]  vfs_write+0x34c/0x4e0
<4>[   60.001281]  ksys_write+0x57/0xd0
<4>[   60.001284]  do_syscall_64+0x37/0x90
<4>[   60.001285]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4>[   60.001288] RIP: 0033:0x7fb4705521e7
<4>[   60.001289] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
<4>[   60.001290] RSP: 002b:00007ffe3efac3d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[   60.001291] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fb4705521e7
<4>[   60.001292] RDX: 0000000000000004 RSI: 0000559af7969590 RDI: 000000000000000b
<4>[   60.001293] RBP: 0000559af7969590 R08: 0000000000000000 R09: 0000000000000004
<4>[   60.001293] R10: 0000559af60922a6 R11: 0000000000000246 R12: 0000000000000004
<4>[   60.001294] R13: 0000559af7967540 R14: 00007fb47062e4a0 R15: 00007fb47062d8a0
<4>[   60.001296]  </TASK>
<4>[   60.001634] WARNING: CPU: 2 PID: 1152 at kernel/freezer.c:129 __set_task_frozen+0x58/0x80
<4>[   60.001641] Modules linked in: fuse snd_hda_codec_hdmi i915 x86_pkg_temp_thermal coretemp mei_pxp mei_hdcp kvm_intel wmi_bmof snd_hda_intel kvm snd_intel_dspcfg prime_numbers snd_hda_codec cdc_ether ttm e1000e irqbypass snd_hwdep crct10dif_pclmul usbnet drm_buddy crc32_pclmul mii ghash_clmulni_intel snd_hda_core drm_display_helper ptp i2c_i801 pps_core mei_me drm_kms_helper i2c_smbus snd_pcm syscopyarea mei sysfillrect sysimgblt intel_lpss_pci fb_sys_fops video wmi
<4>[   60.001717] CPU: 2 PID: 1152 Comm: rtcwake Not tainted 6.1.0-rc1-CI_DRM_12270-ga9d18ead9885+ #1
<4>[   60.001723] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.3197.A00.2005110542 05/11/2020
<4>[   60.001729] RIP: 0010:__set_task_frozen+0x58/0x80
<4>[   60.001735] Code: f7 c5 00 20 00 00 74 06 40 f6 c5 03 74 3a 81 e5 00 40 00 00 75 16 8b 15 a2 b9 71 01 85 d2 74 0c 8b 83 60 09 00 00 85 c0 74 02 <0f> 0b c7 43 18 00 80 00 00 b8 00 80 00 00 5b 5d c3 cc cc cc cc 31
<4>[   60.001744] RSP: 0018:ffffc9000335fcf0 EFLAGS: 00010002
<4>[   60.001747] RAX: 0000000000000001 RBX: ffff888111a70040 RCX: 0000000000000000
<4>[   60.001751] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888111a70040
<4>[   60.001757] RBP: 0000000000000000 R08: 0000000000000001 R09: 00000000fffffffe
<4>[   60.001763] R10: 0000000001e6f6b9 R11: 00000000934a4c67 R12: 0000000000000246
<4>[   60.001769] R13: ffffffff811653e0 R14: 0000000000000000 R15: ffff888111a70040
<4>[   60.001776] FS:  00007fb47043e740(0000) GS:ffff8884a0300000(0000) knlGS:0000000000000000
<4>[   60.001784] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   60.001789] CR2: 00007f3903c603d8 CR3: 000000010a25a003 CR4: 0000000000770ee0
<4>[   60.001795] PKRU: 55555554
<4>[   60.001798] Call Trace:
<4>[   60.001801]  <TASK>
<4>[   60.001804]  task_call_func+0xc2/0xe0
<4>[   60.001809]  freeze_task+0x84/0xe0
<4>[   60.001815]  try_to_freeze_tasks+0xac/0x260
<4>[   60.001821]  freeze_processes+0x56/0xb0
<4>[   60.001826]  pm_suspend.cold.7+0x1d9/0x31c
<4>[   60.001832]  state_store+0x7b/0xe0
<4>[   60.001837]  kernfs_fop_write_iter+0x121/0x1c0
<4>[   60.001843]  vfs_write+0x34c/0x4e0
<4>[   60.001850]  ksys_write+0x57/0xd0
<4>[   60.001855]  do_syscall_64+0x37/0x90
<4>[   60.001860]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4>[   60.001866] RIP: 0033:0x7fb4705521e7
<4>[   60.001870] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
<4>[   60.001884] RSP: 002b:00007ffe3efac3d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[   60.001892] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fb4705521e7
<4>[   60.001898] RDX: 0000000000000004 RSI: 0000559af7969590 RDI: 000000000000000b
<4>[   60.001904] RBP: 0000559af7969590 R08: 0000000000000000 R09: 0000000000000004
<4>[   60.001910] R10: 0000559af60922a6 R11: 0000000000000246 R12: 0000000000000004
<4>[   60.001917] R13: 0000559af7967540 R14: 00007fb47062e4a0 R15: 00007fb47062d8a0
<4>[   60.001925]  </TASK>
<4>[   60.001928] irq event stamp: 8712
<4>[   60.001931] hardirqs last  enabled at (8711): [<ffffffff81b73784>] _raw_spin_unlock_irqrestore+0x54/0x70
<4>[   60.001941] hardirqs last disabled at (8712): [<ffffffff81b7352b>] _raw_spin_lock_irqsave+0x4b/0x50
<4>[   60.001950] softirqs last  enabled at (8348): [<ffffffff81e0031e>] __do_softirq+0x31e/0x48a
<4>[   60.001957] softirqs last disabled at (8341): [<ffffffff810c1b08>] irq_exit_rcu+0xb8/0xe0
<4>[   60.001969] ---[ end trace 0000000000000000 ]---
<4>[   60.003326] (elapsed 0.002 seconds) done.
<6>[   60.003332] OOM killer disabled.
<6>[   60.003334] Freezing remaining freezable tasks ... (elapsed 0.006 seconds) done.
<6>[   60.010062] printk: Suspending console(s) (use no_console_suspend to debug)
<6>[   60.041543] e1000e: EEE TX LPI TIMER: 00000011
<6>[   60.368938] ACPI: EC: interrupt blocked

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-21 17:22   ` [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic Ville Syrjälä
@ 2022-10-25  4:52     ` Ville Syrjälä
  2022-10-25 10:49       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Ville Syrjälä @ 2022-10-25  4:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Fri, Oct 21, 2022 at 08:22:41PM +0300, Ville Syrjälä wrote:
> On Mon, Aug 22, 2022 at 01:18:22PM +0200, Peter Zijlstra wrote:
> > +#ifdef CONFIG_LOCKDEP
> > +	/*
> > +	 * It's dangerous to freeze with locks held; there be dragons there.
> > +	 */
> > +	if (!(state & __TASK_FREEZABLE_UNSAFE))
> > +		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> > +#endif
> 
> We now seem to be hitting this sporadically in the intel gfx CI.
> 
> I've spotted it on two machines so far:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12270/shard-tglb7/igt@gem_ctx_isolation@preservation-s3@vcs0.html
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109950v1/shard-snb5/igt@kms_flip@flip-vs-suspend-interruptible@a-vga1.html

Sadly no luck in reproducing this locally so far. In the meantime
I added the following patch into our topic/core-for-CI branch in
the hopes of CI stumbling on it again and dumping a bit more data:

--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -125,8 +125,16 @@ static int __set_task_frozen(struct task_struct *p, void *arg)
 	/*
 	 * It's dangerous to freeze with locks held; there be dragons there.
 	 */
-	if (!(state & __TASK_FREEZABLE_UNSAFE))
-		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
+	if (!(state & __TASK_FREEZABLE_UNSAFE)) {
+		static bool warned = false;
+
+		if (!warned && debug_locks && p->lockdep_depth) {
+			debug_show_held_locks(p);
+			WARN(1, "%s/%d holding locks while freezing\n",
+			     p->comm, task_pid_nr(p));
+			warned = true;
+		}
+	}
 #endif
 
 	WRITE_ONCE(p->__state, TASK_FROZEN);
-- 
2.37.4

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-25  4:52     ` Ville Syrjälä
@ 2022-10-25 10:49       ` Peter Zijlstra
  2022-10-26 10:32         ` Ville Syrjälä
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2022-10-25 10:49 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Tue, Oct 25, 2022 at 07:52:07AM +0300, Ville Syrjälä wrote:
> On Fri, Oct 21, 2022 at 08:22:41PM +0300, Ville Syrjälä wrote:
> > On Mon, Aug 22, 2022 at 01:18:22PM +0200, Peter Zijlstra wrote:
> > > +#ifdef CONFIG_LOCKDEP
> > > +	/*
> > > +	 * It's dangerous to freeze with locks held; there be dragons there.
> > > +	 */
> > > +	if (!(state & __TASK_FREEZABLE_UNSAFE))
> > > +		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> > > +#endif
> > 
> > We now seem to be hitting this sporadically in the intel gfx CI.
> > 
> > I've spotted it on two machines so far:
> > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12270/shard-tglb7/igt@gem_ctx_isolation@preservation-s3@vcs0.html
> > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109950v1/shard-snb5/igt@kms_flip@flip-vs-suspend-interruptible@a-vga1.html
> 
> Sadly no luck in reproducing this locally so far. In the meantime
> I added the following patch into our topic/core-for-CI branch in
> the hopes of CI stumbling on it again and dumping a bit more data:
> 
> --- a/kernel/freezer.c
> +++ b/kernel/freezer.c
> @@ -125,8 +125,16 @@ static int __set_task_frozen(struct task_struct *p, void *arg)
>  	/*
>  	 * It's dangerous to freeze with locks held; there be dragons there.
>  	 */
> -	if (!(state & __TASK_FREEZABLE_UNSAFE))
> -		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> +	if (!(state & __TASK_FREEZABLE_UNSAFE)) {
> +		static bool warned = false;
> +
> +		if (!warned && debug_locks && p->lockdep_depth) {
> +			debug_show_held_locks(p);
> +			WARN(1, "%s/%d holding locks while freezing\n",
> +			     p->comm, task_pid_nr(p));
> +			warned = true;
> +		}
> +	}
>  #endif
>  
>  	WRITE_ONCE(p->__state, TASK_FROZEN);

That seems reasonable. But note that this constraint isn't new; the
previous freezer had much the same constraint but perhaps it wasn't
triggered for mysterious raisins. see the previous
try_to_freeze_unsafe() function.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-25 10:49       ` Peter Zijlstra
@ 2022-10-26 10:32         ` Ville Syrjälä
  2022-10-26 11:43           ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Ville Syrjälä @ 2022-10-26 10:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Tue, Oct 25, 2022 at 12:49:13PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 25, 2022 at 07:52:07AM +0300, Ville Syrjälä wrote:
> > On Fri, Oct 21, 2022 at 08:22:41PM +0300, Ville Syrjälä wrote:
> > > On Mon, Aug 22, 2022 at 01:18:22PM +0200, Peter Zijlstra wrote:
> > > > +#ifdef CONFIG_LOCKDEP
> > > > +	/*
> > > > +	 * It's dangerous to freeze with locks held; there be dragons there.
> > > > +	 */
> > > > +	if (!(state & __TASK_FREEZABLE_UNSAFE))
> > > > +		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> > > > +#endif
> > > 
> > > We now seem to be hitting this sporadically in the intel gfx CI.
> > > 
> > > I've spotted it on two machines so far:
> > > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12270/shard-tglb7/igt@gem_ctx_isolation@preservation-s3@vcs0.html
> > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109950v1/shard-snb5/igt@kms_flip@flip-vs-suspend-interruptible@a-vga1.html
> > 
> > Sadly no luck in reproducing this locally so far. In the meantime
> > I added the following patch into our topic/core-for-CI branch in
> > the hopes of CI stumbling on it again and dumping a bit more data:
> > 
> > --- a/kernel/freezer.c
> > +++ b/kernel/freezer.c
> > @@ -125,8 +125,16 @@ static int __set_task_frozen(struct task_struct *p, void *arg)
> >  	/*
> >  	 * It's dangerous to freeze with locks held; there be dragons there.
> >  	 */
> > -	if (!(state & __TASK_FREEZABLE_UNSAFE))
> > -		WARN_ON_ONCE(debug_locks && p->lockdep_depth);
> > +	if (!(state & __TASK_FREEZABLE_UNSAFE)) {
> > +		static bool warned = false;
> > +
> > +		if (!warned && debug_locks && p->lockdep_depth) {
> > +			debug_show_held_locks(p);
> > +			WARN(1, "%s/%d holding locks while freezing\n",
> > +			     p->comm, task_pid_nr(p));
> > +			warned = true;
> > +		}
> > +	}
> >  #endif
> >  
> >  	WRITE_ONCE(p->__state, TASK_FROZEN);
> 
> That seems reasonable. But note that this constraint isn't new; the
> previous freezer had much the same constraint but perhaps it wasn't
> triggered for mysterious raisins. see the previous
> try_to_freeze_unsafe() function.

Looks like we caught one with the extra debugs now.

Short form looks to be this:
<4>[  355.437846] 1 lock held by rs:main Q:Reg/359:
<4>[  355.438418]  #0: ffff88844693b758 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x1b/0x30
<4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing

Based on a quick google that process seems to be some rsyslog thing.


Here's the full splat with the console_lock mess included:
<6>[  355.437502] Freezing user space processes ... 
<4>[  355.437846] 1 lock held by rs:main Q:Reg/359:

<4>[  355.437865] ======================================================
<4>[  355.437866] WARNING: possible circular locking dependency detected
<4>[  355.437867] 6.1.0-rc2-CI_DRM_12295-g3844a56a0922+ #1 Tainted: G     U            
<4>[  355.437870] ------------------------------------------------------
<4>[  355.437871] rtcwake/6211 is trying to acquire lock:
<4>[  355.437872] ffffffff82735198 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0xa/0x30
<4>[  355.437883] 
                  but task is already holding lock:
<4>[  355.437885] ffff88810d0908e0 (&p->pi_lock){-.-.}-{2:2}, at: task_call_func+0x34/0xe0
<4>[  355.437893] 
                  which lock already depends on the new lock.

<4>[  355.437894] 
                  the existing dependency chain (in reverse order) is:
<4>[  355.437895] 
                  -> #1 (&p->pi_lock){-.-.}-{2:2}:
<4>[  355.437899]        lock_acquire+0xd3/0x310
<4>[  355.437903]        _raw_spin_lock_irqsave+0x33/0x50
<4>[  355.437907]        try_to_wake_up+0x6b/0x610
<4>[  355.437911]        up+0x3b/0x50
<4>[  355.437914]        __up_console_sem+0x5c/0x70
<4>[  355.437917]        console_unlock+0x1bc/0x1d0
<4>[  355.437920]        con_font_op+0x2e2/0x3a0
<4>[  355.437925]        vt_ioctl+0x4f5/0x13b0
<4>[  355.437930]        tty_ioctl+0x233/0x8e0
<4>[  355.437934]        __x64_sys_ioctl+0x71/0xb0
<4>[  355.437938]        do_syscall_64+0x3a/0x90
<4>[  355.437943]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4>[  355.437948] 
                  -> #0 ((console_sem).lock){-.-.}-{2:2}:
<4>[  355.437952]        validate_chain+0xb3d/0x2000
<4>[  355.437955]        __lock_acquire+0x5a4/0xb70
<4>[  355.437958]        lock_acquire+0xd3/0x310
<4>[  355.437960]        _raw_spin_lock_irqsave+0x33/0x50
<4>[  355.437965]        down_trylock+0xa/0x30
<4>[  355.437968]        __down_trylock_console_sem+0x25/0xb0
<4>[  355.437971]        console_trylock+0xe/0x70
<4>[  355.437974]        vprintk_emit+0x13c/0x380
<4>[  355.437977]        _printk+0x53/0x6e
<4>[  355.437981]        lockdep_print_held_locks+0x5c/0xab
<4>[  355.437985]        __set_task_frozen+0x6d/0xb0
<4>[  355.437989]        task_call_func+0xc4/0xe0
<4>[  355.437993]        freeze_task+0x84/0xe0
<4>[  355.437997]        try_to_freeze_tasks+0xac/0x260
<4>[  355.438001]        freeze_processes+0x56/0xb0
<4>[  355.438005]        pm_suspend.cold.7+0x1d9/0x31c
<4>[  355.438008]        state_store+0x7b/0xe0
<4>[  355.438012]        kernfs_fop_write_iter+0x124/0x1c0
<4>[  355.438016]        vfs_write+0x34f/0x4e0
<4>[  355.438021]        ksys_write+0x57/0xd0
<4>[  355.438025]        do_syscall_64+0x3a/0x90
<4>[  355.438029]        entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4>[  355.438034] 
                  other info that might help us debug this:

<4>[  355.438035]  Possible unsafe locking scenario:

<4>[  355.438036]        CPU0                    CPU1
<4>[  355.438037]        ----                    ----
<4>[  355.438037]   lock(&p->pi_lock);
<4>[  355.438040]                                lock((console_sem).lock);
<4>[  355.438042]                                lock(&p->pi_lock);
<4>[  355.438044]   lock((console_sem).lock);
<4>[  355.438046] 
                   *** DEADLOCK ***

<4>[  355.438047] 7 locks held by rtcwake/6211:
<4>[  355.438049]  #0: ffff888104d11430 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x57/0xd0
<4>[  355.438058]  #1: ffff88810e6bac88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xee/0x1c0
<4>[  355.438066]  #2: ffff8881001c0538 (kn->active#167){.+.+}-{0:0}, at: kernfs_fop_write_iter+0xf7/0x1c0
<4>[  355.438074]  #3: ffffffff8264db08 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend.cold.7+0xfa/0x31c
<4>[  355.438082]  #4: ffffffff82606098 (tasklist_lock){.+.+}-{2:2}, at: try_to_freeze_tasks+0x63/0x260
<4>[  355.438090]  #5: ffffffff8273aed8 (freezer_lock){....}-{2:2}, at: freeze_task+0x27/0xe0
<4>[  355.438098]  #6: ffff88810d0908e0 (&p->pi_lock){-.-.}-{2:2}, at: task_call_func+0x34/0xe0
<4>[  355.438105] 
                  stack backtrace:
<4>[  355.438107] CPU: 0 PID: 6211 Comm: rtcwake Tainted: G     U             6.1.0-rc2-CI_DRM_12295-g3844a56a0922+ #1
<4>[  355.438110] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0385.2020.0519.1558 05/19/2020
<4>[  355.438112] Call Trace:
<4>[  355.438114]  <TASK>
<4>[  355.438116]  dump_stack_lvl+0x56/0x7f
<4>[  355.438121]  check_noncircular+0x132/0x150
<4>[  355.438125]  ? validate_chain+0x247/0x2000
<4>[  355.438131]  validate_chain+0xb3d/0x2000
<4>[  355.438138]  __lock_acquire+0x5a4/0xb70
<4>[  355.438144]  lock_acquire+0xd3/0x310
<4>[  355.438147]  ? down_trylock+0xa/0x30
<4>[  355.438154]  ? vprintk_emit+0x13c/0x380
<4>[  355.438158]  _raw_spin_lock_irqsave+0x33/0x50
<4>[  355.438163]  ? down_trylock+0xa/0x30
<4>[  355.438167]  down_trylock+0xa/0x30
<4>[  355.438171]  __down_trylock_console_sem+0x25/0xb0
<4>[  355.438175]  console_trylock+0xe/0x70
<4>[  355.438178]  vprintk_emit+0x13c/0x380
<4>[  355.438183]  ? __set_task_special+0x40/0x40
<4>[  355.438187]  _printk+0x53/0x6e
<4>[  355.438195]  lockdep_print_held_locks+0x5c/0xab
<4>[  355.438199]  ? __set_task_special+0x40/0x40
<4>[  355.438203]  __set_task_frozen+0x6d/0xb0
<4>[  355.438208]  task_call_func+0xc4/0xe0
<4>[  355.438214]  freeze_task+0x84/0xe0
<4>[  355.438219]  try_to_freeze_tasks+0xac/0x260
<4>[  355.438226]  freeze_processes+0x56/0xb0
<4>[  355.438230]  pm_suspend.cold.7+0x1d9/0x31c
<4>[  355.438235]  state_store+0x7b/0xe0
<4>[  355.438241]  kernfs_fop_write_iter+0x124/0x1c0
<4>[  355.438247]  vfs_write+0x34f/0x4e0
<4>[  355.438255]  ksys_write+0x57/0xd0
<4>[  355.438261]  do_syscall_64+0x3a/0x90
<4>[  355.438266]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4>[  355.438271] RIP: 0033:0x7fcfa44d80a7
<4>[  355.438275] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
<4>[  355.438278] RSP: 002b:00007ffd72160e28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[  355.438281] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007fcfa44d80a7
<4>[  355.438284] RDX: 0000000000000007 RSI: 000055dd45bf4590 RDI: 000000000000000b
<4>[  355.438286] RBP: 000055dd45bf4590 R08: 0000000000000000 R09: 0000000000000007
<4>[  355.438288] R10: 000055dd441d22a6 R11: 0000000000000246 R12: 0000000000000007
<4>[  355.438290] R13: 000055dd45bf2540 R14: 00007fcfa45b34a0 R15: 00007fcfa45b28a0
<4>[  355.438298]  </TASK>
<4>[  355.438418]  #0: ffff88844693b758 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x1b/0x30
<4>[  355.438429] ------------[ cut here ]------------
<4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
<4>[  355.438439] WARNING: CPU: 0 PID: 6211 at kernel/freezer.c:134 __set_task_frozen+0x86/0xb0
<4>[  355.438447] Modules linked in: snd_hda_intel i915 mei_hdcp mei_pxp drm_display_helper drm_kms_helper vgem drm_shmem_helper snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm prime_numbers ttm drm_buddy syscopyarea sysfillrect sysimgblt fb_sys_fops fuse x86_pkg_temp_thermal coretemp kvm_intel btusb btrtl btbcm btintel kvm irqbypass bluetooth crct10dif_pclmul crc32_pclmul ecdh_generic ghash_clmulni_intel ecc e1000e mei_me i2c_i801 ptp mei i2c_smbus pps_core lpc_ich video wmi [last unloaded: drm_kms_helper]
<4>[  355.438521] CPU: 0 PID: 6211 Comm: rtcwake Tainted: G     U             6.1.0-rc2-CI_DRM_12295-g3844a56a0922+ #1
<4>[  355.438526] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0385.2020.0519.1558 05/19/2020
<4>[  355.438530] RIP: 0010:__set_task_frozen+0x86/0xb0
<4>[  355.438536] Code: 83 60 09 00 00 85 c0 74 2a 48 89 df e8 ac 02 9b 00 8b 93 38 05 00 00 48 8d b3 48 07 00 00 48 c7 c7 a0 62 2b 82 e8 ee c1 9a 00 <0f> 0b c6 05 51 75 e3 02 01 c7 43 18 00 80 00 00 b8 00 80 00 00 5b
<4>[  355.438541] RSP: 0018:ffffc900012cbcf0 EFLAGS: 00010086
<4>[  355.438546] RAX: 0000000000000000 RBX: ffff88810d090040 RCX: 0000000000000004
<4>[  355.438550] RDX: 0000000000000004 RSI: 00000000fffff5de RDI: 00000000ffffffff
<4>[  355.438553] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000fffff5de
<4>[  355.438557] R10: 00000000002335f8 R11: ffffc900012cbb88 R12: 0000000000000246
<4>[  355.438561] R13: ffffffff81165430 R14: 0000000000000000 R15: ffff88810d090040
<4>[  355.438565] FS:  00007fcfa43c7740(0000) GS:ffff888446800000(0000) knlGS:0000000000000000
<4>[  355.438569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  355.438582] CR2: 00007fceb380f6b8 CR3: 0000000117c5c004 CR4: 00000000003706f0
<4>[  355.438586] Call Trace:
<4>[  355.438589]  <TASK>
<4>[  355.438592]  task_call_func+0xc4/0xe0
<4>[  355.438600]  freeze_task+0x84/0xe0
<4>[  355.438607]  try_to_freeze_tasks+0xac/0x260
<4>[  355.438616]  freeze_processes+0x56/0xb0
<4>[  355.438622]  pm_suspend.cold.7+0x1d9/0x31c
<4>[  355.438629]  state_store+0x7b/0xe0
<4>[  355.438637]  kernfs_fop_write_iter+0x124/0x1c0
<4>[  355.438644]  vfs_write+0x34f/0x4e0
<4>[  355.438655]  ksys_write+0x57/0xd0
<4>[  355.438663]  do_syscall_64+0x3a/0x90
<4>[  355.438670]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
<4>[  355.438676] RIP: 0033:0x7fcfa44d80a7
<4>[  355.438681] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
<4>[  355.438685] RSP: 002b:00007ffd72160e28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4>[  355.438690] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007fcfa44d80a7
<4>[  355.438695] RDX: 0000000000000007 RSI: 000055dd45bf4590 RDI: 000000000000000b
<4>[  355.438698] RBP: 000055dd45bf4590 R08: 0000000000000000 R09: 0000000000000007
<4>[  355.438702] R10: 000055dd441d22a6 R11: 0000000000000246 R12: 0000000000000007
<4>[  355.438706] R13: 000055dd45bf2540 R14: 00007fcfa45b34a0 R15: 00007fcfa45b28a0
<4>[  355.438716]  </TASK>
<4>[  355.438718] irq event stamp: 7462
<4>[  355.438721] hardirqs last  enabled at (7461): [<ffffffff81b73764>] _raw_spin_unlock_irqrestore+0x54/0x70
<4>[  355.438729] hardirqs last disabled at (7462): [<ffffffff81b7350b>] _raw_spin_lock_irqsave+0x4b/0x50
<4>[  355.438736] softirqs last  enabled at (7322): [<ffffffff81e0031e>] __do_softirq+0x31e/0x48a
<4>[  355.438742] softirqs last disabled at (7313): [<ffffffff810c1b58>] irq_exit_rcu+0xb8/0xe0
<4>[  355.438749] ---[ end trace 0000000000000000 ]---
<4>[  355.440204] (elapsed 0.002 seconds) done.
<6>[  355.440210] OOM killer disabled.
<6>[  355.440212] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-26 10:32         ` Ville Syrjälä
@ 2022-10-26 11:43           ` Peter Zijlstra
  2022-10-26 12:12             ` Peter Zijlstra
                               ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2022-10-26 11:43 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Wed, Oct 26, 2022 at 01:32:31PM +0300, Ville Syrjälä wrote:
> Short form looks to be this:
> <4>[  355.437846] 1 lock held by rs:main Q:Reg/359:
> <4>[  355.438418]  #0: ffff88844693b758 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x1b/0x30
> <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing

> <4>[  355.438429] ------------[ cut here ]------------
> <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> <4>[  355.438439] WARNING: CPU: 0 PID: 6211 at kernel/freezer.c:134 __set_task_frozen+0x86/0xb0
> <4>[  355.438447] Modules linked in: snd_hda_intel i915 mei_hdcp mei_pxp drm_display_helper drm_kms_helper vgem drm_shmem_helper snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm prime_numbers ttm drm_buddy syscopyarea sysfillrect sysimgblt fb_sys_fops fuse x86_pkg_temp_thermal coretemp kvm_intel btusb btrtl btbcm btintel kvm irqbypass bluetooth crct10dif_pclmul crc32_pclmul ecdh_generic ghash_clmulni_intel ecc e1000e mei_me i2c_i801 ptp mei i2c_smbus pps_core lpc_ich video wmi [last unloaded: drm_kms_helper]
> <4>[  355.438521] CPU: 0 PID: 6211 Comm: rtcwake Tainted: G     U             6.1.0-rc2-CI_DRM_12295-g3844a56a0922+ #1
> <4>[  355.438526] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0385.2020.0519.1558 05/19/2020
> <4>[  355.438530] RIP: 0010:__set_task_frozen+0x86/0xb0
> <4>[  355.438536] Code: 83 60 09 00 00 85 c0 74 2a 48 89 df e8 ac 02 9b 00 8b 93 38 05 00 00 48 8d b3 48 07 00 00 48 c7 c7 a0 62 2b 82 e8 ee c1 9a 00 <0f> 0b c6 05 51 75 e3 02 01 c7 43 18 00 80 00 00 b8 00 80 00 00 5b
> <4>[  355.438541] RSP: 0018:ffffc900012cbcf0 EFLAGS: 00010086
> <4>[  355.438546] RAX: 0000000000000000 RBX: ffff88810d090040 RCX: 0000000000000004
> <4>[  355.438550] RDX: 0000000000000004 RSI: 00000000fffff5de RDI: 00000000ffffffff
> <4>[  355.438553] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000fffff5de
> <4>[  355.438557] R10: 00000000002335f8 R11: ffffc900012cbb88 R12: 0000000000000246
> <4>[  355.438561] R13: ffffffff81165430 R14: 0000000000000000 R15: ffff88810d090040
> <4>[  355.438565] FS:  00007fcfa43c7740(0000) GS:ffff888446800000(0000) knlGS:0000000000000000
> <4>[  355.438569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[  355.438582] CR2: 00007fceb380f6b8 CR3: 0000000117c5c004 CR4: 00000000003706f0
> <4>[  355.438586] Call Trace:
> <4>[  355.438589]  <TASK>
> <4>[  355.438592]  task_call_func+0xc4/0xe0
> <4>[  355.438600]  freeze_task+0x84/0xe0
> <4>[  355.438607]  try_to_freeze_tasks+0xac/0x260
> <4>[  355.438616]  freeze_processes+0x56/0xb0
> <4>[  355.438622]  pm_suspend.cold.7+0x1d9/0x31c
> <4>[  355.438629]  state_store+0x7b/0xe0
> <4>[  355.438637]  kernfs_fop_write_iter+0x124/0x1c0
> <4>[  355.438644]  vfs_write+0x34f/0x4e0
> <4>[  355.438655]  ksys_write+0x57/0xd0
> <4>[  355.438663]  do_syscall_64+0x3a/0x90
> <4>[  355.438670]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Oh I think I see what's going on.

It's a very narrow race between schedule() and task_call_func().

  CPU0						CPU1

  __schedule()
    rq_lock();
    prev_state = READ_ONCE(prev->__state);
    if (... && prev_state) {
      deactivate_tasl(rq, prev, ...)
        prev->on_rq = 0;

						task_call_func()
						  raw_spin_lock_irqsave(p->pi_lock);
						  state = READ_ONCE(p->__state);
						  smp_rmb();
						  if (... || p->on_rq) // false!!!
						    rq = __task_rq_lock()

						  ret = func();

    next = pick_next_task();
    rq = context_switch(prev, next)
      prepare_lock_switch()
        spin_release(&__rq_lockp(rq)->dep_map...)



So while the task is on it's way out, it still holds rq->lock for a
little while, and right then task_call_func() comes in and figures it
doesn't need rq->lock anymore (because the task is already dequeued --
but still running there) and then the __set_task_frozen() thing observes
it's holding rq->lock and yells murder.

Could you please give the below a spin?

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cb2aa2b54c7a..f519f44cd4c7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4200,6 +4200,37 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	return success;
 }
 
+static bool __task_needs_rq_lock(struct task_struct *p)
+{
+	unsigned int state = READ_ONCE(p->__state);
+
+	/*
+	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
+	 * the task is blocked. Make sure to check @state since ttwu() can drop
+	 * locks at the end, see ttwu_queue_wakelist().
+	 */
+	if (state == TASK_RUNNING || state == TASK_WAKING)
+		return true;
+
+	/*
+	 * Ensure we load p->on_rq after p->__state, otherwise it would be
+	 * possible to, falsely, observe p->on_rq == 0.
+	 *
+	 * See try_to_wake_up() for a longer comment.
+	 */
+	smp_rmb();
+	if (p->on_rq)
+		return true;
+
+#ifdef CONFIG_SMP
+	smp_rmb();
+	if (p->on_cpu)
+		return true;
+#endif
+
+	return false;
+}
+
 /**
  * task_call_func - Invoke a function on task in fixed state
  * @p: Process for which the function is to be invoked, can be @current.
@@ -4217,28 +4248,12 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 int task_call_func(struct task_struct *p, task_call_f func, void *arg)
 {
 	struct rq *rq = NULL;
-	unsigned int state;
 	struct rq_flags rf;
 	int ret;
 
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 
-	state = READ_ONCE(p->__state);
-
-	/*
-	 * Ensure we load p->on_rq after p->__state, otherwise it would be
-	 * possible to, falsely, observe p->on_rq == 0.
-	 *
-	 * See try_to_wake_up() for a longer comment.
-	 */
-	smp_rmb();
-
-	/*
-	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
-	 * the task is blocked. Make sure to check @state since ttwu() can drop
-	 * locks at the end, see ttwu_queue_wakelist().
-	 */
-	if (state == TASK_RUNNING || state == TASK_WAKING || p->on_rq)
+	if (__task_needs_rq_lock(p))
 		rq = __task_rq_lock(p, &rf);
 
 	/*

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-26 11:43           ` Peter Zijlstra
@ 2022-10-26 12:12             ` Peter Zijlstra
  2022-10-26 12:14               ` Peter Zijlstra
  2022-10-27  5:58             ` Chen Yu
  2022-10-27 13:09             ` Ville Syrjälä
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2022-10-26 12:12 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 26, 2022 at 01:32:31PM +0300, Ville Syrjälä wrote:
> > Short form looks to be this:
> > <4>[  355.437846] 1 lock held by rs:main Q:Reg/359:
> > <4>[  355.438418]  #0: ffff88844693b758 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x1b/0x30
> > <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> 
> > <4>[  355.438429] ------------[ cut here ]------------
> > <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> > <4>[  355.438439] WARNING: CPU: 0 PID: 6211 at kernel/freezer.c:134 __set_task_frozen+0x86/0xb0
> > <4>[  355.438447] Modules linked in: snd_hda_intel i915 mei_hdcp mei_pxp drm_display_helper drm_kms_helper vgem drm_shmem_helper snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm prime_numbers ttm drm_buddy syscopyarea sysfillrect sysimgblt fb_sys_fops fuse x86_pkg_temp_thermal coretemp kvm_intel btusb btrtl btbcm btintel kvm irqbypass bluetooth crct10dif_pclmul crc32_pclmul ecdh_generic ghash_clmulni_intel ecc e1000e mei_me i2c_i801 ptp mei i2c_smbus pps_core lpc_ich video wmi [last unloaded: drm_kms_helper]
> > <4>[  355.438521] CPU: 0 PID: 6211 Comm: rtcwake Tainted: G     U             6.1.0-rc2-CI_DRM_12295-g3844a56a0922+ #1
> > <4>[  355.438526] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0385.2020.0519.1558 05/19/2020
> > <4>[  355.438530] RIP: 0010:__set_task_frozen+0x86/0xb0
> > <4>[  355.438536] Code: 83 60 09 00 00 85 c0 74 2a 48 89 df e8 ac 02 9b 00 8b 93 38 05 00 00 48 8d b3 48 07 00 00 48 c7 c7 a0 62 2b 82 e8 ee c1 9a 00 <0f> 0b c6 05 51 75 e3 02 01 c7 43 18 00 80 00 00 b8 00 80 00 00 5b
> > <4>[  355.438541] RSP: 0018:ffffc900012cbcf0 EFLAGS: 00010086
> > <4>[  355.438546] RAX: 0000000000000000 RBX: ffff88810d090040 RCX: 0000000000000004
> > <4>[  355.438550] RDX: 0000000000000004 RSI: 00000000fffff5de RDI: 00000000ffffffff
> > <4>[  355.438553] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000fffff5de
> > <4>[  355.438557] R10: 00000000002335f8 R11: ffffc900012cbb88 R12: 0000000000000246
> > <4>[  355.438561] R13: ffffffff81165430 R14: 0000000000000000 R15: ffff88810d090040
> > <4>[  355.438565] FS:  00007fcfa43c7740(0000) GS:ffff888446800000(0000) knlGS:0000000000000000
> > <4>[  355.438569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[  355.438582] CR2: 00007fceb380f6b8 CR3: 0000000117c5c004 CR4: 00000000003706f0
> > <4>[  355.438586] Call Trace:
> > <4>[  355.438589]  <TASK>
> > <4>[  355.438592]  task_call_func+0xc4/0xe0
> > <4>[  355.438600]  freeze_task+0x84/0xe0
> > <4>[  355.438607]  try_to_freeze_tasks+0xac/0x260
> > <4>[  355.438616]  freeze_processes+0x56/0xb0
> > <4>[  355.438622]  pm_suspend.cold.7+0x1d9/0x31c
> > <4>[  355.438629]  state_store+0x7b/0xe0
> > <4>[  355.438637]  kernfs_fop_write_iter+0x124/0x1c0
> > <4>[  355.438644]  vfs_write+0x34f/0x4e0
> > <4>[  355.438655]  ksys_write+0x57/0xd0
> > <4>[  355.438663]  do_syscall_64+0x3a/0x90
> > <4>[  355.438670]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> Oh I think I see what's going on.
> 
> It's a very narrow race between schedule() and task_call_func().
> 
>   CPU0						CPU1
> 
>   __schedule()
>     rq_lock();
>     prev_state = READ_ONCE(prev->__state);
>     if (... && prev_state) {
>       deactivate_tasl(rq, prev, ...)
>         prev->on_rq = 0;
> 
> 						task_call_func()
> 						  raw_spin_lock_irqsave(p->pi_lock);
> 						  state = READ_ONCE(p->__state);
> 						  smp_rmb();
> 						  if (... || p->on_rq) // false!!!
> 						    rq = __task_rq_lock()
> 
> 						  ret = func();
> 
>     next = pick_next_task();
>     rq = context_switch(prev, next)
>       prepare_lock_switch()
>         spin_release(&__rq_lockp(rq)->dep_map...)
> 
> 
> 
> So while the task is on it's way out, it still holds rq->lock for a
> little while, and right then task_call_func() comes in and figures it
> doesn't need rq->lock anymore (because the task is already dequeued --
> but still running there) and then the __set_task_frozen() thing observes
> it's holding rq->lock and yells murder.
> 
> Could you please give the below a spin?

Urgh.. that'll narrow the race more, but won't solve it, that
prepare_lock_switch() is after we clear ->on_cpu.

Let me ponder this a wee bit more..

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-26 12:12             ` Peter Zijlstra
@ 2022-10-26 12:14               ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2022-10-26 12:14 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Wed, Oct 26, 2022 at 02:12:02PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 26, 2022 at 01:32:31PM +0300, Ville Syrjälä wrote:
> > > Short form looks to be this:
> > > <4>[  355.437846] 1 lock held by rs:main Q:Reg/359:
> > > <4>[  355.438418]  #0: ffff88844693b758 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x1b/0x30
> > > <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> > 
> > > <4>[  355.438429] ------------[ cut here ]------------
> > > <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> > > <4>[  355.438439] WARNING: CPU: 0 PID: 6211 at kernel/freezer.c:134 __set_task_frozen+0x86/0xb0
> > > <4>[  355.438447] Modules linked in: snd_hda_intel i915 mei_hdcp mei_pxp drm_display_helper drm_kms_helper vgem drm_shmem_helper snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm prime_numbers ttm drm_buddy syscopyarea sysfillrect sysimgblt fb_sys_fops fuse x86_pkg_temp_thermal coretemp kvm_intel btusb btrtl btbcm btintel kvm irqbypass bluetooth crct10dif_pclmul crc32_pclmul ecdh_generic ghash_clmulni_intel ecc e1000e mei_me i2c_i801 ptp mei i2c_smbus pps_core lpc_ich video wmi [last unloaded: drm_kms_helper]
> > > <4>[  355.438521] CPU: 0 PID: 6211 Comm: rtcwake Tainted: G     U             6.1.0-rc2-CI_DRM_12295-g3844a56a0922+ #1
> > > <4>[  355.438526] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0385.2020.0519.1558 05/19/2020
> > > <4>[  355.438530] RIP: 0010:__set_task_frozen+0x86/0xb0
> > > <4>[  355.438536] Code: 83 60 09 00 00 85 c0 74 2a 48 89 df e8 ac 02 9b 00 8b 93 38 05 00 00 48 8d b3 48 07 00 00 48 c7 c7 a0 62 2b 82 e8 ee c1 9a 00 <0f> 0b c6 05 51 75 e3 02 01 c7 43 18 00 80 00 00 b8 00 80 00 00 5b
> > > <4>[  355.438541] RSP: 0018:ffffc900012cbcf0 EFLAGS: 00010086
> > > <4>[  355.438546] RAX: 0000000000000000 RBX: ffff88810d090040 RCX: 0000000000000004
> > > <4>[  355.438550] RDX: 0000000000000004 RSI: 00000000fffff5de RDI: 00000000ffffffff
> > > <4>[  355.438553] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000fffff5de
> > > <4>[  355.438557] R10: 00000000002335f8 R11: ffffc900012cbb88 R12: 0000000000000246
> > > <4>[  355.438561] R13: ffffffff81165430 R14: 0000000000000000 R15: ffff88810d090040
> > > <4>[  355.438565] FS:  00007fcfa43c7740(0000) GS:ffff888446800000(0000) knlGS:0000000000000000
> > > <4>[  355.438569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > <4>[  355.438582] CR2: 00007fceb380f6b8 CR3: 0000000117c5c004 CR4: 00000000003706f0
> > > <4>[  355.438586] Call Trace:
> > > <4>[  355.438589]  <TASK>
> > > <4>[  355.438592]  task_call_func+0xc4/0xe0
> > > <4>[  355.438600]  freeze_task+0x84/0xe0
> > > <4>[  355.438607]  try_to_freeze_tasks+0xac/0x260
> > > <4>[  355.438616]  freeze_processes+0x56/0xb0
> > > <4>[  355.438622]  pm_suspend.cold.7+0x1d9/0x31c
> > > <4>[  355.438629]  state_store+0x7b/0xe0
> > > <4>[  355.438637]  kernfs_fop_write_iter+0x124/0x1c0
> > > <4>[  355.438644]  vfs_write+0x34f/0x4e0
> > > <4>[  355.438655]  ksys_write+0x57/0xd0
> > > <4>[  355.438663]  do_syscall_64+0x3a/0x90
> > > <4>[  355.438670]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > 
> > Oh I think I see what's going on.
> > 
> > It's a very narrow race between schedule() and task_call_func().
> > 
> >   CPU0						CPU1
> > 
> >   __schedule()
> >     rq_lock();
> >     prev_state = READ_ONCE(prev->__state);
> >     if (... && prev_state) {
> >       deactivate_tasl(rq, prev, ...)
> >         prev->on_rq = 0;
> > 
> > 						task_call_func()
> > 						  raw_spin_lock_irqsave(p->pi_lock);
> > 						  state = READ_ONCE(p->__state);
> > 						  smp_rmb();
> > 						  if (... || p->on_rq) // false!!!
> > 						    rq = __task_rq_lock()
> > 
> > 						  ret = func();
> > 
> >     next = pick_next_task();
> >     rq = context_switch(prev, next)
> >       prepare_lock_switch()
> >         spin_release(&__rq_lockp(rq)->dep_map...)
> > 
> > 
> > 
> > So while the task is on it's way out, it still holds rq->lock for a
> > little while, and right then task_call_func() comes in and figures it
> > doesn't need rq->lock anymore (because the task is already dequeued --
> > but still running there) and then the __set_task_frozen() thing observes
> > it's holding rq->lock and yells murder.
> > 
> > Could you please give the below a spin?
> 
> Urgh.. that'll narrow the race more, but won't solve it, that
> prepare_lock_switch() is after we clear ->on_cpu.
> 
> Let me ponder this a wee bit more..

Oh, n/m, I got myself confused, it's prepare_lock_switch() that releases
the lock (from lockdep's pov) and that *IS* before finish_task() which
clears ->on_cpu.

So all well, please test the patch.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-26 11:43           ` Peter Zijlstra
  2022-10-26 12:12             ` Peter Zijlstra
@ 2022-10-27  5:58             ` Chen Yu
  2022-10-27  7:39               ` Peter Zijlstra
  2022-10-27 13:09             ` Ville Syrjälä
  2 siblings, 1 reply; 15+ messages in thread
From: Chen Yu @ 2022-10-27  5:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On 2022-10-26 at 13:43:00 +0200, Peter Zijlstra wrote:
> On Wed, Oct 26, 2022 at 01:32:31PM +0300, Ville Syrjälä wrote:
> > Short form looks to be this:
> > <4>[  355.437846] 1 lock held by rs:main Q:Reg/359:
> > <4>[  355.438418]  #0: ffff88844693b758 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x1b/0x30
> > <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> 
> > <4>[  355.438429] ------------[ cut here ]------------
> > <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> > <4>[  355.438439] WARNING: CPU: 0 PID: 6211 at kernel/freezer.c:134 __set_task_frozen+0x86/0xb0
> > <4>[  355.438447] Modules linked in: snd_hda_intel i915 mei_hdcp mei_pxp drm_display_helper drm_kms_helper vgem drm_shmem_helper snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm prime_numbers ttm drm_buddy syscopyarea sysfillrect sysimgblt fb_sys_fops fuse x86_pkg_temp_thermal coretemp kvm_intel btusb btrtl btbcm btintel kvm irqbypass bluetooth crct10dif_pclmul crc32_pclmul ecdh_generic ghash_clmulni_intel ecc e1000e mei_me i2c_i801 ptp mei i2c_smbus pps_core lpc_ich video wmi [last unloaded: drm_kms_helper]
> > <4>[  355.438521] CPU: 0 PID: 6211 Comm: rtcwake Tainted: G     U             6.1.0-rc2-CI_DRM_12295-g3844a56a0922+ #1
> > <4>[  355.438526] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0385.2020.0519.1558 05/19/2020
> > <4>[  355.438530] RIP: 0010:__set_task_frozen+0x86/0xb0
> > <4>[  355.438536] Code: 83 60 09 00 00 85 c0 74 2a 48 89 df e8 ac 02 9b 00 8b 93 38 05 00 00 48 8d b3 48 07 00 00 48 c7 c7 a0 62 2b 82 e8 ee c1 9a 00 <0f> 0b c6 05 51 75 e3 02 01 c7 43 18 00 80 00 00 b8 00 80 00 00 5b
> > <4>[  355.438541] RSP: 0018:ffffc900012cbcf0 EFLAGS: 00010086
> > <4>[  355.438546] RAX: 0000000000000000 RBX: ffff88810d090040 RCX: 0000000000000004
> > <4>[  355.438550] RDX: 0000000000000004 RSI: 00000000fffff5de RDI: 00000000ffffffff
> > <4>[  355.438553] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000fffff5de
> > <4>[  355.438557] R10: 00000000002335f8 R11: ffffc900012cbb88 R12: 0000000000000246
> > <4>[  355.438561] R13: ffffffff81165430 R14: 0000000000000000 R15: ffff88810d090040
> > <4>[  355.438565] FS:  00007fcfa43c7740(0000) GS:ffff888446800000(0000) knlGS:0000000000000000
> > <4>[  355.438569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[  355.438582] CR2: 00007fceb380f6b8 CR3: 0000000117c5c004 CR4: 00000000003706f0
> > <4>[  355.438586] Call Trace:
> > <4>[  355.438589]  <TASK>
> > <4>[  355.438592]  task_call_func+0xc4/0xe0
> > <4>[  355.438600]  freeze_task+0x84/0xe0
> > <4>[  355.438607]  try_to_freeze_tasks+0xac/0x260
> > <4>[  355.438616]  freeze_processes+0x56/0xb0
> > <4>[  355.438622]  pm_suspend.cold.7+0x1d9/0x31c
> > <4>[  355.438629]  state_store+0x7b/0xe0
> > <4>[  355.438637]  kernfs_fop_write_iter+0x124/0x1c0
> > <4>[  355.438644]  vfs_write+0x34f/0x4e0
> > <4>[  355.438655]  ksys_write+0x57/0xd0
> > <4>[  355.438663]  do_syscall_64+0x3a/0x90
> > <4>[  355.438670]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> Oh I think I see what's going on.
> 
> It's a very narrow race between schedule() and task_call_func().
> 
>   CPU0						CPU1
> 
>   __schedule()
>     rq_lock();
>     prev_state = READ_ONCE(prev->__state);
>     if (... && prev_state) {
>       deactivate_tasl(rq, prev, ...)
>         prev->on_rq = 0;
> 
> 						task_call_func()
> 						  raw_spin_lock_irqsave(p->pi_lock);
> 						  state = READ_ONCE(p->__state);
> 						  smp_rmb();
> 						  if (... || p->on_rq) // false!!!
> 						    rq = __task_rq_lock()
> 
> 						  ret = func();
> 
>     next = pick_next_task();
>     rq = context_switch(prev, next)
>       prepare_lock_switch()
>         spin_release(&__rq_lockp(rq)->dep_map...)
> 
> 
> 
> So while the task is on it's way out, it still holds rq->lock for a
> little while, and right then task_call_func() comes in and figures it
> doesn't need rq->lock anymore (because the task is already dequeued --
> but still running there) and then the __set_task_frozen() thing observes
> it's holding rq->lock and yells murder.
> 
> Could you please give the below a spin?
> 
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cb2aa2b54c7a..f519f44cd4c7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4200,6 +4200,37 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	return success;
>  }
>  
> +static bool __task_needs_rq_lock(struct task_struct *p)
> +{
> +	unsigned int state = READ_ONCE(p->__state);
> +
> +	/*
> +	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> +	 * the task is blocked. Make sure to check @state since ttwu() can drop
> +	 * locks at the end, see ttwu_queue_wakelist().
> +	 */
> +	if (state == TASK_RUNNING || state == TASK_WAKING)
> +		return true;
> +
> +	/*
> +	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> +	 * possible to, falsely, observe p->on_rq == 0.
> +	 *
> +	 * See try_to_wake_up() for a longer comment.
> +	 */
> +	smp_rmb();
> +	if (p->on_rq)
> +		return true;
> +
> +#ifdef CONFIG_SMP
> +	smp_rmb();
> +	if (p->on_cpu)
> +		return true;
> +#endif
Should we also add p->on_cpu check to return 0 in __set_task_frozen()?
Otherwise it might still warn that p is holding the lock?

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-27  5:58             ` Chen Yu
@ 2022-10-27  7:39               ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2022-10-27  7:39 UTC (permalink / raw)
  To: Chen Yu
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Thu, Oct 27, 2022 at 01:58:09PM +0800, Chen Yu wrote:

> > It's a very narrow race between schedule() and task_call_func().
> > 
> >   CPU0						CPU1
> > 
> >   __schedule()
> >     rq_lock();
> >     prev_state = READ_ONCE(prev->__state);
> >     if (... && prev_state) {
> >       deactivate_tasl(rq, prev, ...)
> >         prev->on_rq = 0;
> > 
> > 						task_call_func()
> > 						  raw_spin_lock_irqsave(p->pi_lock);
> > 						  state = READ_ONCE(p->__state);
> > 						  smp_rmb();
> > 						  if (... || p->on_rq) // false!!!
> > 						    rq = __task_rq_lock()
> > 
> > 						  ret = func();
> > 
> >     next = pick_next_task();
> >     rq = context_switch(prev, next)
> >       prepare_lock_switch()
> >         spin_release(&__rq_lockp(rq)->dep_map...)
> > 
> > 
> > 
> > So while the task is on it's way out, it still holds rq->lock for a
> > little while, and right then task_call_func() comes in and figures it
> > doesn't need rq->lock anymore (because the task is already dequeued --
> > but still running there) and then the __set_task_frozen() thing observes
> > it's holding rq->lock and yells murder.
> > 
> > Could you please give the below a spin?
> > 
> > ---
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index cb2aa2b54c7a..f519f44cd4c7 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4200,6 +4200,37 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >  	return success;
> >  }
> >  
> > +static bool __task_needs_rq_lock(struct task_struct *p)
> > +{
> > +	unsigned int state = READ_ONCE(p->__state);
> > +
> > +	/*
> > +	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> > +	 * the task is blocked. Make sure to check @state since ttwu() can drop
> > +	 * locks at the end, see ttwu_queue_wakelist().
> > +	 */
> > +	if (state == TASK_RUNNING || state == TASK_WAKING)
> > +		return true;
> > +
> > +	/*
> > +	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> > +	 * possible to, falsely, observe p->on_rq == 0.
> > +	 *
> > +	 * See try_to_wake_up() for a longer comment.
> > +	 */
> > +	smp_rmb();
> > +	if (p->on_rq)
> > +		return true;
> > +
> > +#ifdef CONFIG_SMP
> > +	smp_rmb();
> > +	if (p->on_cpu)
> > +		return true;
> > +#endif
> Should we also add p->on_cpu check to return 0 in __set_task_frozen()?
> Otherwise it might still warn that p is holding the lock?

With this, I don't think __set_task_frozen() should ever see
'p->on_cpu && !p->on_rq'. By forcing task_call_func() to acquire
rq->lock that window is closed. That is, this window only exits in
__schedule() while it holds rq->lock, since we're now serializing
against that, we should no longer observe it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-26 11:43           ` Peter Zijlstra
  2022-10-26 12:12             ` Peter Zijlstra
  2022-10-27  5:58             ` Chen Yu
@ 2022-10-27 13:09             ` Ville Syrjälä
  2022-10-27 16:53               ` Peter Zijlstra
  2 siblings, 1 reply; 15+ messages in thread
From: Ville Syrjälä @ 2022-10-27 13:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 26, 2022 at 01:32:31PM +0300, Ville Syrjälä wrote:
> > Short form looks to be this:
> > <4>[  355.437846] 1 lock held by rs:main Q:Reg/359:
> > <4>[  355.438418]  #0: ffff88844693b758 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x1b/0x30
> > <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> 
> > <4>[  355.438429] ------------[ cut here ]------------
> > <4>[  355.438432] rs:main Q:Reg/359 holding locks while freezing
> > <4>[  355.438439] WARNING: CPU: 0 PID: 6211 at kernel/freezer.c:134 __set_task_frozen+0x86/0xb0
> > <4>[  355.438447] Modules linked in: snd_hda_intel i915 mei_hdcp mei_pxp drm_display_helper drm_kms_helper vgem drm_shmem_helper snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm prime_numbers ttm drm_buddy syscopyarea sysfillrect sysimgblt fb_sys_fops fuse x86_pkg_temp_thermal coretemp kvm_intel btusb btrtl btbcm btintel kvm irqbypass bluetooth crct10dif_pclmul crc32_pclmul ecdh_generic ghash_clmulni_intel ecc e1000e mei_me i2c_i801 ptp mei i2c_smbus pps_core lpc_ich video wmi [last unloaded: drm_kms_helper]
> > <4>[  355.438521] CPU: 0 PID: 6211 Comm: rtcwake Tainted: G     U             6.1.0-rc2-CI_DRM_12295-g3844a56a0922+ #1
> > <4>[  355.438526] Hardware name:  /NUC5i7RYB, BIOS RYBDWi35.86A.0385.2020.0519.1558 05/19/2020
> > <4>[  355.438530] RIP: 0010:__set_task_frozen+0x86/0xb0
> > <4>[  355.438536] Code: 83 60 09 00 00 85 c0 74 2a 48 89 df e8 ac 02 9b 00 8b 93 38 05 00 00 48 8d b3 48 07 00 00 48 c7 c7 a0 62 2b 82 e8 ee c1 9a 00 <0f> 0b c6 05 51 75 e3 02 01 c7 43 18 00 80 00 00 b8 00 80 00 00 5b
> > <4>[  355.438541] RSP: 0018:ffffc900012cbcf0 EFLAGS: 00010086
> > <4>[  355.438546] RAX: 0000000000000000 RBX: ffff88810d090040 RCX: 0000000000000004
> > <4>[  355.438550] RDX: 0000000000000004 RSI: 00000000fffff5de RDI: 00000000ffffffff
> > <4>[  355.438553] RBP: 0000000000000000 R08: 0000000000000000 R09: c0000000fffff5de
> > <4>[  355.438557] R10: 00000000002335f8 R11: ffffc900012cbb88 R12: 0000000000000246
> > <4>[  355.438561] R13: ffffffff81165430 R14: 0000000000000000 R15: ffff88810d090040
> > <4>[  355.438565] FS:  00007fcfa43c7740(0000) GS:ffff888446800000(0000) knlGS:0000000000000000
> > <4>[  355.438569] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > <4>[  355.438582] CR2: 00007fceb380f6b8 CR3: 0000000117c5c004 CR4: 00000000003706f0
> > <4>[  355.438586] Call Trace:
> > <4>[  355.438589]  <TASK>
> > <4>[  355.438592]  task_call_func+0xc4/0xe0
> > <4>[  355.438600]  freeze_task+0x84/0xe0
> > <4>[  355.438607]  try_to_freeze_tasks+0xac/0x260
> > <4>[  355.438616]  freeze_processes+0x56/0xb0
> > <4>[  355.438622]  pm_suspend.cold.7+0x1d9/0x31c
> > <4>[  355.438629]  state_store+0x7b/0xe0
> > <4>[  355.438637]  kernfs_fop_write_iter+0x124/0x1c0
> > <4>[  355.438644]  vfs_write+0x34f/0x4e0
> > <4>[  355.438655]  ksys_write+0x57/0xd0
> > <4>[  355.438663]  do_syscall_64+0x3a/0x90
> > <4>[  355.438670]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> Oh I think I see what's going on.
> 
> It's a very narrow race between schedule() and task_call_func().
> 
>   CPU0						CPU1
> 
>   __schedule()
>     rq_lock();
>     prev_state = READ_ONCE(prev->__state);
>     if (... && prev_state) {
>       deactivate_tasl(rq, prev, ...)
>         prev->on_rq = 0;
> 
> 						task_call_func()
> 						  raw_spin_lock_irqsave(p->pi_lock);
> 						  state = READ_ONCE(p->__state);
> 						  smp_rmb();
> 						  if (... || p->on_rq) // false!!!
> 						    rq = __task_rq_lock()
> 
> 						  ret = func();
> 
>     next = pick_next_task();
>     rq = context_switch(prev, next)
>       prepare_lock_switch()
>         spin_release(&__rq_lockp(rq)->dep_map...)
> 
> 
> 
> So while the task is on it's way out, it still holds rq->lock for a
> little while, and right then task_call_func() comes in and figures it
> doesn't need rq->lock anymore (because the task is already dequeued --
> but still running there) and then the __set_task_frozen() thing observes
> it's holding rq->lock and yells murder.
> 
> Could you please give the below a spin?

Thanks. I've added this to our CI branch. I'll try to keep and eye
on it in the coming days and let you know if anything still trips.
And I'll report back maybe ~middle of next week if we haven't caught
anything by then.

> 
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cb2aa2b54c7a..f519f44cd4c7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4200,6 +4200,37 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	return success;
>  }
>  
> +static bool __task_needs_rq_lock(struct task_struct *p)
> +{
> +	unsigned int state = READ_ONCE(p->__state);
> +
> +	/*
> +	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> +	 * the task is blocked. Make sure to check @state since ttwu() can drop
> +	 * locks at the end, see ttwu_queue_wakelist().
> +	 */
> +	if (state == TASK_RUNNING || state == TASK_WAKING)
> +		return true;
> +
> +	/*
> +	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> +	 * possible to, falsely, observe p->on_rq == 0.
> +	 *
> +	 * See try_to_wake_up() for a longer comment.
> +	 */
> +	smp_rmb();
> +	if (p->on_rq)
> +		return true;
> +
> +#ifdef CONFIG_SMP
> +	smp_rmb();
> +	if (p->on_cpu)
> +		return true;
> +#endif
> +
> +	return false;
> +}
> +
>  /**
>   * task_call_func - Invoke a function on task in fixed state
>   * @p: Process for which the function is to be invoked, can be @current.
> @@ -4217,28 +4248,12 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  int task_call_func(struct task_struct *p, task_call_f func, void *arg)
>  {
>  	struct rq *rq = NULL;
> -	unsigned int state;
>  	struct rq_flags rf;
>  	int ret;
>  
>  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
>  
> -	state = READ_ONCE(p->__state);
> -
> -	/*
> -	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> -	 * possible to, falsely, observe p->on_rq == 0.
> -	 *
> -	 * See try_to_wake_up() for a longer comment.
> -	 */
> -	smp_rmb();
> -
> -	/*
> -	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> -	 * the task is blocked. Make sure to check @state since ttwu() can drop
> -	 * locks at the end, see ttwu_queue_wakelist().
> -	 */
> -	if (state == TASK_RUNNING || state == TASK_WAKING || p->on_rq)
> +	if (__task_needs_rq_lock(p))
>  		rq = __task_rq_lock(p, &rf);
>  
>  	/*

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-27 13:09             ` Ville Syrjälä
@ 2022-10-27 16:53               ` Peter Zijlstra
  2022-11-02 16:57                 ` Ville Syrjälä
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2022-10-27 16:53 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Thu, Oct 27, 2022 at 04:09:01PM +0300, Ville Syrjälä wrote:
> On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:

> > Could you please give the below a spin?
> 
> Thanks. I've added this to our CI branch. I'll try to keep and eye
> on it in the coming days and let you know if anything still trips.
> And I'll report back maybe ~middle of next week if we haven't caught
> anything by then.

Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-10-27 16:53               ` Peter Zijlstra
@ 2022-11-02 16:57                 ` Ville Syrjälä
  2022-11-02 22:16                   ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Ville Syrjälä @ 2022-11-02 16:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Thu, Oct 27, 2022 at 06:53:23PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 27, 2022 at 04:09:01PM +0300, Ville Syrjälä wrote:
> > On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:
> 
> > > Could you please give the below a spin?
> > 
> > Thanks. I've added this to our CI branch. I'll try to keep and eye
> > on it in the coming days and let you know if anything still trips.
> > And I'll report back maybe ~middle of next week if we haven't caught
> > anything by then.
> 
> Thanks!

Looks like we haven't caught anything since I put the patch in.
So the fix seems good.

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-11-02 16:57                 ` Ville Syrjälä
@ 2022-11-02 22:16                   ` Peter Zijlstra
  2022-11-07 11:47                     ` Ville Syrjälä
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2022-11-02 22:16 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Wed, Nov 02, 2022 at 06:57:51PM +0200, Ville Syrjälä wrote:
> On Thu, Oct 27, 2022 at 06:53:23PM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 27, 2022 at 04:09:01PM +0300, Ville Syrjälä wrote:
> > > On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:
> > 
> > > > Could you please give the below a spin?
> > > 
> > > Thanks. I've added this to our CI branch. I'll try to keep and eye
> > > on it in the coming days and let you know if anything still trips.
> > > And I'll report back maybe ~middle of next week if we haven't caught
> > > anything by then.
> > 
> > Thanks!
> 
> Looks like we haven't caught anything since I put the patch in.
> So the fix seems good.

While writing up the Changelog, it occured to me it might be possible to
fix another way, could I bother you to also run the below patch for a
bit?

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cb2aa2b54c7a..daff72f00385 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4200,6 +4200,40 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	return success;
 }
 
+static bool __task_needs_rq_lock(struct task_struct *p)
+{
+	unsigned int state = READ_ONCE(p->__state);
+
+	/*
+	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
+	 * the task is blocked. Make sure to check @state since ttwu() can drop
+	 * locks at the end, see ttwu_queue_wakelist().
+	 */
+	if (state == TASK_RUNNING || state == TASK_WAKING)
+		return true;
+
+	/*
+	 * Ensure we load p->on_rq after p->__state, otherwise it would be
+	 * possible to, falsely, observe p->on_rq == 0.
+	 *
+	 * See try_to_wake_up() for a longer comment.
+	 */
+	smp_rmb();
+	if (p->on_rq)
+		return true;
+
+#ifdef CONFIG_SMP
+	/*
+	 * Ensure the task has finished __schedule() and will not be referenced
+	 * anymore. Again, see try_to_wake_up() for a longer comment.
+	 */
+	smp_rmb();
+	smp_cond_load_acquire(&p->on_cpu, !VAL);
+#endif
+
+	return false;
+}
+
 /**
  * task_call_func - Invoke a function on task in fixed state
  * @p: Process for which the function is to be invoked, can be @current.
@@ -4217,28 +4251,12 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 int task_call_func(struct task_struct *p, task_call_f func, void *arg)
 {
 	struct rq *rq = NULL;
-	unsigned int state;
 	struct rq_flags rf;
 	int ret;
 
 	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
 
-	state = READ_ONCE(p->__state);
-
-	/*
-	 * Ensure we load p->on_rq after p->__state, otherwise it would be
-	 * possible to, falsely, observe p->on_rq == 0.
-	 *
-	 * See try_to_wake_up() for a longer comment.
-	 */
-	smp_rmb();
-
-	/*
-	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
-	 * the task is blocked. Make sure to check @state since ttwu() can drop
-	 * locks at the end, see ttwu_queue_wakelist().
-	 */
-	if (state == TASK_RUNNING || state == TASK_WAKING || p->on_rq)
+	if (__task_needs_rq_lock(p))
 		rq = __task_rq_lock(p, &rf);
 
 	/*

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-11-02 22:16                   ` Peter Zijlstra
@ 2022-11-07 11:47                     ` Ville Syrjälä
  2022-11-10 20:27                       ` Ville Syrjälä
  0 siblings, 1 reply; 15+ messages in thread
From: Ville Syrjälä @ 2022-11-07 11:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-pm, linux-kernel, bigeasy, rjw, oleg, rostedt, mingo,
	mgorman, intel-gfx, tj, Will Deacon, dietmar.eggemann, ebiederm

On Wed, Nov 02, 2022 at 11:16:48PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 02, 2022 at 06:57:51PM +0200, Ville Syrjälä wrote:
> > On Thu, Oct 27, 2022 at 06:53:23PM +0200, Peter Zijlstra wrote:
> > > On Thu, Oct 27, 2022 at 04:09:01PM +0300, Ville Syrjälä wrote:
> > > > On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:
> > > 
> > > > > Could you please give the below a spin?
> > > > 
> > > > Thanks. I've added this to our CI branch. I'll try to keep and eye
> > > > on it in the coming days and let you know if anything still trips.
> > > > And I'll report back maybe ~middle of next week if we haven't caught
> > > > anything by then.
> > > 
> > > Thanks!
> > 
> > Looks like we haven't caught anything since I put the patch in.
> > So the fix seems good.
> 
> While writing up the Changelog, it occured to me it might be possible to
> fix another way, could I bother you to also run the below patch for a
> bit?

I swapped in the new patch to the CI branch. I'll check back
after a few days.

> 
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cb2aa2b54c7a..daff72f00385 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4200,6 +4200,40 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	return success;
>  }
>  
> +static bool __task_needs_rq_lock(struct task_struct *p)
> +{
> +	unsigned int state = READ_ONCE(p->__state);
> +
> +	/*
> +	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> +	 * the task is blocked. Make sure to check @state since ttwu() can drop
> +	 * locks at the end, see ttwu_queue_wakelist().
> +	 */
> +	if (state == TASK_RUNNING || state == TASK_WAKING)
> +		return true;
> +
> +	/*
> +	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> +	 * possible to, falsely, observe p->on_rq == 0.
> +	 *
> +	 * See try_to_wake_up() for a longer comment.
> +	 */
> +	smp_rmb();
> +	if (p->on_rq)
> +		return true;
> +
> +#ifdef CONFIG_SMP
> +	/*
> +	 * Ensure the task has finished __schedule() and will not be referenced
> +	 * anymore. Again, see try_to_wake_up() for a longer comment.
> +	 */
> +	smp_rmb();
> +	smp_cond_load_acquire(&p->on_cpu, !VAL);
> +#endif
> +
> +	return false;
> +}
> +
>  /**
>   * task_call_func - Invoke a function on task in fixed state
>   * @p: Process for which the function is to be invoked, can be @current.
> @@ -4217,28 +4251,12 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  int task_call_func(struct task_struct *p, task_call_f func, void *arg)
>  {
>  	struct rq *rq = NULL;
> -	unsigned int state;
>  	struct rq_flags rf;
>  	int ret;
>  
>  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
>  
> -	state = READ_ONCE(p->__state);
> -
> -	/*
> -	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> -	 * possible to, falsely, observe p->on_rq == 0.
> -	 *
> -	 * See try_to_wake_up() for a longer comment.
> -	 */
> -	smp_rmb();
> -
> -	/*
> -	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> -	 * the task is blocked. Make sure to check @state since ttwu() can drop
> -	 * locks at the end, see ttwu_queue_wakelist().
> -	 */
> -	if (state == TASK_RUNNING || state == TASK_WAKING || p->on_rq)
> +	if (__task_needs_rq_lock(p))
>  		rq = __task_rq_lock(p, &rf);
>  
>  	/*

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic
  2022-11-07 11:47                     ` Ville Syrjälä
@ 2022-11-10 20:27                       ` Ville Syrjälä
  0 siblings, 0 replies; 15+ messages in thread
From: Ville Syrjälä @ 2022-11-10 20:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-pm, Will Deacon, bigeasy, rjw, oleg, rostedt, linux-kernel,
	mgorman, tj, intel-gfx, mingo, dietmar.eggemann, ebiederm

On Mon, Nov 07, 2022 at 01:47:23PM +0200, Ville Syrjälä wrote:
> On Wed, Nov 02, 2022 at 11:16:48PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 02, 2022 at 06:57:51PM +0200, Ville Syrjälä wrote:
> > > On Thu, Oct 27, 2022 at 06:53:23PM +0200, Peter Zijlstra wrote:
> > > > On Thu, Oct 27, 2022 at 04:09:01PM +0300, Ville Syrjälä wrote:
> > > > > On Wed, Oct 26, 2022 at 01:43:00PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > > > Could you please give the below a spin?
> > > > > 
> > > > > Thanks. I've added this to our CI branch. I'll try to keep and eye
> > > > > on it in the coming days and let you know if anything still trips.
> > > > > And I'll report back maybe ~middle of next week if we haven't caught
> > > > > anything by then.
> > > > 
> > > > Thanks!
> > > 
> > > Looks like we haven't caught anything since I put the patch in.
> > > So the fix seems good.
> > 
> > While writing up the Changelog, it occured to me it might be possible to
> > fix another way, could I bother you to also run the below patch for a
> > bit?
> 
> I swapped in the new patch to the CI branch. I'll check back
> after a few days.

CI hasn't had anything new to report AFAICS, so looks like this
version is good as well.

> 
> > 
> > ---
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index cb2aa2b54c7a..daff72f00385 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4200,6 +4200,40 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >  	return success;
> >  }
> >  
> > +static bool __task_needs_rq_lock(struct task_struct *p)
> > +{
> > +	unsigned int state = READ_ONCE(p->__state);
> > +
> > +	/*
> > +	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> > +	 * the task is blocked. Make sure to check @state since ttwu() can drop
> > +	 * locks at the end, see ttwu_queue_wakelist().
> > +	 */
> > +	if (state == TASK_RUNNING || state == TASK_WAKING)
> > +		return true;
> > +
> > +	/*
> > +	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> > +	 * possible to, falsely, observe p->on_rq == 0.
> > +	 *
> > +	 * See try_to_wake_up() for a longer comment.
> > +	 */
> > +	smp_rmb();
> > +	if (p->on_rq)
> > +		return true;
> > +
> > +#ifdef CONFIG_SMP
> > +	/*
> > +	 * Ensure the task has finished __schedule() and will not be referenced
> > +	 * anymore. Again, see try_to_wake_up() for a longer comment.
> > +	 */
> > +	smp_rmb();
> > +	smp_cond_load_acquire(&p->on_cpu, !VAL);
> > +#endif
> > +
> > +	return false;
> > +}
> > +
> >  /**
> >   * task_call_func - Invoke a function on task in fixed state
> >   * @p: Process for which the function is to be invoked, can be @current.
> > @@ -4217,28 +4251,12 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> >  int task_call_func(struct task_struct *p, task_call_f func, void *arg)
> >  {
> >  	struct rq *rq = NULL;
> > -	unsigned int state;
> >  	struct rq_flags rf;
> >  	int ret;
> >  
> >  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
> >  
> > -	state = READ_ONCE(p->__state);
> > -
> > -	/*
> > -	 * Ensure we load p->on_rq after p->__state, otherwise it would be
> > -	 * possible to, falsely, observe p->on_rq == 0.
> > -	 *
> > -	 * See try_to_wake_up() for a longer comment.
> > -	 */
> > -	smp_rmb();
> > -
> > -	/*
> > -	 * Since pi->lock blocks try_to_wake_up(), we don't need rq->lock when
> > -	 * the task is blocked. Make sure to check @state since ttwu() can drop
> > -	 * locks at the end, see ttwu_queue_wakelist().
> > -	 */
> > -	if (state == TASK_RUNNING || state == TASK_WAKING || p->on_rq)
> > +	if (__task_needs_rq_lock(p))
> >  		rq = __task_rq_lock(p, &rf);
> >  
> >  	/*
> 
> -- 
> Ville Syrjälä
> Intel

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-11-10 20:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20220822111816.760285417@infradead.org>
     [not found] ` <20220822114649.055452969@infradead.org>
2022-10-21 17:22   ` [Intel-gfx] [PATCH v3 6/6] freezer, sched: Rewrite core freezer logic Ville Syrjälä
2022-10-25  4:52     ` Ville Syrjälä
2022-10-25 10:49       ` Peter Zijlstra
2022-10-26 10:32         ` Ville Syrjälä
2022-10-26 11:43           ` Peter Zijlstra
2022-10-26 12:12             ` Peter Zijlstra
2022-10-26 12:14               ` Peter Zijlstra
2022-10-27  5:58             ` Chen Yu
2022-10-27  7:39               ` Peter Zijlstra
2022-10-27 13:09             ` Ville Syrjälä
2022-10-27 16:53               ` Peter Zijlstra
2022-11-02 16:57                 ` Ville Syrjälä
2022-11-02 22:16                   ` Peter Zijlstra
2022-11-07 11:47                     ` Ville Syrjälä
2022-11-10 20:27                       ` Ville Syrjälä

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox