Linux Kernel Selftest development
From: Waiman Long <longman@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: "Chen Ridong" <chenridong@huaweicloud.com>,
	"Tejun Heo" <tj@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Juri Lelli" <juri.lelli@redhat.com>,
	"Vincent Guittot" <vincent.guittot@linaro.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
	"Valentin Schneider" <vschneid@redhat.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Shuah Khan" <shuah@kernel.org>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v6 8/8] cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
Date: Mon, 2 Mar 2026 10:40:45 -0500	[thread overview]
Message-ID: <3936bd39-2e1c-44b7-8098-e8e950a6df11@redhat.com> (raw)
In-Reply-To: <69ec4b3c-818b-4784-9b90-1ac5e977ae58@redhat.com>

On 3/2/26 9:15 AM, Waiman Long wrote:
> On 3/2/26 7:14 AM, Frederic Weisbecker wrote:
>> On Sat, Feb 21, 2026 at 01:54:18PM -0500, Waiman Long wrote:
>>> The current cpuset partition code is able to dynamically update
>>> the sched domains of a running system and the corresponding
>>> HK_TYPE_DOMAIN housekeeping cpumask, performing what is essentially
>>> the "isolcpus=domain,..." boot command line feature at run time.
>>>
>>> The housekeeping cpumask update requires flushing a number of
>>> different workqueues, which may not be safe with cpus_read_lock()
>>> held as the workqueue flushing code may acquire cpus_read_lock() or
>>> acquire locks that have a locking dependency on cpus_read_lock()
>>> further down the chain. Below is an example of such a circular
>>> locking problem.
>>>
>>>    ======================================================
>>>    WARNING: possible circular locking dependency detected
>>>    6.18.0-test+ #2 Tainted: G S
>>>    ------------------------------------------------------
>>>    test_cpuset_prs/10971 is trying to acquire lock:
>>>    ffff888112ba4958 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180
>>>
>>>    but task is already holding lock:
>>>    ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
>>>
>>>    which lock already depends on the new lock.
>>>
>>>    the existing dependency chain (in reverse order) is:
>>>    -> #4 (cpuset_mutex){+.+.}-{4:4}:
>>>    -> #3 (cpu_hotplug_lock){++++}-{0:0}:
>>>    -> #2 (rtnl_mutex){+.+.}-{4:4}:
>>>    -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
>>>    -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:
>>>
>>>    Chain exists of:
>>>      (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex
>> Which workqueue is involved here that holds rtnl_mutex?
>> Is this an existing problem or added test code?
>
> A circular locking dependency here does not necessarily mean that
> rtnl_mutex is directly used in a work function. However, it can be
> part of a locking chain involving multiple parties that results in a
> deadlock if the locks are taken in the right order. So it is better
> to be safe than sorry even if the chance of this occurring is minimal.
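
A minimal, self-contained C sketch of the pattern lockdep is warning
about may help illustrate this. Note that demo_wq, demo_work_fn() and
demo_flush_under_lock() below are made-up names for illustration, not
the actual cpuset or housekeeping code:

#include <linux/cpu.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;

static void demo_work_fn(struct work_struct *work)
{
	/*
	 * Directly, or through a longer chain (e.g. rtnl_mutex ->
	 * flush_all_backlogs() in the splat below), the work item
	 * ends up taking cpus_read_lock().
	 */
	cpus_read_lock();
	/* ... per-CPU work ... */
	cpus_read_unlock();
}

static void demo_flush_under_lock(void)
{
	cpus_read_lock();
	/*
	 * The flush waits for demo_work_fn() to finish. If a hotplug
	 * writer is already queued on cpu_hotplug_lock, the work
	 * item's cpus_read_lock() blocks behind that writer, the
	 * writer blocks behind us, and we block on the flush:
	 * deadlock. Lockdep records this as (wq_completion)demo_wq
	 * nesting inside cpu_hotplug_lock.
	 */
	flush_workqueue(demo_wq);
	cpus_read_unlock();
}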

Below is the full lockdep splat; I didn't include the individual stack
traces in the commit log to keep it less verbose.

The rtnl_mutex is indeed acquired in the local_pci_probe() call chain.

Cheers,
Longman

[  909.360022] ======================================================
[  909.366208] WARNING: possible circular locking dependency detected
[  909.372387] 7.0.0-rc1-test+ #3 Tainted: G S
[  909.378044] ------------------------------------------------------
[  909.384225] test_cpuset_prs/8673 is trying to acquire lock:
[  909.389798] ffff8890b0fd6558 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180
[  909.399114]
                but task is already holding lock:
[  909.404946] ffffffffb9741c10 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
[  909.413562]
                which lock already depends on the new lock.

[  909.421733]
                the existing dependency chain (in reverse order) is:
[  909.429213]
                -> #4 (cpuset_mutex){+.+.}-{4:4}:
[  909.435056]        __lock_acquire+0x58c/0xbd0
[  909.439421]        lock_acquire.part.0+0xbd/0x260
[  909.444129]        __mutex_lock+0x1a7/0x1ba0
[  909.448411]        cpuset_css_online+0x59/0x410
[  909.452948]        online_css+0x9b/0x2d0
[  909.456877]        css_create+0x3c6/0x610
[  909.460895]        cgroup_apply_control_enable+0x2ff/0x460
[  909.466384]        cgroup_subtree_control_write+0x79a/0xc70
[  909.471963]        cgroup_file_write+0x1a5/0x680
[  909.476582]        kernfs_fop_write_iter+0x3df/0x5f0
[  909.481550]        vfs_write+0x525/0xfd0
[  909.485482]        ksys_write+0xf9/0x1d0
[  909.489410]        do_syscall_64+0x13a/0x1520
[  909.493778]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.499361]
                -> #3 (cpu_hotplug_lock){++++}-{0:0}:
[  909.505547]        __lock_acquire+0x58c/0xbd0
[  909.509914]        lock_acquire.part.0+0xbd/0x260
[  909.514630]        cpus_read_lock+0x40/0xe0
[  909.518824]        flush_all_backlogs+0x83/0x4b0
[  909.523451] unregister_netdevice_many_notify+0x7e8/0x1fa0
[  909.529465]        default_device_exit_batch+0x356/0x490
[  909.534788]        ops_undo_list+0x2f4/0x930
[  909.539067]        cleanup_net+0x40a/0x8f0
[  909.543168]        process_one_work+0xd8b/0x1320
[  909.547795]        worker_thread+0x5f3/0xfe0
[  909.552068]        kthread+0x36c/0x470
[  909.555830]        ret_from_fork+0x5dc/0x8e0
[  909.560109]        ret_from_fork_asm+0x1a/0x30
[  909.564557]
                -> #2 (rtnl_mutex){+.+.}-{4:4}:
[  909.570224]        __lock_acquire+0x58c/0xbd0
[  909.574592]        lock_acquire.part.0+0xbd/0x260
[  909.579304]        __mutex_lock+0x1a7/0x1ba0
[  909.583580]        rtnl_net_lock_killable+0x1e/0x70
[  909.588465]        register_netdev+0x40/0x70
[  909.592738]        i40e_vsi_setup+0x892/0x14b0 [i40e]
[  909.597854]        i40e_setup_pf_switch+0xaa1/0xe80 [i40e]
[  909.603392]        i40e_probe.cold+0xdb0/0x1d1b [i40e]
[  909.608582]        local_pci_probe+0xdb/0x180
[  909.612951]        local_pci_probe_callback+0x35/0x80
[  909.618008]        process_one_work+0xd8b/0x1320
[  909.622631]        worker_thread+0x5f3/0xfe0
[  909.626912]        kthread+0x36c/0x470
[  909.630673]        ret_from_fork+0x5dc/0x8e0
[  909.634951]        ret_from_fork_asm+0x1a/0x30
[  909.639399]
                -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
[  909.646627]        __lock_acquire+0x58c/0xbd0
[  909.650994]        lock_acquire.part.0+0xbd/0x260
[  909.655699]        process_one_work+0xd58/0x1320
[  909.660321]        worker_thread+0x5f3/0xfe0
[  909.664602]        kthread+0x36c/0x470
[  909.668363]        ret_from_fork+0x5dc/0x8e0
[  909.672641]        ret_from_fork_asm+0x1a/0x30
[  909.677089]
                -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:
[  909.683795]        check_prev_add+0xf1/0xc80
[  909.688068]        validate_chain+0x481/0x560
[  909.692431]        __lock_acquire+0x58c/0xbd0
[  909.696797]        lock_acquire.part.0+0xbd/0x260
[  909.701511]        touch_wq_lockdep_map+0x93/0x180
[  909.706314]        __flush_workqueue+0x111/0x10b0
[  909.711026]        housekeeping_update+0x12d/0x2d0
[  909.715819]        update_parent_effective_cpumask+0x595/0x2440
[  909.721747]        update_prstate+0x89d/0xce0
[  909.726105]        cpuset_partition_write+0xc5/0x130
[  909.731073]        cgroup_file_write+0x1a5/0x680
[  909.735701]        kernfs_fop_write_iter+0x3df/0x5f0
[  909.740664]        vfs_write+0x525/0xfd0
[  909.744592]        ksys_write+0xf9/0x1d0
[  909.748520]        do_syscall_64+0x13a/0x1520
[  909.752887]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.758465]
                other info that might help us debug this:

[  909.766466] Chain exists of:
                  (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex

[  909.777679]  Possible unsafe locking scenario:

[  909.783599]        CPU0                    CPU1
[  909.788130]        ----                    ----
[  909.792666]   lock(cpuset_mutex);
[  909.795991]                                lock(cpu_hotplug_lock);
[  909.802171]                                lock(cpuset_mutex);
[  909.808013]   lock((wq_completion)sync_wq);
[  909.812207]
                 *** DEADLOCK ***

[  909.818127] 5 locks held by test_cpuset_prs/8673:
[  909.822830]  #0: ffff888140592440 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
[  909.830839]  #1: ffff889100a49890 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x260/0x5f0
[  909.839890]  #2: ffff8890fbfa5368 (kn->active#353){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x2b6/0x5f0
[  909.849118]  #3: ffffffffb9134d00 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_partition_write+0x77/0x130
[  909.858522]  #4: ffffffffb9741c10 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
[  909.867576]
                stack backtrace:
[  909.871940] CPU: 95 UID: 0 PID: 8673 Comm: test_cpuset_prs Kdump: loaded Tainted: G S                  7.0.0-rc1-test+ #3 PREEMPT(full)
[  909.871946] Tainted: [S]=CPU_OUT_OF_SPEC
[  909.871948] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.043020191705 04/30/2019
[  909.871950] Call Trace:
[  909.871952]  <TASK>
[  909.871955]  dump_stack_lvl+0x6f/0xb0
[  909.871961]  print_circular_bug.cold+0x38/0x45
[  909.871968]  check_noncircular+0x146/0x160
[  909.871975]  check_prev_add+0xf1/0xc80
[  909.871978]  ? alloc_chain_hlocks+0x13e/0x1d0
[  909.871982]  ? add_chain_cache+0x11c/0x300
[  909.871986]  validate_chain+0x481/0x560
[  909.871991]  __lock_acquire+0x58c/0xbd0
[  909.871995]  ? lockdep_init_map_type+0x66/0x250
[  909.872000]  lock_acquire.part.0+0xbd/0x260
[  909.872004]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872009]  ? rcu_is_watching+0x15/0xb0
[  909.872013]  ? trace_rcu_sr_normal+0x1d5/0x2e0
[  909.872018]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872021]  ? lock_acquire+0x159/0x180
[  909.872026]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872030]  touch_wq_lockdep_map+0x93/0x180
[  909.872034]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872038]  __flush_workqueue+0x111/0x10b0
[  909.872042]  ? local_clock_noinstr+0xd/0xe0
[  909.872049]  ? __pfx___flush_workqueue+0x10/0x10
[  909.872059]  housekeeping_update+0x12d/0x2d0
[  909.872063]  update_parent_effective_cpumask+0x595/0x2440
[  909.872070]  update_prstate+0x89d/0xce0
[  909.872076]  ? __pfx_update_prstate+0x10/0x10
[  909.872085]  cpuset_partition_write+0xc5/0x130
[  909.872089]  cgroup_file_write+0x1a5/0x680
[  909.872093]  ? __pfx_cgroup_file_write+0x10/0x10
[  909.872097]  ? kernfs_fop_write_iter+0x2b6/0x5f0
[  909.872102]  ? __pfx_cgroup_file_write+0x10/0x10
[  909.872105]  kernfs_fop_write_iter+0x3df/0x5f0
[  909.872109]  vfs_write+0x525/0xfd0
[  909.872113]  ? __pfx_vfs_write+0x10/0x10
[  909.872118]  ? __lock_acquire+0x58c/0xbd0
[  909.872124]  ? find_held_lock+0x32/0x90
[  909.872130]  ksys_write+0xf9/0x1d0
[  909.872133]  ? __pfx_ksys_write+0x10/0x10
[  909.872136]  ? lockdep_hardirqs_on+0x78/0x100
[  909.872141]  ? do_syscall_64+0xde/0x1520
[  909.872146]  do_syscall_64+0x13a/0x1520
[  909.872151]  ? rcu_is_watching+0x15/0xb0
[  909.872154]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.872157]  ? lockdep_hardirqs_on+0x78/0x100
[  909.872161]  ? do_syscall_64+0x212/0x1520
[  909.872166]  ? find_held_lock+0x32/0x90
[  909.872170]  ? local_clock_noinstr+0xd/0xe0
[  909.872174]  ? __lock_release.isra.0+0x1a2/0x2c0
[  909.872178]  ? exc_page_fault+0x78/0xf0
[  909.872183]  ? rcu_is_watching+0x15/0xb0
[  909.872186]  ? trace_irq_enable.constprop.0+0x194/0x200
[  909.872191]  ? lockdep_hardirqs_on_prepare.part.0+0x8e/0x170
[  909.872196]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.872199] RIP: 0033:0x7f877d3e9544
[  909.872203] Code: 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 80 3d a5 cb 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[  909.872206] RSP: 002b:00007ffd6ff21b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  909.872210] RAX: ffffffffffffffda RBX: 00007f877d4bf5c0 RCX: 00007f877d3e9544
[  909.872213] RDX: 0000000000000009 RSI: 0000557ff7ec2320 RDI: 0000000000000001
[  909.872215] RBP: 0000000000000009 R08: 0000000000000073 R09: 00000000ffffffff
[  909.872217] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000009
[  909.872219] R13: 0000557ff7ec2320 R14: 0000000000000009 R15: 00007f877d4bcf00
[  909.872226]  </TASK>
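
For completeness, the direction taken by this patch, as its subject
line says, is to call housekeeping_update() only after cpus_read_lock()
has been dropped. A simplified sketch of that ordering follows; note
that demo_partition_write(), demo_housekeeping_update() and
demo_cpuset_mutex are stand-in names, not the actual diff:

#include <linux/cpu.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(demo_cpuset_mutex);	/* stand-in for cpuset_mutex */

static void demo_housekeeping_update(void)
{
	/* stand-in for housekeeping_update() and its workqueue flushes */
}

static void demo_partition_write(void)
{
	bool hk_update_needed;

	cpus_read_lock();
	mutex_lock(&demo_cpuset_mutex);
	/* ... recompute partition state and sched domains ... */
	hk_update_needed = true;	/* set when isolated_cpus changed */
	mutex_unlock(&demo_cpuset_mutex);
	cpus_read_unlock();

	/*
	 * Workqueue flushes now happen without cpu_hotplug_lock held,
	 * so (wq_completion)sync_wq can no longer nest inside
	 * cpu_hotplug_lock and the reported cycle cannot close.
	 */
	if (hk_update_needed)
		demo_housekeeping_update();
}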



Thread overview: 29+ messages
2026-02-21 18:54 [PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues Waiman Long
2026-02-21 18:54 ` [PATCH v6 1/8] cgroup/cpuset: Fix incorrect change to effective_xcpus in partition_xcpus_del() Waiman Long
2026-02-21 18:54 ` [PATCH v6 2/8] cgroup/cpuset: Fix incorrect use of cpuset_update_tasks_cpumask() in update_cpumasks_hier() Waiman Long
2026-02-21 18:54 ` [PATCH v6 3/8] cgroup/cpuset: Clarify exclusion rules for cpuset internal variables Waiman Long
2026-02-26 15:00   ` Frederic Weisbecker
2026-02-21 18:54 ` [PATCH v6 4/8] cgroup/cpuset: Set isolated_cpus_updating only if isolated_cpus is changed Waiman Long
2026-02-26 15:07   ` Frederic Weisbecker
2026-02-21 18:54 ` [PATCH v6 5/8] kselftest/cgroup: Simplify test_cpuset_prs.sh by removing "S+" command Waiman Long
2026-02-21 18:54 ` [PATCH v6 6/8] cgroup/cpuset: Move housekeeping_update()/rebuild_sched_domains() together Waiman Long
2026-02-26 15:51   ` Frederic Weisbecker
2026-02-21 18:54 ` [PATCH v6 7/8] cgroup/cpuset: Defer housekeeping_update() calls from CPU hotplug to workqueue Waiman Long
2026-02-26 16:06   ` Frederic Weisbecker
2026-03-03 16:00     ` Waiman Long
2026-03-03 22:48       ` Frederic Weisbecker
2026-03-04  4:05         ` Waiman Long
2026-03-02 11:49   ` Frederic Weisbecker
2026-03-03 15:18   ` Jon Hunter
2026-03-03 16:09     ` Waiman Long
2026-03-04  3:58     ` Waiman Long
2026-03-04 11:07       ` Jon Hunter
2026-03-04 18:11         ` Waiman Long
2026-02-21 18:54 ` [PATCH v6 8/8] cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock Waiman Long
2026-03-02 12:14   ` Frederic Weisbecker
2026-03-02 14:15     ` Waiman Long
2026-03-02 15:40       ` Waiman Long [this message]
2026-02-23 20:57 ` [PATCH v6 0/8] cgroup/cpuset: Fix partition related locking issues Tejun Heo
2026-02-23 21:11   ` Waiman Long
2026-02-24  7:51     ` Chen Ridong
2026-03-02 12:21   ` Frederic Weisbecker
