Message-ID: <3936bd39-2e1c-44b7-8098-e8e950a6df11@redhat.com>
Date: Mon, 2 Mar 2026 10:40:45 -0500
X-Mailing-List: cgroups@vger.kernel.org
Subject: Re: [PATCH v6 8/8] cgroup/cpuset: Call housekeeping_update() without holding cpus_read_lock
From: Waiman Long
To: Frederic Weisbecker
Cc: Chen Ridong, Tejun Heo, Johannes Weiner, Michal Koutný, Ingo Molnar,
    Peter Zijlstra, Juri Lelli, Vincent Guittot, Steven Rostedt, Ben Segall,
    Mel Gorman, Valentin Schneider, Thomas Gleixner, Shuah Khan,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org
References: <20260221185418.29319-1-longman@redhat.com>
 <20260221185418.29319-9-longman@redhat.com>
 <69ec4b3c-818b-4784-9b90-1ac5e977ae58@redhat.com>
In-Reply-To: <69ec4b3c-818b-4784-9b90-1ac5e977ae58@redhat.com>

On 3/2/26 9:15 AM, Waiman Long wrote:
> On 3/2/26 7:14 AM, Frederic Weisbecker wrote:
>> On Sat, Feb 21, 2026 at 01:54:18PM -0500, Waiman Long wrote:
>>> The current cpuset partition code is able to dynamically update
>>> the sched domains of a running system and the corresponding
>>> HK_TYPE_DOMAIN housekeeping cpumask to perform what is essentially
>>> the "isolcpus=domain,..." boot command line feature at run time.
>>>
>>> The housekeeping cpumask update requires flushing a number of
>>> different workqueues, which may not be safe with cpus_read_lock()
>>> held, as the workqueue flushing code may acquire cpus_read_lock()
>>> or acquire locks that have a locking dependency on cpus_read_lock()
>>> further down the chain. Below is an example of such a circular
>>> locking problem.
>>>
>>>    ======================================================
>>>    WARNING: possible circular locking dependency detected
>>>    6.18.0-test+ #2 Tainted: G S
>>>    ------------------------------------------------------
>>>    test_cpuset_prs/10971 is trying to acquire lock:
>>>    ffff888112ba4958 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180
>>>
>>>    but task is already holding lock:
>>>    ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
>>>
>>>    which lock already depends on the new lock.
>>>
>>>    the existing dependency chain (in reverse order) is:
>>>    -> #4 (cpuset_mutex){+.+.}-{4:4}:
>>>    -> #3 (cpu_hotplug_lock){++++}-{0:0}:
>>>    -> #2 (rtnl_mutex){+.+.}-{4:4}:
>>>    -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
>>>    -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:
>>>
>>>    Chain exists of:
>>>      (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex
>> Which workqueue is involved here that holds rtnl_mutex?
>> Is this an existing problem or added test code?
>
> A circular locking dependency here does not necessarily mean that
> rtnl_mutex is used directly in a work function. It can instead be
> part of a locking chain involving multiple parties that results in
> a deadlock if the acquisitions happen in the right order. So it is
> better to be safe than sorry, even if the chance of this occurring
> is minimal.

Below is the full lockdep splat; I didn't include the individual stack
traces in the commit log to keep it less verbose. The rtnl_mutex is
indeed involved, in local_pci_probe().
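To make the dependency concrete, here is a minimal C sketch of the
pattern lockdep is warning about. The names my_wq, my_work_fn() and
cpuset_update_sketch() are made up for illustration; they are not the
actual kernel code paths, which can be seen in the splat below.

/*
 * Hypothetical illustration only -- my_wq, my_work_fn() and
 * cpuset_update_sketch() do not exist in the kernel tree.
 */
#include <linux/cpu.h>
#include <linux/mutex.h>
#include <linux/rtnetlink.h>
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;
static DEFINE_MUTEX(my_cpuset_mutex);

/*
 * A work item whose locking chain (directly here, indirectly in
 * practice) reaches rtnl_mutex.  Elsewhere, cpus_read_lock() is
 * taken while rtnl_mutex is held (flush_all_backlogs() in the
 * netdev teardown path of the splat), so lockdep records the link
 * rtnl_mutex --> cpu_hotplug_lock.
 */
static void my_work_fn(struct work_struct *work)
{
	rtnl_lock();
	/* ... */
	rtnl_unlock();
}

static void cpuset_update_sketch(void)
{
	cpus_read_lock();		/* cpu_hotplug_lock, read side */
	mutex_lock(&my_cpuset_mutex);

	/*
	 * Waiting on the workqueue completion while still holding
	 * cpu_hotplug_lock and the cpuset mutex closes the reported
	 * circle: wq_completion --> work --> rtnl_mutex -->
	 * cpu_hotplug_lock --> (this task).
	 */
	flush_workqueue(my_wq);

	mutex_unlock(&my_cpuset_mutex);
	cpus_read_unlock();
}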
Cheers,
Longman

[  909.360022] ======================================================
[  909.366208] WARNING: possible circular locking dependency detected
[  909.372387] 7.0.0-rc1-test+ #3 Tainted: G S
[  909.378044] ------------------------------------------------------
[  909.384225] test_cpuset_prs/8673 is trying to acquire lock:
[  909.389798] ffff8890b0fd6558 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180
[  909.399114]                but task is already holding lock:
[  909.404946] ffffffffb9741c10 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
[  909.413562]                which lock already depends on the new lock.
[  909.421733]                the existing dependency chain (in reverse order) is:
[  909.429213]                -> #4 (cpuset_mutex){+.+.}-{4:4}:
[  909.435056]        __lock_acquire+0x58c/0xbd0
[  909.439421]        lock_acquire.part.0+0xbd/0x260
[  909.444129]        __mutex_lock+0x1a7/0x1ba0
[  909.448411]        cpuset_css_online+0x59/0x410
[  909.452948]        online_css+0x9b/0x2d0
[  909.456877]        css_create+0x3c6/0x610
[  909.460895]        cgroup_apply_control_enable+0x2ff/0x460
[  909.466384]        cgroup_subtree_control_write+0x79a/0xc70
[  909.471963]        cgroup_file_write+0x1a5/0x680
[  909.476582]        kernfs_fop_write_iter+0x3df/0x5f0
[  909.481550]        vfs_write+0x525/0xfd0
[  909.485482]        ksys_write+0xf9/0x1d0
[  909.489410]        do_syscall_64+0x13a/0x1520
[  909.493778]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.499361]                -> #3 (cpu_hotplug_lock){++++}-{0:0}:
[  909.505547]        __lock_acquire+0x58c/0xbd0
[  909.509914]        lock_acquire.part.0+0xbd/0x260
[  909.514630]        cpus_read_lock+0x40/0xe0
[  909.518824]        flush_all_backlogs+0x83/0x4b0
[  909.523451]        unregister_netdevice_many_notify+0x7e8/0x1fa0
[  909.529465]        default_device_exit_batch+0x356/0x490
[  909.534788]        ops_undo_list+0x2f4/0x930
[  909.539067]        cleanup_net+0x40a/0x8f0
[  909.543168]        process_one_work+0xd8b/0x1320
[  909.547795]        worker_thread+0x5f3/0xfe0
[  909.552068]        kthread+0x36c/0x470
[  909.555830]        ret_from_fork+0x5dc/0x8e0
[  909.560109]        ret_from_fork_asm+0x1a/0x30
[  909.564557]                -> #2 (rtnl_mutex){+.+.}-{4:4}:
[  909.570224]        __lock_acquire+0x58c/0xbd0
[  909.574592]        lock_acquire.part.0+0xbd/0x260
[  909.579304]        __mutex_lock+0x1a7/0x1ba0
[  909.583580]        rtnl_net_lock_killable+0x1e/0x70
[  909.588465]        register_netdev+0x40/0x70
[  909.592738]        i40e_vsi_setup+0x892/0x14b0 [i40e]
[  909.597854]        i40e_setup_pf_switch+0xaa1/0xe80 [i40e]
[  909.603392]        i40e_probe.cold+0xdb0/0x1d1b [i40e]
[  909.608582]        local_pci_probe+0xdb/0x180
[  909.612951]        local_pci_probe_callback+0x35/0x80
[  909.618008]        process_one_work+0xd8b/0x1320
[  909.622631]        worker_thread+0x5f3/0xfe0
[  909.626912]        kthread+0x36c/0x470
[  909.630673]        ret_from_fork+0x5dc/0x8e0
[  909.634951]        ret_from_fork_asm+0x1a/0x30
[  909.639399]                -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
[  909.646627]        __lock_acquire+0x58c/0xbd0
[  909.650994]        lock_acquire.part.0+0xbd/0x260
[  909.655699]        process_one_work+0xd58/0x1320
[  909.660321]        worker_thread+0x5f3/0xfe0
[  909.664602]        kthread+0x36c/0x470
[  909.668363]        ret_from_fork+0x5dc/0x8e0
[  909.672641]        ret_from_fork_asm+0x1a/0x30
[  909.677089]                -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:
[  909.683795]        check_prev_add+0xf1/0xc80
[  909.688068]        validate_chain+0x481/0x560
[  909.692431]        __lock_acquire+0x58c/0xbd0
[  909.696797]        lock_acquire.part.0+0xbd/0x260
[  909.701511]        touch_wq_lockdep_map+0x93/0x180
[  909.706314]        __flush_workqueue+0x111/0x10b0
[  909.711026]        housekeeping_update+0x12d/0x2d0
[  909.715819]        update_parent_effective_cpumask+0x595/0x2440
[  909.721747]        update_prstate+0x89d/0xce0
[  909.726105]        cpuset_partition_write+0xc5/0x130
[  909.731073]        cgroup_file_write+0x1a5/0x680
[  909.735701]        kernfs_fop_write_iter+0x3df/0x5f0
[  909.740664]        vfs_write+0x525/0xfd0
[  909.744592]        ksys_write+0xf9/0x1d0
[  909.748520]        do_syscall_64+0x13a/0x1520
[  909.752887]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.758465]                other info that might help us debug this:
[  909.766466] Chain exists of:
                 (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex
[  909.777679]  Possible unsafe locking scenario:
[  909.783599]        CPU0                    CPU1
[  909.788130]        ----                    ----
[  909.792666]   lock(cpuset_mutex);
[  909.795991]                                lock(cpu_hotplug_lock);
[  909.802171]                                lock(cpuset_mutex);
[  909.808013]   lock((wq_completion)sync_wq);
[  909.812207]                 *** DEADLOCK ***
[  909.818127] 5 locks held by test_cpuset_prs/8673:
[  909.822830]  #0: ffff888140592440 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
[  909.830839]  #1: ffff889100a49890 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x260/0x5f0
[  909.839890]  #2: ffff8890fbfa5368 (kn->active#353){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x2b6/0x5f0
[  909.849118]  #3: ffffffffb9134d00 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_partition_write+0x77/0x130
[  909.858522]  #4: ffffffffb9741c10 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
[  909.867576]                stack backtrace:
[  909.871940] CPU: 95 UID: 0 PID: 8673 Comm: test_cpuset_prs Kdump: loaded Tainted: G S                  7.0.0-rc1-test+ #3 PREEMPT(full)
[  909.871946] Tainted: [S]=CPU_OUT_OF_SPEC
[  909.871948] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.043020191705 04/30/2019
[  909.871950] Call Trace:
[  909.871952]  <TASK>
[  909.871955]  dump_stack_lvl+0x6f/0xb0
[  909.871961]  print_circular_bug.cold+0x38/0x45
[  909.871968]  check_noncircular+0x146/0x160
[  909.871975]  check_prev_add+0xf1/0xc80
[  909.871978]  ? alloc_chain_hlocks+0x13e/0x1d0
[  909.871982]  ? add_chain_cache+0x11c/0x300
[  909.871986]  validate_chain+0x481/0x560
[  909.871991]  __lock_acquire+0x58c/0xbd0
[  909.871995]  ? lockdep_init_map_type+0x66/0x250
[  909.872000]  lock_acquire.part.0+0xbd/0x260
[  909.872004]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872009]  ? rcu_is_watching+0x15/0xb0
[  909.872013]  ? trace_rcu_sr_normal+0x1d5/0x2e0
[  909.872018]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872021]  ? lock_acquire+0x159/0x180
[  909.872026]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872030]  touch_wq_lockdep_map+0x93/0x180
[  909.872034]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872038]  __flush_workqueue+0x111/0x10b0
[  909.872042]  ? local_clock_noinstr+0xd/0xe0
[  909.872049]  ? __pfx___flush_workqueue+0x10/0x10
[  909.872059]  housekeeping_update+0x12d/0x2d0
[  909.872063]  update_parent_effective_cpumask+0x595/0x2440
[  909.872070]  update_prstate+0x89d/0xce0
[  909.872076]  ? __pfx_update_prstate+0x10/0x10
[  909.872085]  cpuset_partition_write+0xc5/0x130
[  909.872089]  cgroup_file_write+0x1a5/0x680
[  909.872093]  ? __pfx_cgroup_file_write+0x10/0x10
[  909.872097]  ? kernfs_fop_write_iter+0x2b6/0x5f0
[  909.872102]  ? __pfx_cgroup_file_write+0x10/0x10
[  909.872105]  kernfs_fop_write_iter+0x3df/0x5f0
[  909.872109]  vfs_write+0x525/0xfd0
[  909.872113]  ? __pfx_vfs_write+0x10/0x10
[  909.872118]  ? __lock_acquire+0x58c/0xbd0
[  909.872124]  ? find_held_lock+0x32/0x90
[  909.872130]  ksys_write+0xf9/0x1d0
[  909.872133]  ? __pfx_ksys_write+0x10/0x10
[  909.872136]  ? lockdep_hardirqs_on+0x78/0x100
[  909.872141]  ? do_syscall_64+0xde/0x1520
[  909.872146]  do_syscall_64+0x13a/0x1520
[  909.872151]  ? rcu_is_watching+0x15/0xb0
[  909.872154]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.872157]  ? lockdep_hardirqs_on+0x78/0x100
[  909.872161]  ? do_syscall_64+0x212/0x1520
[  909.872166]  ? find_held_lock+0x32/0x90
[  909.872170]  ? local_clock_noinstr+0xd/0xe0
[  909.872174]  ? __lock_release.isra.0+0x1a2/0x2c0
[  909.872178]  ? exc_page_fault+0x78/0xf0
[  909.872183]  ? rcu_is_watching+0x15/0xb0
[  909.872186]  ? trace_irq_enable.constprop.0+0x194/0x200
[  909.872191]  ? lockdep_hardirqs_on_prepare.part.0+0x8e/0x170
[  909.872196]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.872199] RIP: 0033:0x7f877d3e9544
[  909.872203] Code: 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 80 3d a5 cb 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[  909.872206] RSP: 002b:00007ffd6ff21b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  909.872210] RAX: ffffffffffffffda RBX: 00007f877d4bf5c0 RCX: 00007f877d3e9544
[  909.872213] RDX: 0000000000000009 RSI: 0000557ff7ec2320 RDI: 0000000000000001
[  909.872215] RBP: 0000000000000009 R08: 0000000000000073 R09: 00000000ffffffff
[  909.872217] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000009
[  909.872219] R13: 0000557ff7ec2320 R14: 0000000000000009 R15: 00007f877d4bcf00
[  909.872226]  </TASK>
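For context, the direction the patch subject describes, calling
housekeeping_update() only after cpus_read_lock() has been dropped,
has roughly the shape sketched below. This is an illustrative outline
under stated assumptions, not the actual patch: the function name
cpuset_partition_update_sketch(), the mutex my_cpuset_mutex and the
argument passed to housekeeping_update() are all placeholders.

/*
 * Sketch of the fix direction (illustrative; the real patch in this
 * series differs in structure and naming).
 */
static void cpuset_partition_update_sketch(struct cpumask *new_isolated)
{
	cpus_read_lock();
	mutex_lock(&my_cpuset_mutex);
	/* ... rebuild partitions / sched domains ... */
	mutex_unlock(&my_cpuset_mutex);
	cpus_read_unlock();

	/*
	 * The flush-heavy housekeeping update runs only after
	 * cpu_hotplug_lock has been released, so waiting on the
	 * wq_completion can no longer loop back through
	 * cpu_hotplug_lock.
	 */
	housekeeping_update(new_isolated);	/* placeholder signature */
}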