From: Waiman Long
Date: Sat, 31 Jan 2026 18:13:09 -0500
Subject: Re: [PATCH/for-next v2 2/2] cgroup/cpuset: Introduce a new top level cpuset_top_mutex
To: Chen Ridong , Tejun Heo , Johannes Weiner , Michal Koutný , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Anna-Maria Behnsen , Frederic Weisbecker , Thomas Gleixner , Shuah Khan
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
References: <20260130154254.1422113-1-longman@redhat.com> <20260130154254.1422113-3-longman@redhat.com> <62022397-287c-4046-94de-058ff87ad728@huaweicloud.com>
In-Reply-To: <62022397-287c-4046-94de-058ff87ad728@huaweicloud.com>

On 1/30/26 9:53 PM, Chen Ridong wrote:
>
> On 2026/1/30 23:42, Waiman Long wrote:
>> The current cpuset partition code is able to dynamically update
>> the sched domains of a running system and the corresponding
>> HK_TYPE_DOMAIN housekeeping cpumask to perform what is essentially the
>> "isolcpus=domain,..." boot command line feature at run time.
>>
>> The housekeeping cpumask update requires flushing a number of different
>> workqueues, which may not be safe with cpus_read_lock() held, as the
>> workqueue flushing code may acquire cpus_read_lock() or acquire locks
>> that have a locking dependency on cpus_read_lock() down the chain.
>> Below is an example of such a circular locking problem.
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 6.18.0-test+ #2 Tainted: G S
>> ------------------------------------------------------
>> test_cpuset_prs/10971 is trying to acquire lock:
>> ffff888112ba4958 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180
>>
>> but task is already holding lock:
>> ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
>>
>> which lock already depends on the new lock.
>>
>> the existing dependency chain (in reverse order) is:
>> -> #4 (cpuset_mutex){+.+.}-{4:4}:
>> -> #3 (cpu_hotplug_lock){++++}-{0:0}:
>> -> #2 (rtnl_mutex){+.+.}-{4:4}:
>> -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
>> -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:
>>
>> Chain exists of:
>>   (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex
>>
>> 5 locks held by test_cpuset_prs/10971:
>>  #0: ffff88816810e440 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
>>  #1: ffff8891ab620890 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x260/0x5f0
>>  #2: ffff8890a78b83e8 (kn->active#187){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x2b6/0x5f0
>>  #3: ffffffffadf32900 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_partition_write+0x77/0x130
>>  #4: ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
>>
>> Call Trace:
>>  :
>>  touch_wq_lockdep_map+0x93/0x180
>>  __flush_workqueue+0x111/0x10b0
>>  housekeeping_update+0x12d/0x2d0
>>  update_parent_effective_cpumask+0x595/0x2440
>>  update_prstate+0x89d/0xce0
>>  cpuset_partition_write+0xc5/0x130
>>  cgroup_file_write+0x1a5/0x680
>>  kernfs_fop_write_iter+0x3df/0x5f0
>>  vfs_write+0x525/0xfd0
>>  ksys_write+0xf9/0x1d0
>>  do_syscall_64+0x95/0x520
>>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> To avoid such a circular locking dependency problem, we have to
>> call housekeeping_update() without holding the
cpus_read_lock() and
>> cpuset_mutex. The current set of wq's flushed by housekeeping_update()
>> may not have work functions that call cpus_read_lock() directly,
>> but we are likely to extend the list of wq's that are flushed in the
>> future. Moreover, the current set of work functions may hold locks that
>> may have cpu_hotplug_lock down the dependency chain.
>>
>> One way to do that is to introduce a new top level cpuset_top_mutex
>> which will be acquired first. This new cpuset_top_mutex will provide
>> the needed mutual exclusion without the need to hold cpus_read_lock().
>>
> Introducing a new global lock warrants careful consideration. I wonder if we
> could make all updates to isolated_cpus asynchronous. If that is feasible, we
> could avoid adding a global lock altogether. If not, we need to clarify which
> updates must remain synchronous and which ones can be handled asynchronously.

Almost all of the cpuset code runs with cpuset_mutex held together with
either cpus_read_lock or cpus_write_lock, so there is no concurrent
access to or update of any of the cpuset internal data. The new
cpuset_top_mutex is added to resolve the possible deadlock scenarios
with the new housekeeping_update() call without breaking this model.
Allowing concurrent access/update to cpuset data would greatly
complicate the code, and we would likely miss some corner cases that we
would then have to fix in the future. We would only do that if cpuset
were in a critical performance path, but it is not. It is not just
isolated_cpus that we are protecting; all the other cpuset data may be
at risk if we don't have another top-level mutex to protect them.

Cheers,
Longman