From: Zhangjiaji
To: "stable@vger.kernel.org"
CC: "huyu (D)", "Wangqinxiao (Tom)", "regressions@lists.linux.dev", Liumengqiu
Subject: lock contention: x86/kvm: Potential deadlock between shrinker_rwsem and kvm_lock under high VM load
Date: Mon, 2 Feb 2026 01:19:22 +0000
References: <505c34d2cef84117b7e995c211efc393@huawei.com>

Hi all,

I'm hitting a lock contention / long stall issue on an x86 KVM host under heavy VM load, and I'd like to ask for advice on the proper fix direction.

Problem summary

When the host is under heavy VM pressure and a cache drop is triggered, the reclaim path can hold shrinker_rwsem for a long time due to lock contention on kvm_lock inside the KVM/MMU shrinker. This in turn blocks systemd while it is holding cgroup_mutex, causing cascading issues (e.g., journald log gaps).

Observed lock chain / flow

From what I see:

1. drop_caches leads to slab reclaim and enters shrink_slab()
2. shrink_slab() takes shrinker_rwsem
3. It then enters do_shrink_slab()
4. During slab shrinking, the KVM/MMU shrinker callback is invoked (e.g. mmu_shrink_scan()) to reclaim KVM-related caches
5. mmu_shrink_scan() attempts to take kvm_lock
6. Under heavy VM load, kvm_lock is highly contended, so the shrinker callback stalls and shrinker_rwsem remains held for an extended time

In parallel:

7. systemd holds cgroup_mutex (e.g. during cgroup operations) and then tries to acquire shrinker_rwsem
8. Because shrinker_rwsem is still held by the drop_caches reclaim path, systemd blocks while still holding cgroup_mutex
9. Other components (e.g. systemd-journald) needing cgroup_mutex become blocked, leading to issues such as logging stalls/gaps

Impact

- Long stalls in systemd-controlled cgroup operations
- systemd-journald (and possibly others) blocked on cgroup_mutex, causing log dropouts / discontinuities
- Overall system responsiveness degradation during the cache-drop operation

Questions

1. Is it expected/acceptable for a shrinker callback (KVM/MMU shrinker) to contend on a highly contended lock like kvm_lock while shrinker_rwsem is held?
2. Are there known recommendations to avoid holding shrinker_rwsem across potentially blocking/contended shrinker callbacks?
3. Would the preferred fix be on the KVM shrinker side (e.g. using mutex_trylock()/spin_trylock() semantics and returning SHRINK_STOP/-EAGAIN style behavior when contended), or on the shrink_slab/shrinker infrastructure side?
4. Alternatively, is there any known guidance for systemd/cgroup codepaths to avoid waiting on shrinker_rwsem while holding cgroup_mutex (to avoid lock chaining)?

Please let me know which additional information would be most useful, and what direction you would recommend for a fix.

Thanks,
Huyu
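
Illustration for question 3 (userspace model, not kernel code):

To make the trylock idea concrete, here is a minimal sketch of the behavior I have in mind. It is only a model under stated assumptions: a pthread mutex stands in for kvm_lock, a counter stands in for the KVM MMU page cache, and the names model_kvm_lock, model_cached_pages, model_mmu_shrink_scan(), model_demo() and MODEL_SHRINK_STOP are hypothetical and do not exist in the kernel. The only kernel fact it relies on is that SHRINK_STOP is the (~0UL) sentinel a scan callback can return to say no progress is possible right now.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the kernel's SHRINK_STOP sentinel, which is (~0UL). */
#define MODEL_SHRINK_STOP (~0UL)

/* Hypothetical userspace stand-ins for kvm_lock and the MMU page cache. */
static pthread_mutex_t model_kvm_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long model_cached_pages = 128;

/*
 * Hypothetical scan callback: instead of sleeping on a contended lock
 * (while the caller keeps its outer lock held), give up immediately.
 */
static unsigned long model_mmu_shrink_scan(unsigned long nr_to_scan)
{
	unsigned long freed;

	if (pthread_mutex_trylock(&model_kvm_lock) != 0)
		return MODEL_SHRINK_STOP;	/* contended: skip this pass */

	freed = nr_to_scan < model_cached_pages ? nr_to_scan : model_cached_pages;
	model_cached_pages -= freed;

	pthread_mutex_unlock(&model_kvm_lock);
	return freed;
}

/* Exercises both paths and returns the pages left in the model cache. */
static unsigned long model_demo(void)
{
	unsigned long ret;

	/*
	 * Simulate a busy VM path holding the lock. For a non-recursive
	 * mutex, trylock fails with EBUSY even from the owning thread,
	 * so one thread is enough to show the contended case.
	 */
	pthread_mutex_lock(&model_kvm_lock);
	ret = model_mmu_shrink_scan(32);
	assert(ret == MODEL_SHRINK_STOP);
	pthread_mutex_unlock(&model_kvm_lock);

	/* Uncontended case: the scan proceeds and frees what was asked. */
	ret = model_mmu_shrink_scan(32);
	assert(ret == 32);

	return model_cached_pages;
}
```

The trade-off in this sketch is that a contended pass reclaims nothing from this cache, so it exchanges reclaim progress for a bounded hold time on the outer lock. Whether that is acceptable for the real KVM shrinker is exactly what I am asking about.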