From: Matt Fleming <matt@readmodwrite.com>
To: Jan Kara
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Tejun Heo, Christian Brauner, linux-fsdevel@vger.kernel.org, kernel-team@cloudflare.com
Subject: [REGRESSION] 6.12: Workqueue lockups in inode_switch_wbs_work_fn (suspect commit 66c14dccd810)
Date: Mon, 12 Jan 2026 11:18:04 +0000
Message-ID: <20260112111804.3773280-1-matt@readmodwrite.com>

Hi Jan, it's me again :)

I'm writing to report a regression we are observing in our production environment running kernel 6.12. We are seeing severe workqueue lockups that appear to be triggered by high-volume cgroup destruction. We have isolated the issue to commit 66c14dccd810 ("writeback: Avoid softlockup when switching many inodes").

We're seeing stalled tasks in the inode_switch_wbs workqueue. The worker appears to be CPU-bound within inode_switch_wbs_work_fn, leading to RCU stalls and eventual system lockups.
Here is a representative trace from a stalled CPU-bound worker pool:

[1437023.584832][   C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[1437023.733923][   C0] pool 358:
[1437023.733924][   C0] task:kworker/89:0 state:R running task stack:0 pid:3136989 tgid:3136989 ppid:2 task_flags:0x4208060 flags:0x00004000
[1437023.733929][   C0] Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
[1437023.733933][   C0] Call Trace:
[1437023.733934][   C0]
[1437023.733937][   C0]  __schedule+0x4fb/0xbf0
[1437023.733942][   C0]  __cond_resched+0x33/0x60
[1437023.733944][   C0]  inode_switch_wbs_work_fn+0x481/0x710
[1437023.733948][   C0]  process_one_work+0x17b/0x330
[1437023.733950][   C0]  worker_thread+0x2ce/0x3f0

Our environment makes heavy use of cgroup-based services. When these services -- specifically our caching layer -- are shut down, they can trigger the offlining of a massive number of inodes (approx. 200k-250k+ inodes per service).

We have verified that reverting 66c14dccd810 completely eliminates these lockups in our production environment. I am currently working on a synthetic reproduction case in the lab that replicates the inode/cgroup density required to trigger this on demand. In the meantime, I wanted to share these findings to see if you have any insights.

Thanks,
Matt
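P.S. For anyone who wants to poke at this before I have a polished reproducer, below is the rough shape of the synthetic case I'm putting together. Treat it as a hedged sketch, not our production workload: the cgroup names, file counts, and paths are all placeholders, it assumes cgroup v2 mounted at /sys/fs/cgroup, and it needs root. The idea is simply to dirty many inodes from inside short-lived cgroups and then remove those cgroups, so that inode_switch_wbs_work_fn has to migrate every attached inode off each dying wb.

```shell
#!/bin/sh
# Hypothetical reproducer sketch (placeholders throughout, NOT the exact
# production workload): create NCG cgroups, dirty NFILES inodes from inside
# each one so their writeback structures attach to that cgroup, then remove
# the cgroups so the kernel must switch every inode away from the dying wb.
CG_ROOT="${CG_ROOT:-/sys/fs/cgroup}"   # assumes cgroup v2 is mounted here
WORKDIR="${WORKDIR:-/tmp/wbs-repro}"
NCG="${NCG:-8}"                        # cgroups to cycle; scale up to stress
NFILES="${NFILES:-1000}"               # dirty inodes per cgroup (we saw 200k+)

run_repro() {
    mkdir -p "$WORKDIR" || return 1
    i=0
    while [ "$i" -lt "$NCG" ]; do
        cg="$CG_ROOT/wbs_repro_$i"
        mkdir "$cg" 2>/dev/null || return 1
        # Dirty NFILES small files from a child process placed in the cgroup,
        # so each inode's wb is associated with this cgroup.
        sh -c "echo \$\$ > '$cg/cgroup.procs' || exit 1
               j=0
               while [ \$j -lt $NFILES ]; do
                   echo data > '$WORKDIR/f_${i}_'\$j
                   j=\$((j + 1))
               done" || return 1
        i=$((i + 1))
    done
    # Tearing the cgroups down is what queues the inode switching work.
    i=0
    while [ "$i" -lt "$NCG" ]; do
        rmdir "$CG_ROOT/wbs_repro_$i" 2>/dev/null
        i=$((i + 1))
    done
}

if [ -w "$CG_ROOT/cgroup.procs" ] && run_repro; then
    echo "cycled $NCG cgroups with $NFILES dirty inodes each"
else
    echo "skipped: needs root and a writable cgroup v2 hierarchy at $CG_ROOT"
fi
```

On an affected 6.12 kernel I would expect the counts to need to be much larger (on the order of the 200k+ inodes per service we see in production) before the workqueue stalls appear; the numbers above are deliberately small so the script is safe to run casually.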