Date: Mon, 22 Jun 2026 17:56:55 +0100
From: David Laight
To: Petr Mladek
Cc: Aaron Tomlin, akpm@linux-foundation.org, lance.yang@linux.dev,
    mhiramat@kernel.org, linux-kernel@vger.kernel.org, neelx@suse.com,
    sean@ashe.io, chjohnst@gmail.com, steve@abita.co, mproche@gmail.com,
    nick.lange@gmail.com
Subject: Re: [PATCH v3] hung_task: deduplicate identical hang reports
Message-ID: <20260622175655.7598befc@pumpkin>
References: <20260621213756.43225-1-atomlin@atomlin.com>

On Mon, 22 Jun 2026 17:58:54 +0200
Petr Mladek wrote:

> On Sun 2026-06-21 17:37:56, Aaron Tomlin wrote:
> > Currently, during severe lock contention, multiple tasks can hang while
> > waiting on the exact same resource. The khungtaskd kthread
> > indiscriminately reports every single instance with a stack trace.
> > This can roll the kernel ring buffer and prematurely exhaust the
> > kernel.hung_task_warnings budget. Consequently, the kernel is left
> > entirely blind to subsequent, unrelated deadlocks.
> >
> > To preserve the warning budget and ring buffer without sacrificing
> > observability, introduce a Wait Channel (wchan) and task-state based
> > deduplicator:
> >
> > 1. Implement a lightweight, stack-allocated 64-slot Wait Channel
> >    (wchan) hash map. Tasks blocked on the exact same wchan during a
> >    single scan are recognised as sharing the same bottleneck,
> >    successfully deduplicating contentions even when the callers
> >    possess entirely disparate call stacks.
>
> I am sorry but I do not like this. It would show one random task blocked
> using a locking/wait API (mutex, semaphore, wait). But it will not be
> able to distinguish whether they are waiting for the same lock or
> event.
>
> It might easily skip the lock/event which is the root of the problem.
>
> By other words, the motivation for this patch is to avoid duplicated
> backtraces because the global limit of shown backtraces is too low
> and it hides too much. But this would hide even more backtraces.
> As a result administrators and developers will be even more blind.
>
> Honestly, the previous version looked more acceptable to me. The
> problem with not-exactly same backtraces might be solved by
> comparing (hashing) only the top N backtrace levels, e.g. 10th.
> Anyway, we should compare the callers of the locking/waiter API.
>
> IMHO, we should always print backtraces of all hung tasks when
> a hung_task is detected for the 1st time. Because we do not
> know which of the hung tasks is pointing to the root of the problem
> and which is a secondary victim.
>
> Also I would primary try to increase the ring buffer size when
> backtraces get lost.

Mostly the traces won't be seen until they get written to file by syslogd
(assuming it can run).
So why not write them slowly enough that it keeps up?

>
> > 2. Introduce a hung_task_reported bit-field in task_struct. If a task
> >    remains hung across multiple intervals, khungtaskd recognises it
> >    has already been reported. The bit is safely cleared without
> >    locks or atomics the moment the task's context switch counter
> >    increments.
>
> Also this looks like an interesting optimization which might help
> to reduce printing the same backtrace again and again. It looks
> much better than the global limit of printed backtraces.
>
> > 3. For duplicate tasks, we still print the single-line
> >    "INFO: task ..." message and trigger tracepoint
> >    trace_sched_process_hang(). It merely skips calling
> >    sched_show_task() and debug_show_blocker(), printing a concise
> >    suppression notice instead.
>
> Yes, this is important as well.

And would need to include the pid of the duplicate trace so you can see
which one it is.

	David

>
> Best Regards,
> Petr
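
Illustration only, not part of the thread or of the patch: the idea
quoted above of comparing (hashing) only the top N backtrace levels
could be sketched in userspace C roughly as follows. Everything named
here is hypothetical (trace_hash(), dedup_first_seen(), DEDUP_DEPTH,
DEDUP_SLOTS). The sketch assumes frames[0] is the innermost return
address, and it uses FNV-1a purely as a stand-in hash. A hash collision
would wrongly suppress a distinct trace, so real code would probably
want to compare the frames as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEDUP_DEPTH 10  /* hypothetical: only the top 10 frames are compared */
#define DEDUP_SLOTS 64  /* hypothetical table size, must be a power of two */

/* One table per scan; zero-initialise it before the scan starts. */
struct dedup_table {
	uint64_t hash[DEDUP_SLOTS];
	bool used[DEDUP_SLOTS];
};

/* FNV-1a over the bytes of the first min(nr, DEDUP_DEPTH) addresses. */
static uint64_t trace_hash(const uintptr_t *frames, size_t nr)
{
	uint64_t h = 14695981039346656037ULL;   /* FNV-1a 64-bit offset basis */
	size_t n = nr < DEDUP_DEPTH ? nr : DEDUP_DEPTH;

	for (size_t i = 0; i < n; i++) {
		uintptr_t addr = frames[i];

		for (size_t b = 0; b < sizeof(addr); b++) {
			h ^= (addr >> (8 * b)) & 0xff;
			h *= 1099511628211ULL;  /* FNV-1a 64-bit prime */
		}
	}
	return h;
}

/*
 * Returns true when this (truncated) backtrace has not been seen in the
 * current scan, i.e. the caller should print the full backtrace.
 * Returns false for a repeat, where a one-line notice would be enough.
 */
static bool dedup_first_seen(struct dedup_table *t,
			     const uintptr_t *frames, size_t nr)
{
	assert(t != NULL && (frames != NULL || nr == 0));

	uint64_t h = trace_hash(frames, nr);
	size_t start = (size_t)(h & (DEDUP_SLOTS - 1));

	/* Open addressing with linear probing. */
	for (size_t i = 0; i < DEDUP_SLOTS; i++) {
		size_t s = (start + i) & (DEDUP_SLOTS - 1);

		if (!t->used[s]) {
			t->used[s] = true;
			t->hash[s] = h;
			return true;
		}
		if (t->hash[s] == h)
			return false;
	}
	/* Table full: err on the side of printing the backtrace. */
	return true;
}
```

Two traces that differ only below the tenth frame hash identically
here, which is the point of truncating. The pid of a suppressed task
would still have to be printed separately, as noted above.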