Date: Mon, 22 Jun 2026 17:58:54 +0200
From: Petr Mladek
To: Aaron Tomlin
Cc: akpm@linux-foundation.org, lance.yang@linux.dev, mhiramat@kernel.org,
    linux-kernel@vger.kernel.org, david.laight.linux@gmail.com,
    neelx@suse.com, sean@ashe.io, chjohnst@gmail.com, steve@abita.co,
    mproche@gmail.com, nick.lange@gmail.com
Subject: Re: [PATCH v3] hung_task: deduplicate identical hang reports
References: <20260621213756.43225-1-atomlin@atomlin.com>
In-Reply-To: <20260621213756.43225-1-atomlin@atomlin.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun 2026-06-21 17:37:56, Aaron Tomlin wrote:
> Currently, during severe lock contention, multiple tasks can hang while
> waiting on the exact same resource. The khungtaskd kthread
> indiscriminately reports every single instance with a stack trace.
> This can roll the kernel ring buffer and prematurely exhaust the
> kernel.hung_task_warnings budget. Consequently, the kernel is left
> entirely blind to subsequent, unrelated deadlocks.
>
> To preserve the warning budget and ring buffer without sacrificing
> observability, introduce a Wait Channel (wchan) and task-state based
> deduplicator:
>
> 1. Implement a lightweight, stack-allocated 64-slot Wait Channel
>    (wchan) hash map. Tasks blocked on the exact same wchan during a
>    single scan are recognised as sharing the same bottleneck,
>    successfully deduplicating contentions even when the callers
>    possess entirely disparate call stacks.

I am sorry but I do not like this. It would show one random task
blocked using a locking/wait API (mutex, semaphore, wait). But it
will not be able to distinguish whether the tasks are waiting for the
same lock or event. It might easily skip the lock/event which is the
root of the problem.

In other words, the motivation for this patch is to avoid duplicated
backtraces because the global limit of shown backtraces is too low
and it hides too much. But this would hide even more backtraces.
As a result, administrators and developers will be even more blind.

Honestly, the previous version looked more acceptable to me.
The problem with backtraces that are not exactly the same might be
solved by comparing (hashing) only the top N backtrace levels,
e.g. the top 10. Anyway, we should compare the callers of
the locking/waiter API.

IMHO, we should always print backtraces of all hung tasks when
a hung task is detected for the 1st time, because we do not know
which of the hung tasks points to the root of the problem
and which is a secondary victim.

Also, I would primarily try to increase the ring buffer size when
backtraces get lost.

> 2. Introduce a hung_task_reported bit-field in task_struct. If a task
>    remains hung across multiple intervals, khungtaskd recognises it
>    has already been reported. The bit is safely cleared without
>    locks or atomics the moment the task's context switch counter
>    increments.

This also looks like an interesting optimization which might help
to reduce printing the same backtrace again and again. It looks
much better than the global limit of printed backtraces.

> 3. For duplicate tasks, we still print the single-line
>    "INFO: task ..." message and trigger tracepoint
>    trace_sched_process_hang(). It merely skips calling
>    sched_show_task() and debug_show_blocker(), printing a concise
>    suppression notice instead.

Yes, this is important as well.

Best Regards,
Petr
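
To illustrate the suggestion about comparing (hashing) only the top N
backtrace levels, here is a minimal userspace sketch in C. It is not taken
from the patch or from the kernel sources. The names hash_top_frames(),
seen_before(), struct seen_table, TOP_N and SEEN_SLOTS are hypothetical,
the FNV-1a hash is just one possible choice, and frames[0] is assumed to be
the innermost caller of the locking/waiter API. A real implementation would
have to obtain the frames from the kernel stack tracing code and decide how
to handle hash collisions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOP_N      10   /* hypothetical: number of innermost frames compared */
#define SEEN_SLOTS 64   /* hypothetical: table size for a single scan */

/* FNV-1a over the first min(len, TOP_N) return addresses. */
static uint64_t hash_top_frames(const uintptr_t *frames, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t n = len < TOP_N ? len : TOP_N;

	for (size_t i = 0; i < n; i++) {
		uintptr_t addr = frames[i];

		for (size_t b = 0; b < sizeof(addr); b++) {
			h ^= (addr >> (8 * b)) & 0xff;
			h *= 0x100000001b3ULL;
		}
	}
	return h;
}

/* Zero-initialise one of these at the start of every scan. */
struct seen_table {
	uint64_t hash[SEEN_SLOTS];
	bool used[SEEN_SLOTS];
};

/*
 * Return true when @h was already recorded during this scan, otherwise
 * record it and return false. When the table is full, return false so
 * that the backtrace is printed rather than silently suppressed.
 */
static bool seen_before(struct seen_table *t, uint64_t h)
{
	size_t start = h % SEEN_SLOTS;

	assert(t != NULL);

	for (size_t i = 0; i < SEEN_SLOTS; i++) {
		size_t slot = (start + i) % SEEN_SLOTS;

		if (!t->used[slot]) {
			t->used[slot] = true;
			t->hash[slot] = h;
			return false;
		}
		if (t->hash[slot] == h)
			return true;
	}
	return false;
}
```

With this approach, two tasks whose innermost TOP_N frames match are
treated as duplicates even when their outer frames differ, while tasks
that reach the same locking primitive from different callers still get
their own backtrace.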