Date: Sat, 28 Feb 2026 09:15:45 -0800
From: Jakub Kicinski
To: Jiayuan Chen
Cc: netdev@vger.kernel.org, Jiayuan Chen,
 syzbot+ca1345cca66556f3d79b@syzkaller.appspotmail.com, John Fastabend,
 Sabrina Dubroca, "David S. Miller", Eric Dumazet, Paolo Abeni,
 Simon Horman, Vakul Garg, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends
Message-ID: <20260228091545.412a9a2d@kernel.org>
In-Reply-To: <20260227063231.168520-1-jiayuan.chen@linux.dev>
References: <20260227063231.168520-1-jiayuan.chen@linux.dev>

On Fri, 27 Feb 2026 14:32:31 +0800 Jiayuan Chen wrote:
> tx_work_handler calls tls_tx_records with flags=-1, which preserves
> each record's original tx_flags but results in tcp_sendmsg_locked
> using an infinite send timeout. When the peer is unresponsive and the
> send buffer is full, tcp_sendmsg_locked blocks indefinitely in
> sk_stream_wait_memory. This causes tls_sk_proto_close to hang in
> cancel_delayed_work_sync waiting for tx_work_handler to finish,
> leading to a hung task:
>
>   INFO: task ...: blocked for more than ... seconds.
>   Call Trace:
>    cancel_delayed_work_sync
>    tls_sw_cancel_work_tx
>    tls_sk_proto_close
>
> A workqueue handler should never block indefinitely. Fix this by
> introducing __tls_tx_records() with an extra_flags parameter that
> gets OR'd into each record's tx_flags. tx_work_handler uses this to
> pass MSG_DONTWAIT so tcp_sendmsg_locked returns -EAGAIN immediately
> when the send buffer is full, without overwriting the original
> per-record flags (MSG_MORE, MSG_NOSIGNAL, etc.).
> On -EAGAIN, the
> existing reschedule mechanism retries after a short delay.
>
> Also consolidate the two identical reschedule paths (lock contention
> and -EAGAIN) into one.

It's not that simple. The default semantics of TCP sockets are that
queuing data and then calling close() is a legitimate thing to do, and
in that case the data should be sent cleanly, followed by a normal FIN.

Maybe we should explore trying to make sure we have enough wmem before
we start creating records. Get rid of the entire workqueue mess?

Regarding your patch, I think all the callers passing -1 as flags are
on the close path, so you could have just added | MSG_DONTWAIT when
the flags are -1.
--
pw-bot: cr