From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: [PATCH net] rxrpc: Fix lockup due to no error backoff after ack transmit error
Date: Sat, 03 Nov 2018 00:00:18 -0700 (PDT)
Message-ID: <20181103.000018.2014183024812085135.davem@davemloft.net>
References: <154107959336.26689.262397874166625147.stgit@warthog.procyon.org.uk>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, linux-afs@lists.infradead.org, linux-kernel@vger.kernel.org
To: dhowells@redhat.com
Return-path:
In-Reply-To: <154107959336.26689.262397874166625147.stgit@warthog.procyon.org.uk>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

From: David Howells
Date: Thu, 01 Nov 2018 13:39:53 +0000

> If the network becomes (partially) unavailable, say by disabling IPv6, the
> background ACK transmission routine can get itself into a tizzy by
> proposing immediate ACK retransmission. Since we're in the call event
> processor, that happens immediately without returning to the workqueue
> manager.
>
> The condition should clear after a while when either the network comes back
> or the call times out.
>
> Fix this by:
>
> (1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
>     This will allow a certain amount of time to elapse before we try
>     again.
>
> (2) Enforce a return to the workqueue manager after a certain number of
>     iterations of the call processing loop.
>
> (3) Add a backoff delay that increases the delay on deferred ACKs by a
>     jiffy per failed transmission to a limit of HZ. The backoff delay is
>     cleared on a successful return from kernel_sendmsg().
>
> (4) Cancel calls immediately if the opening sendmsg fails. The layer
>     above can arrange retransmission or rotate to another server.
>
> Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
> Signed-off-by: David Howells

Applied and queued up for -stable.
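
For readers following along, points (2) and (3) of the quoted description can be illustrated with a small user-space sketch. This is not the code from the patch: the struct, the function names, the HZ value of 100 and the MAX_LOOPS bound are all assumptions made up for illustration, and the real rxrpc code is structured differently.

```c
#include <stdbool.h>

/* Assumed tick rate for this sketch; in the kernel HZ is a build-time setting. */
#define HZ 100
/* Hypothetical bound on passes through the processing loop. */
#define MAX_LOOPS 4

/* Hypothetical per-call state holding only the backoff counter. */
struct call_sketch {
	unsigned long tx_backoff;	/* extra delay, in jiffies */
};

/*
 * Point (3): add one jiffy of backoff per failed transmission, capped
 * at HZ, and clear the backoff when a transmission succeeds.
 * 'ret' stands in for the return value of the send call.
 */
static void note_tx_result(struct call_sketch *call, int ret)
{
	if (ret < 0) {
		if (call->tx_backoff < HZ)
			call->tx_backoff++;
	} else {
		call->tx_backoff = 0;
	}
}

/* Delay to use for a deferred ACK: the base delay plus the backoff. */
static unsigned long deferred_ack_delay(const struct call_sketch *call,
					unsigned long base_delay)
{
	return base_delay + call->tx_backoff;
}

/*
 * Point (2): handle at most MAX_LOOPS pending events, then return true
 * so the caller can requeue the work instead of spinning in place.
 */
static bool process_events(unsigned int *pending)
{
	unsigned int loops = 0;

	while (*pending > 0) {
		if (++loops > MAX_LOOPS)
			return true;	/* work remains; go back to the queue */
		(*pending)--;
	}
	return false;
}
```

With these assumed values, two failed sends in a row leave tx_backoff at 2, so a deferred ACK with a base delay of 5 jiffies would wait 7. Repeated failures stop growing the backoff once it reaches HZ, and a single successful send resets it to zero.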