From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7A27872
	for ; Wed, 8 Sep 2021 18:35:25 +0000 (UTC)
X-IronPort-AV: E=McAfee;i="6200,9189,10101"; a="207797105"
X-IronPort-AV: E=Sophos;i="5.85,278,1624345200"; d="scan'208";a="207797105"
Received: from orsmga007.jf.intel.com ([10.7.209.58])
	by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	08 Sep 2021 11:35:18 -0700
X-IronPort-AV: E=Sophos;i="5.85,278,1624345200"; d="scan'208";a="469714607"
Received: from rdbrim-mobl3.amr.corp.intel.com ([10.209.14.139])
	by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	08 Sep 2021 11:35:18 -0700
Date: Wed, 8 Sep 2021 11:35:18 -0700 (PDT)
From: Mat Martineau
To: Florian Westphal
cc: mptcp@lists.linux.dev
Subject: Re: [mptcp-next 0/2] Fix mptcp connection hangs after link failover
In-Reply-To: <20210908154236.GH23554@breakpoint.cc>
Message-ID:
References: <20210906131045.18513-1-fw@strlen.de>
	<20210908082050.GG23554@breakpoint.cc>
	<20210908154236.GH23554@breakpoint.cc>
Precedence: bulk
X-Mailing-List: mptcp@lists.linux.dev
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed

On Wed, 8 Sep 2021, Florian Westphal wrote:

> Florian Westphal wrote:
>> Mat Martineau wrote:
>>> On Mon, 6 Sep 2021, Florian Westphal wrote:
>>>
>>>> First patch is preparation work: tx_pending_data counter is unreliable.
>>>> Second patch fixes premature stop of the retransmit timer.
>>>>
>>>> Florian Westphal (2):
>>>>   mptcp: remove tx_pending_data
>>>>   mptcp: re-set push-pending bit on retransmit failure
>>>>
>>>>  net/mptcp/protocol.c | 32 +++++++++++++++++++++++++-------
>>>>  net/mptcp/protocol.h |  1 -
>>>>  2 files changed, 25 insertions(+), 8 deletions(-)
>>>>
>>>> --
>>>> 2.32.0
>>>
>>> The code changes look ok, and I don't see any copyfd_io_poll errors. But I
>>> do get consistent failures in the same group of self tests related to stale
>>> links and recovery:
>>>
>>> 15 multiple flows, signal, link failure syn[ ok ] - synack[ ok ] - ack[ ok ]
>>>                                         add[ ok ] - echo [ ok ]
>>>                                         stale [fail] got 0
>>> stale[s] 0 recover[s], expected stale in range [1..5], stale-recover
>>> delta 1
>>
>> I'm on export, 9c7d1b9a14eba479466423d64f99c8a4e29b66f4, with these two
>> patches and I don't see this error.
>>
>> Running mptcp_join -l in a loop now, no luck so far.
>
> Gave up, its not triggering for me.
>
> Any hints on reproducing this?
>
> Does it pass for you without my changes?
>
> I don't see how they would cause this; if anything this patch makes
> the stale detection reliable because the retrans timer is not stopped
> too early anymore and it makes sure that mptcp_subflow_get_retrans() is
> called once for each retrans timer call.
>

Hi Florian -

My apologies for wasting your time on this, it was 100% user error on my
side. With everything set up right on my end, the tests are passing.

So, the changes look good:

Reviewed-by: Mat Martineau

--
Mat Martineau
Intel