From: Vladislav Yasevich
Subject: Re: [PATCHv2] sctp: Enforce retransmission limit during shutdown
Date: Wed, 06 Jul 2011 10:31:56 -0400
Message-ID: <4E1471DC.2090407@hp.com>
References: <20110629135704.GB10085@canuck.infradead.org>
	<4E0B3491.1060603@hp.com>
	<20110629143649.GC10085@canuck.infradead.org>
	<4E0B3DA1.9060200@hp.com>
	<20110629154814.GD10085@canuck.infradead.org>
	<4E0B4F71.4020108@hp.com>
	<20110630084933.GA24074@canuck.infradead.org>
	<4E0C8368.5090502@hp.com>
	<20110704135019.GA801@canuck.infradead.org>
	<4E146652.7010205@hp.com>
	<20110706141808.GA17652@canuck.infradead.org>
In-Reply-To: <20110706141808.GA17652@canuck.infradead.org>
To: netdev@vger.kernel.org, davem@davemloft.net, Wei Yongjun, Sridhar Samudrala, linux-sctp@vger.kernel.org

On 07/06/2011 10:18 AM, Thomas Graf wrote:
> On Wed, Jul 06, 2011 at 09:42:42AM -0400, Vladislav Yasevich wrote:
>> On a related note, were you going to re-submit the receiver patch as well?
>
> Yes
>
>> On 07/04/2011 09:50 AM, Thomas Graf wrote:
>>> +	 * retransmission limit. Stop that timer as soon
>>> +	 * as the receiver acknowledged any data.
>>> +	 */
>>> +	t = &asoc->timers[SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD];
>>> +	if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING &&
>>> +	    timer_pending(t) && del_timer(t))
>>> +		sctp_association_put(asoc);
>>> +
>>
>> I believe 'state' and 'timers' are in different cache lines, so you might be able to optimize it
>> a little by checking the state before referencing the timers array.
>
> gcc should do that but I'm fine with changing it.
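For illustration, a user-space sketch of the reordering suggested above: test `state` first and only then touch `timers[]`. Everything here is a hypothetical mock (`struct mock_assoc`, `mock_timer_pending()`, `mock_del_timer()`, `mock_assoc_put()`), not the kernel's `timer_pending()`/`del_timer()` API, and the state enum only copies the ordering of the kernel's SCTP states as assumed here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the SCTP state ordering (assumed, not kernel code). */
enum mock_state {
	SCTP_STATE_CLOSED,
	SCTP_STATE_COOKIE_WAIT,
	SCTP_STATE_COOKIE_ECHOED,
	SCTP_STATE_ESTABLISHED,
	SCTP_STATE_SHUTDOWN_PENDING,
	SCTP_STATE_SHUTDOWN_SENT,
	SCTP_STATE_SHUTDOWN_RECEIVED,
	SCTP_STATE_SHUTDOWN_ACK_SENT,
};

enum { T5_SHUTDOWN_GUARD, NUM_TIMERS };

struct mock_timer {
	bool pending;
};

struct mock_assoc {
	enum mock_state state;
	int refcnt;                      /* a pending timer holds one reference */
	struct mock_timer timers[NUM_TIMERS];
};

static bool mock_timer_pending(const struct mock_timer *t)
{
	return t->pending;
}

/* Modeled on del_timer(): true only if it deactivated a pending timer. */
static bool mock_del_timer(struct mock_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	return was_pending;
}

static void mock_assoc_put(struct mock_assoc *a)
{
	assert(a->refcnt > 0);           /* a put must never underflow */
	a->refcnt--;
}

/* Reordered form: the timers array is not touched unless SHUTDOWN is pending. */
static void stop_t5_on_ack(struct mock_assoc *a)
{
	if (a->state == SCTP_STATE_SHUTDOWN_PENDING) {
		struct mock_timer *t = &a->timers[T5_SHUTDOWN_GUARD];

		if (mock_timer_pending(t) && mock_del_timer(t))
			mock_assoc_put(a);   /* drop the reference the timer held */
	}
}
```

Computing `&asoc->timers[...]` does not by itself read the timer, and `&&` already short-circuits, so the gain is likely small, as Thomas suggests; the nested form mainly makes the ordering explicit.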
>
>>> +	 *
>>> +	 * Allow the association to timeout if SHUTDOWN is
>>> +	 * pending in case the receiver stays in zero window
>>> +	 * mode forever.
>>>  	 */
>>>  	if (!q->asoc->peer.rwnd &&
>>>  	    !list_empty(&tlist) &&
>>> -	    (sack_ctsn+2 == q->asoc->next_tsn)) {
>>> +	    (sack_ctsn+2 == q->asoc->next_tsn) &&
>>> +	    !(q->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING)) {
>>
>> Would a test for (q->asoc->state != SCTP_STATE_SHUTDOWN_PENDING) be clearer? We only
>> care about the PENDING state here.
>
> I think SHUTDOWN_RECEIVED should also be included. We continue to transmit and
> process SACKs after receiving a SHUTDOWN.

I am not sure about SHUTDOWN_RECEIVED. If we received a SHUTDOWN, then we are not
in a zero window situation. Additionally, the sender of the SHUTDOWN started the GUARD
timer and will abort after it expires. So no special handling is needed on our part.

-vlad

>
>>> +	 * Although RFC2960 and RFC4460 specify that the overall error
>>> +	 * count must be cleared when a HEARTBEAT ACK is received this
>>> +	 * behaviour may prevent the maximum retransmission count from
>>> +	 * being reached while in SHUTDOWN. If the peer keeps its window
>>> +	 * closed not acknowledging any outstanding TSN we may rely on
>>> +	 * reaching the max_retrans limit via the T3-rtx timer to close
>>> +	 * the association which will never happen if the error count is
>>> +	 * reset every heartbeat interval.
>>> +	 */
>>> +	if (!(t->asoc->state >= SCTP_STATE_SHUTDOWN_PENDING))
>>> +		t->asoc->overall_error_count = 0;
>>
>> Same here. We only care about the PENDING state. Also, please fix the comment to reflect
>> the code.
>
> Agreed.
>
>>> +	if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
>>> +		/*
>>> +		 * We are here likely because the receiver had its rwnd
>>> +		 * closed for a while and we have not been able to
>>> +		 * transmit the locally queued data within the maximum
>>> +		 * retransmission attempts limit.  Start the T5
>>> +		 * shutdown guard timer to give the receiver one last
>>> +		 * chance and some additional time to recover before
>>> +		 * aborting.
>>> +		 */
>>> +		sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_RESTART,
>>> +				SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
>>
>> This is a bug. You don't want to restart the timer every time you hit a T3 timeout. Remember, since you fall
>> through here, you do another retransmission and schedule another timeout. So the next time the timeout happens,
>> you'll restart the SHUTDOWN_GUARD timer, which is not what you want.
>>
>> We want to start it once if it isn't pending, and leave it running without restart if it is already pending.
>
> Doh, absolutely. The timer_pending() check got lost between testing and submission.
>
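A toy model of the restart problem described above, under stated assumptions: the peer keeps its window closed, T3-rtx fires at a fixed interval (say, already capped at RTO.Max), and every timeout falls through into the SHUTDOWN_PENDING branch. `struct guard_timer`, `on_t3_timeout_restart()`, `on_t3_timeout_start_once()` and `abort_time()` are hypothetical names for illustration, not kernel functions, and times are plain milliseconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the T5 shutdown guard timer; times are in ms. */
struct guard_timer {
	bool pending;
	long expires;
};

/* Buggy policy: unconditionally re-arm T5 on every T3-rtx timeout. */
static void on_t3_timeout_restart(struct guard_timer *t5, long now, long guard)
{
	t5->pending = true;
	t5->expires = now + guard;
}

/* Fixed policy: arm T5 once and leave an already pending timer alone. */
static void on_t3_timeout_start_once(struct guard_timer *t5, long now, long guard)
{
	if (!t5->pending) {
		t5->pending = true;
		t5->expires = now + guard;
	}
}

/*
 * Model a peer whose window stays closed: T3-rtx fires every `rto` ms and
 * each timeout runs `on_t3`. Returns the time at which T5 expires (i.e. the
 * association would be aborted), or -1 if it has not expired by `horizon`.
 */
static long abort_time(void (*on_t3)(struct guard_timer *, long, long),
		       long rto, long guard, long horizon)
{
	struct guard_timer t5 = { false, 0 };

	assert(rto > 0 && guard > 0);	/* rto > 0 keeps the loop finite */
	for (long now = rto; now <= horizon; now += rto) {
		/* A T5 expiry due at or before this T3 tick fires first. */
		if (t5.pending && t5.expires <= now)
			return t5.expires;
		on_t3(&t5, now, guard);
	}
	return (t5.pending && t5.expires <= horizon) ? t5.expires : -1;
}
```

With `rto = 1000` and `guard = 5000`, the start-once policy lets T5 fire at 6000 ms, while the restart policy pushes the expiry 5000 ms past each T3 timeout, so within a 60000 ms horizon the guard never fires and the association is never aborted.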