From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B240FC43334
	for ; Fri, 10 Jun 2022 17:42:12 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1345038AbiFJRmL (ORCPT );
	Fri, 10 Jun 2022 13:42:11 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38026 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1346863AbiFJRmL (ORCPT );
	Fri, 10 Jun 2022 13:42:11 -0400
Received: from 1wt.eu (wtarreau.pck.nerim.net [62.212.114.60])
	by lindbergh.monkeyblade.net (Postfix) with ESMTP id 3A5E155226
	for ; Fri, 10 Jun 2022 10:42:05 -0700 (PDT)
Received: (from willy@localhost)
	by pcw.home.local (8.15.2/8.15.2/Submit) id 25AHg1Nr020088;
	Fri, 10 Jun 2022 19:42:01 +0200
Date: Fri, 10 Jun 2022 19:42:01 +0200
From: Willy Tarreau
To: Ronny Meeus
Cc: David Laight , Eric Dumazet , netdev
Subject: Re: TCP socket send return EAGAIN unexpectedly when sending small fragments
Message-ID: <20220610174201.GC19540@1wt.eu>
References: <0e02ea2593204cd9805c6ed4b7f46c98@AcuMS.aculab.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On Fri, Jun 10, 2022 at 07:16:06PM +0200, Ronny Meeus wrote:
> On Fri, 10 Jun 2022 at 17:21, David Laight wrote:
> >
> > ...
> > > If the 5 queued packets on the sending side would cause the EAGAIN
> > > issue, the real question maybe is why the receiving side is not
> > > sending the ACK within the 10ms while for earlier messages the ACK is
> > > sent much sooner.
> >
> > Have you disabled Nagle (TCP_NODELAY) ?
>
> Yes I enabled TCP_NODELAY so the Nagle algo is disabled.
> I did a lot of tests over the last couple of days but if I remember well
> enable or disable TCP_NODELAY does not influence the result.

There are many possible causes for what you're observing. For example, if
your NIC has too small a Tx ring and small buffers, you can imagine that
the N*106 bytes fit in the buffers but the N*107 do not, which causes a
tiny delay waiting for the Tx IRQ to recycle the buffers, and that during
this time your subsequent send() calls are coalesced into larger segments
that are sent at once when using 107.

If you do not want packets to be sent individually and you know you still
have more to come, you need to put MSG_MORE in the send() flags (or to
disable TCP_NODELAY). Clearly, when running with TCP_NODELAY you're asking
the whole stack "do your best to send as fast as possible", which implies
"without any consideration for efficiency optimization".

I've seen a situation in the past where it was impossible to send any
extra segment while a first unacked PUSH was in flight. Simply sending
full segments was enough to considerably increase the performance. I
analysed this as a result of the SWS avoidance algorithm and concluded
that it was normal in that situation, though I have not witnessed it
again in a while.

So just keep in mind to try not to abuse TCP_NODELAY too much.

Willy
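
For reference, the MSG_MORE suggestion above could look roughly like the
sketch below. It is only an illustration under assumptions: the helper
names fragment_flags() and send_fragments() are hypothetical and do not
come from this thread, MSG_MORE is Linux-specific, and the error handling
is kept minimal (on a non-blocking socket, EAGAIN is simply reported back
to the caller).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: flags for fragment i out of n.
 * Every fragment except the last carries MSG_MORE (Linux-specific), which
 * tells the stack that more data follows, so it may hold back a partial
 * segment instead of pushing each small fragment out on its own. */
static int fragment_flags(size_t i, size_t n)
{
    return (i + 1 < n) ? MSG_MORE : 0;
}

/* Hypothetical helper: send n fragments as one logical message.
 * Returns 0 on success, -1 on error with errno set by send().
 * Short writes are retried; EINTR is retried; any other error
 * (including EAGAIN on a non-blocking socket) is returned as-is. */
static int send_fragments(int fd, const void *const *bufs,
                          const size_t *lens, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const char *p = bufs[i];
        size_t left = lens[i];

        while (left > 0) {
            ssize_t r = send(fd, p, left, fragment_flags(i, n));

            if (r < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            p += r;
            left -= (size_t)r;
        }
    }
    return 0;
}
```

The last fragment is sent without MSG_MORE so that the stack knows the
message is complete and can transmit whatever it has been holding back.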