Subject: Re: [PATCH V2 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
To: David Miller, soheil.kdev@gmail.com
Cc: netdev@vger.kernel.org, ycheng@google.com, ncardwell@google.com, edumazet@google.com, willemb@google.com, soheil@google.com
References: <20180427185733.36855-1-soheil.kdev@gmail.com> <20180430.113834.1760530542793231849.davem@davemloft.net>
From: Eric Dumazet
Date: Mon, 30 Apr 2018 08:43:50 -0700
In-Reply-To: <20180430.113834.1760530542793231849.davem@davemloft.net>

On 04/30/2018 08:38 AM, David Miller wrote:
> From: Soheil Hassas Yeganeh
> Date: Fri, 27 Apr 2018 14:57:32 -0400
>
>> Since the socket lock is not held when calculating the size of
>> receive queue, TCP_INQ is a hint. For example, it can overestimate
>> the queue size by one byte, if FIN is received.
>
> I think it is even worse than that.
>
> If another application comes in and does a recvmsg() in parallel with
> these calculations, you could even report a negative value.
>
> These READ_ONCE() make it look like some of these issues are being
> addressed but they are not.
>
> You could freeze the values just by taking sk->sk_lock.slock, but I
> don't know if that cost is considered acceptable or not.
>
> Another idea is to sample both values in a loop, similar to a sequence
> lock sequence:
>
> again:
> 	tmp1 = A;
> 	tmp2 = B;
> 	barrier();
> 	tmp3 = A;
> 	if (tmp1 != tmp3)
> 		goto again;
>
> But the current state of affairs is not going to work well.
>

We want a hint, and max_t(int, 0, ....) does not return a negative value?

If the hint is wrong in 0.1% of the cases, we really do not care, it is not meant to replace
the existing precise (well, sort of) mechanism.

I say sort of, because by the time we have any number, TCP might have received more packets anyway.
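
For illustration only, here is a minimal userspace sketch of the two approaches discussed above: flooring a racy difference at zero, and resampling until the first value is stable. It is not the code from the patch. The struct rx_counters, its fields, and both function names are hypothetical, and C11 relaxed/acquire atomic loads are assumed as a rough stand-in for READ_ONCE() and barrier().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two values ("A" and "B") that are
 * sampled without holding the socket lock. */
struct rx_counters {
	_Atomic uint32_t received;	/* advanced by the receive path */
	_Atomic uint32_t consumed;	/* advanced by a concurrent reader */
};

/* Clamped hint: the two loads are not a consistent snapshot, so the
 * difference may come out negative; floor it at 0, in the spirit of
 * max_t(int, 0, ...). The uint32_t -> int32_t conversion assumes the
 * usual two's complement behaviour. */
static int inq_hint_clamped(struct rx_counters *c)
{
	uint32_t consumed = atomic_load_explicit(&c->consumed, memory_order_relaxed);
	uint32_t received = atomic_load_explicit(&c->received, memory_order_relaxed);
	int32_t diff = (int32_t)(received - consumed);

	return diff > 0 ? diff : 0;
}

/* Resampling variant: read A, then B, then A again, and retry if A
 * moved in between. This narrows the race window but the result is
 * still only a hint, since either value can change right after. */
static int inq_hint_resampled(struct rx_counters *c)
{
	uint32_t a1, b, a2;
	int32_t diff;

	do {
		a1 = atomic_load_explicit(&c->consumed, memory_order_acquire);
		b = atomic_load_explicit(&c->received, memory_order_acquire);
		a2 = atomic_load_explicit(&c->consumed, memory_order_acquire);
	} while (a1 != a2);

	diff = (int32_t)(b - a1);
	return diff > 0 ? diff : 0;
}
```

Either variant returns a value that may already be stale when the caller sees it, which is why the clamped form is arguably enough for a hint: the extra loop buys consistency between the two reads, not freshness.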