All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, edumazet@google.com,
	pabeni@redhat.com, soheil@google.com,
	Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net-next] net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP
Date: Tue, 6 Dec 2022 12:22:39 -0800	[thread overview]
Message-ID: <20221206122239.58e16ae4@kernel.org> (raw)
In-Reply-To: <20221205230925.3002558-1-willemdebruijn.kernel@gmail.com>

On Mon,  5 Dec 2022 18:09:25 -0500 Willem de Bruijn wrote:
> Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
> write_seq sockets instead of snd_una.
> 
> Intuitively the contract is that the counter is zero after the
> setsockopt, so that the next write N results in a notification for
> last byte N - 1.
> 
> On idle sockets snd_una == write_seq so this holds for both. But on
> sockets with data in transmission, snd_una depends on the ACK response
> from the peer. A process cannot learn this in a race free manner
> (ioctl SIOCOUTQ is one racy approach).

We can't just copy back the value of 

	tcp_sk(sk)->snd_una - tcp_sk(sk)->write_seq

to the user if the input of setsockopt is large enough (ie. extend the
struct, if len >= sizeof(new struct) -> user is asking to get this?
Or even add a bit somewhere that requests a copy back?

Highly unlikely to break anything, I reckon? But whether setsockopt()
can copy back is not 100% clear to me...

> write_seq is a better starting point because based on the seqno of
> data written by the process only.
> 
> But the existing behavior may already be relied upon. So make the new
> behavior optional behind a flag.
> 
> The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
> Move the field in struct sock to avoid growing the socket (for some
> common CONFIG variants). The UAPI interface so_timestamping.flags is
> already int, so 32 bits wide.
> 
> Reported-by: Jakub Kicinski <kuba@kernel.org>

Reported-by: Sotirios Delimanolis <sotodel@meta.com>

I'm just a bad human information router.

> Signed-off-by: Willem de Bruijn <willemb@google.com>
> 
> ---
> 
> Alternative solutions are
> 
> * make the change unconditionally: a one line change.
> * make the condition a (per netns) sysctl instead of flag
> * make SOF_TIMESTAMPING_OPT_ID_TCP not a modifier of, but alternative
>   to SOF_TIMESTAMPING_OPT_ID. That requires also updating all existing
>   code that now tests OPT_ID to test a new OPT_ID_MASK.

 * copy back the SIOCOUTQ

;)

> Weighing the variants, this seemed the best option to me.
> ---
>  Documentation/networking/timestamping.rst | 19 +++++++++++++++++++
>  include/net/sock.h                        |  6 +++---
>  include/uapi/linux/net_tstamp.h           |  3 ++-
>  net/core/sock.c                           |  9 ++++++++-
>  net/ethtool/common.c                      |  1 +
>  5 files changed, 33 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> index be4eb1242057..578f24731be5 100644
> --- a/Documentation/networking/timestamping.rst
> +++ b/Documentation/networking/timestamping.rst
> @@ -192,6 +192,25 @@ SOF_TIMESTAMPING_OPT_ID:
>    among all possibly concurrently outstanding timestamp requests for
>    that socket.
>  
> +SOF_TIMESTAMPING_OPT_ID_TCP:
> +  Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> +  timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> +  counter increments for stream sockets, but its starting point is
> +  not entirely trivial. This modifier option changes that point.
> +
> +  A reasonable expectation is that the counter is reset to zero with
> +  the system call, so that a subsequent write() of N bytes generates
> +  a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
> +  implements this behavior under all conditions.
> +
> +  SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
> +  especially when the socket option is set when no data is in
> +  transmission. If data is being transmitted, it may be off by the
> +  length of the output queue (SIOCOUTQ) due to being based on snd_una
> +  rather than write_seq. That offset depends on factors outside of
> +  process control, including network RTT and peer response time. The
> +  difference is subtle and unlikely to be noticed when confiugred at
> +  initial socket creation. But .._OPT_ID behavior is more predictable.

I reckon this needs to be more informative. Say how exactly they differ
(written vs queued for transmission). And I'd add to
SOF_TIMESTAMPING_OPT_ID docs a note to "see also .._OPT_ID_TCP version".

  parent reply	other threads:[~2022-12-06 20:22 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-05 23:09 [PATCH net-next] net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP Willem de Bruijn
2022-12-06  0:34 ` Soheil Hassas Yeganeh
2022-12-06 20:22 ` Jakub Kicinski [this message]
2022-12-06 20:46   ` Willem de Bruijn
2022-12-06 20:58     ` Jakub Kicinski
2022-12-06 21:21       ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221206122239.58e16ae4@kernel.org \
    --to=kuba@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=soheil@google.com \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.