SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails

All of lore.kernel.org
 help / color / mirror / Atom feed

* SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
@ 2024-02-08 18:02 Andy Lutomirski
  2024-02-08 19:55 ` Vadim Fedorenko
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Lutomirski @ 2024-02-08 18:02 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, David S. Miller,
	Network Development

I’ve been using OPT_ID-style timestamping for years, but for some
reason this issue only bit me last week: if sendmsg() fails on a UDP
or ping socket, sk_tskey is poorly.  It may or may not get incremented
by the failed sendmsg().

I can think of at least three ways to improve this:

1. Make it so that the sequence number is genuinely only incremented
on success. This may be tedious to implement and may be nearly
impossible if there are multiple concurrent sendmsg() calls on the
same socket.

2. Allow the user program to specify an explicit ID.  cmsg values are
variable length, so for datagram sockets, extending the
SO_TIMESTAMPING cmsg with 64 bits of sequence number to be used for
the TX timestamp on that particular packet might be a nice solution.

3. Add a getsockopt to read sk_tskey, which user code could use to
determine the next sequence number after a failed sendmsg() call.

Thanks,
Andy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
  2024-02-08 18:02 SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails Andy Lutomirski
@ 2024-02-08 19:55 ` Vadim Fedorenko
  2024-02-08 20:05   ` Andy Lutomirski
  0 siblings, 1 reply; 8+ messages in thread
From: Vadim Fedorenko @ 2024-02-08 19:55 UTC (permalink / raw)
  To: Andy Lutomirski, Willem de Bruijn, David S. Miller,
	Network Development, Jakub Kicinski

On 08/02/2024 18:02, Andy Lutomirski wrote:
> I’ve been using OPT_ID-style timestamping for years, but for some
> reason this issue only bit me last week: if sendmsg() fails on a UDP
> or ping socket, sk_tskey is poorly.  It may or may not get incremented
> by the failed sendmsg().
> 
Well, there are several error paths, for sure. For the sockets you
mention the increment of tskey happens at __ip{,6}_append_data. There
are 2 different types of failures which can happen after the increment.
The first is MTU check fail, another one is memory allocation failures.
I believe we can move increment to a later position, after MTU check in
both functions to avoid first type of problem.

> I can think of at least three ways to improve this:
> 
> 1. Make it so that the sequence number is genuinely only incremented
> on success. This may be tedious to implement and may be nearly
> impossible if there are multiple concurrent sendmsg() calls on the
> same socket.

Multiple concurrent sendmsg() should bring a lot of problems on user-
space side. With current implementation the application has to track the
value of tskey to check incoming TX timestamp later. But with parallel
sendmsg() the app cannot be sure which value is assigned to which call
even in case of proper track value synchronization. That brings us to
the other solutions if we consider having parallel threads working with
same socket. Or we can simply pretend that it's impossible and then fix
error path to decrement tskey value.
> 
> 2. Allow the user program to specify an explicit ID.  cmsg values are
> variable length, so for datagram sockets, extending the
> SO_TIMESTAMPING cmsg with 64 bits of sequence number to be used for
> the TX timestamp on that particular packet might be a nice solution.
> 

This option can be really useful in case of really parallel work with
sockets.

> 3. Add a getsockopt to read sk_tskey, which user code could use to
> determine the next sequence number after a failed sendmsg() call.

I don't believe it's usable interface. Especially if we think about
multiple threads working with the same socket.

> 
> Thanks,
> Andy




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
  2024-02-08 19:55 ` Vadim Fedorenko
@ 2024-02-08 20:05   ` Andy Lutomirski
  2024-02-08 20:40     ` Andy Lutomirski
  2024-02-08 21:51     ` Willem de Bruijn
  0 siblings, 2 replies; 8+ messages in thread
From: Andy Lutomirski @ 2024-02-08 20:05 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Jakub Kicinski

On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
<vadim.fedorenko@linux.dev> wrote:
>
> On 08/02/2024 18:02, Andy Lutomirski wrote:
> > I’ve been using OPT_ID-style timestamping for years, but for some
> > reason this issue only bit me last week: if sendmsg() fails on a UDP
> > or ping socket, sk_tskey is poorly.  It may or may not get incremented
> > by the failed sendmsg().
> >
> Well, there are several error paths, for sure. For the sockets you
> mention the increment of tskey happens at __ip{,6}_append_data. There
> are 2 different types of failures which can happen after the increment.
> The first is MTU check fail, another one is memory allocation failures.
> I believe we can move increment to a later position, after MTU check in
> both functions to avoid first type of problem.

For reasons that I still haven't deciphered, I'm sporadically getting
EHOSTUNREACH after the increment.  I can't find anything in the code
that would cause that, and every time I try to instrument it, it stops
happening :(  I sendmsg to the same destination several times in rapid
succession, and at most one of them will get EHOSTUNREACH.

>
> > I can think of at least three ways to improve this:
> >
> > 1. Make it so that the sequence number is genuinely only incremented
> > on success. This may be tedious to implement and may be nearly
> > impossible if there are multiple concurrent sendmsg() calls on the
> > same socket.
>
> Multiple concurrent sendmsg() should bring a lot of problems on user-
> space side. With current implementation the application has to track the
> value of tskey to check incoming TX timestamp later. But with parallel
> sendmsg() the app cannot be sure which value is assigned to which call
> even in case of proper track value synchronization. That brings us to
> the other solutions if we consider having parallel threads working with
> same socket. Or we can simply pretend that it's impossible and then fix
> error path to decrement tskey value.
> >
> > 2. Allow the user program to specify an explicit ID.  cmsg values are
> > variable length, so for datagram sockets, extending the
> > SO_TIMESTAMPING cmsg with 64 bits of sequence number to be used for
> > the TX timestamp on that particular packet might be a nice solution.
> >
>
> This option can be really useful in case of really parallel work with
> sockets.

I personally like this one the best.  Some care would be needed to
allow programs to detect the new functionality.  Any preferred way to
handle it?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
  2024-02-08 20:05   ` Andy Lutomirski
@ 2024-02-08 20:40     ` Andy Lutomirski
  2024-02-08 21:51     ` Willem de Bruijn
  1 sibling, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2024-02-08 20:40 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Jakub Kicinski

On Thu, Feb 8, 2024 at 12:05 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
> On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
> >
> > On 08/02/2024 18:02, Andy Lutomirski wrote:
> > > I’ve been using OPT_ID-style timestamping for years, but for some
> > > reason this issue only bit me last week: if sendmsg() fails on a UDP
> > > or ping socket, sk_tskey is poorly.  It may or may not get incremented
> > > by the failed sendmsg().
> > >
> > Well, there are several error paths, for sure. For the sockets you
> > mention the increment of tskey happens at __ip{,6}_append_data. There
> > are 2 different types of failures which can happen after the increment.
> > The first is MTU check fail, another one is memory allocation failures.
> > I believe we can move increment to a later position, after MTU check in
> > both functions to avoid first type of problem.
>
> For reasons that I still haven't deciphered, I'm sporadically getting
> EHOSTUNREACH after the increment.  I can't find anything in the code
> that would cause that, and every time I try to instrument it, it stops
> happening :(  I sendmsg to the same destination several times in rapid
> succession, and at most one of them will get EHOSTUNREACH.

I caught it in strace, finally.  And I also finally grepped the right
part of the kernel tree to (I think) find the offending call chain.

__ip_append_data first increments sk_tskey.  Then it does:

            if (transhdrlen) {
                skb = sock_alloc_send_skb(sk, alloclen,
                        (flags & MSG_DONTWAIT), &err);

(I have no idea why the transhdrlen path is different.)  That does:

static inline struct sk_buff *sock_alloc_send_skb(struct sock *sk,
                          unsigned long size,
                          int noblock, int *errcode)
{
    return sock_alloc_send_pskb(sk, size, 0, noblock, errcode, 0);
}

That does:

struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
                     unsigned long data_len, int noblock,
                     int *errcode, int max_page_order)
{
    struct sk_buff *skb;
    long timeo;
    int err;

    timeo = sock_sndtimeo(sk, noblock);
    for (;;) {
        err = sock_error(sk);

I'm utterly baffled why that check makes any sense whatsoever.  git
blame informs me that it predates 2002.

I'll contemplate a bit more and send a patch.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
  2024-02-08 20:05   ` Andy Lutomirski
  2024-02-08 20:40     ` Andy Lutomirski
@ 2024-02-08 21:51     ` Willem de Bruijn
  2024-02-08 22:40       ` Andy Lutomirski
  2024-02-09  3:22       ` Vadim Fedorenko
  1 sibling, 2 replies; 8+ messages in thread
From: Willem de Bruijn @ 2024-02-08 21:51 UTC (permalink / raw)
  To: Andy Lutomirski, Vadim Fedorenko
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Jakub Kicinski

Andy Lutomirski wrote:
> On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
> <vadim.fedorenko@linux.dev> wrote:
> >
> > On 08/02/2024 18:02, Andy Lutomirski wrote:
> > > I’ve been using OPT_ID-style timestamping for years, but for some
> > > reason this issue only bit me last week: if sendmsg() fails on a UDP
> > > or ping socket, sk_tskey is poorly.  It may or may not get incremented
> > > by the failed sendmsg().

The intent is indeed to only increment on a successful send.

The implementation is complicated a bit by (1) being a socket level
option, thus also supporting SOCK_RAW and (2) MSG_MORE using multiple
send calls to only produce a single datagram and (3) fragmentation
producing multiple skbs for a single datagram.

If only SOCK_DGRAM, conceivably we could move this to udp_send_skb,
after the skb is created and after the usual error exit paths.

An alternative is in error paths to decrement the counter. This is
what we do for MSG_ZEROCOPY references. Unfortunately, with the
lockless UDP path, other threads could come inbetween the inc and dec,
so this is not really workable.

> > Well, there are several error paths, for sure. For the sockets you
> > mention the increment of tskey happens at __ip{,6}_append_data. There
> > are 2 different types of failures which can happen after the increment.
> > The first is MTU check fail, another one is memory allocation failures.
> > I believe we can move increment to a later position, after MTU check in
> > both functions to avoid first type of problem.
> 
> For reasons that I still haven't deciphered, I'm sporadically getting
> EHOSTUNREACH after the increment.  I can't find anything in the code
> that would cause that, and every time I try to instrument it, it stops
> happening :(  I sendmsg to the same destination several times in rapid
> succession, and at most one of them will get EHOSTUNREACH.

UDP might fail on ICMP responses. Try sending to a closed port. A few
send calls will succeed, but eventually the send call will refuse to
send. The cause is in the IP layer.

> >
> > > I can think of at least three ways to improve this:
> > >
> > > 1. Make it so that the sequence number is genuinely only incremented
> > > on success. This may be tedious to implement and may be nearly
> > > impossible if there are multiple concurrent sendmsg() calls on the
> > > same socket.
> >
> > Multiple concurrent sendmsg() should bring a lot of problems on user-
> > space side. With current implementation the application has to track the
> > value of tskey to check incoming TX timestamp later. But with parallel
> > sendmsg() the app cannot be sure which value is assigned to which call
> > even in case of proper track value synchronization. That brings us to
> > the other solutions if we consider having parallel threads working with
> > same socket. Or we can simply pretend that it's impossible and then fix
> > error path to decrement tskey value.
> > >
> > > 2. Allow the user program to specify an explicit ID.  cmsg values are
> > > variable length, so for datagram sockets, extending the
> > > SO_TIMESTAMPING cmsg with 64 bits of sequence number to be used for
> > > the TX timestamp on that particular packet might be a nice solution.
> > >
> >
> > This option can be really useful in case of really parallel work with
> > sockets.
> 
> I personally like this one the best.  Some care would be needed to
> allow programs to detect the new functionality.  Any preferred way to
> handle it?

Regardless of whether we can fix the existing behavior, I also think
this is a worthwhile cmsg. As timestamping is a SOL_SOCKET option, the
cmsg should likely also be that, processed in __sock_cmsg_send.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
  2024-02-08 21:51     ` Willem de Bruijn
@ 2024-02-08 22:40       ` Andy Lutomirski
  2024-02-09  3:22       ` Vadim Fedorenko
  1 sibling, 0 replies; 8+ messages in thread
From: Andy Lutomirski @ 2024-02-08 22:40 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Vadim Fedorenko, Willem de Bruijn, David S. Miller,
	Network Development, Jakub Kicinski

On Thu, Feb 8, 2024 at 1:51 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Andy Lutomirski wrote:
> > On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
> > <vadim.fedorenko@linux.dev> wrote:
> > >
> > > On 08/02/2024 18:02, Andy Lutomirski wrote:
> > > > I’ve been using OPT_ID-style timestamping for years, but for some
> > > > reason this issue only bit me last week: if sendmsg() fails on a UDP
> > > > or ping socket, sk_tskey is poorly.  It may or may not get incremented
> > > > by the failed sendmsg().
>
> The intent is indeed to only increment on a successful send.
>
> The implementation is complicated a bit by (1) being a socket level
> option, thus also supporting SOCK_RAW and (2) MSG_MORE using multiple
> send calls to only produce a single datagram and (3) fragmentation
> producing multiple skbs for a single datagram.
>
> If only SOCK_DGRAM, conceivably we could move this to udp_send_skb,
> after the skb is created and after the usual error exit paths.
>
> An alternative is in error paths to decrement the counter. This is
> what we do for MSG_ZEROCOPY references. Unfortunately, with the
> lockless UDP path, other threads could come inbetween the inc and dec,
> so this is not really workable.
>
> > > Well, there are several error paths, for sure. For the sockets you
> > > mention the increment of tskey happens at __ip{,6}_append_data. There
> > > are 2 different types of failures which can happen after the increment.
> > > The first is MTU check fail, another one is memory allocation failures.
> > > I believe we can move increment to a later position, after MTU check in
> > > both functions to avoid first type of problem.
> >
> > For reasons that I still haven't deciphered, I'm sporadically getting
> > EHOSTUNREACH after the increment.  I can't find anything in the code
> > that would cause that, and every time I try to instrument it, it stops
> > happening :(  I sendmsg to the same destination several times in rapid
> > succession, and at most one of them will get EHOSTUNREACH.
>
> UDP might fail on ICMP responses. Try sending to a closed port. A few
> send calls will succeed, but eventually the send call will refuse to
> send. The cause is in the IP layer.
>

I tracked down the code, finally.

But I maintain that this behavior is absurd.  Sure, if I do:

connect(fd, some address);
send(fd, ...);
<-- ICMP error because the port was closed
send(fd, ...) = -ECONNREFUSED;

then this makes a little bit of sense.  But that's about as far as it
makes sense to me.  This variant is a bit different:

connect(fd, some address);
send(fd, ...);
<-- ICMP error because the port was closed
send(fd, ...) = -ECONNREFUSED;
send(fd, ...) = 0;  <-- now it works again by magic!

okay, maybe I can stretch my imagination so this makes sense.  But
then this comes out of left field:

connect(fd, some address);
sendto(fd, ..., willem:1);
<-- ICMP error because the port was closed
sendto(fd, ..., andy:2) = -ECONNREFUSED;

excuse me?  And setting IP_RECVERR broadens the set of errors that
cause this IMO rather silly behavior, presumably motivated by the
login in the ip(7) manpage:

   When the user receives an error from a socket operation, the errors
can be received by calling recvmsg(2) with the MSG_ERRQUEUE flag set.

Isn't that what POLLERR is for?

And somehow the implementation of this logic for send, etc makes it
most of the way through the code before checking sock_error at all.

Anyway, I'll continue contemplating.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
  2024-02-08 21:51     ` Willem de Bruijn
  2024-02-08 22:40       ` Andy Lutomirski
@ 2024-02-09  3:22       ` Vadim Fedorenko
  2024-02-10 16:37         ` Willem de Bruijn
  1 sibling, 1 reply; 8+ messages in thread
From: Vadim Fedorenko @ 2024-02-09  3:22 UTC (permalink / raw)
  To: Willem de Bruijn, Andy Lutomirski
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Jakub Kicinski

On 08/02/2024 21:51, Willem de Bruijn wrote:
> Andy Lutomirski wrote:
>> On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
>> <vadim.fedorenko@linux.dev> wrote:
>>>
>>> On 08/02/2024 18:02, Andy Lutomirski wrote:
>>>> I’ve been using OPT_ID-style timestamping for years, but for some
>>>> reason this issue only bit me last week: if sendmsg() fails on a UDP
>>>> or ping socket, sk_tskey is poorly.  It may or may not get incremented
>>>> by the failed sendmsg().
> 
> The intent is indeed to only increment on a successful send.
> 
> The implementation is complicated a bit by (1) being a socket level
> option, thus also supporting SOCK_RAW and (2) MSG_MORE using multiple
> send calls to only produce a single datagram and (3) fragmentation
> producing multiple skbs for a single datagram.
> 
> If only SOCK_DGRAM, conceivably we could move this to udp_send_skb,
> after the skb is created and after the usual error exit paths.
> 
> An alternative is in error paths to decrement the counter. This is
> what we do for MSG_ZEROCOPY references. Unfortunately, with the
> lockless UDP path, other threads could come inbetween the inc and dec,
> so this is not really workable.

As I've mentioned before, parallelism with OPT_ID is unpredictable by
design, I don't believe we have real apps doing this, so I think it's
better to decrement sk_tskey to have consistent behavior. I can make the
patch to do it.

>>> Well, there are several error paths, for sure. For the sockets you
>>> mention the increment of tskey happens at __ip{,6}_append_data. There
>>> are 2 different types of failures which can happen after the increment.
>>> The first is MTU check fail, another one is memory allocation failures.
>>> I believe we can move increment to a later position, after MTU check in
>>> both functions to avoid first type of problem.
>>
>> For reasons that I still haven't deciphered, I'm sporadically getting
>> EHOSTUNREACH after the increment.  I can't find anything in the code
>> that would cause that, and every time I try to instrument it, it stops
>> happening :(  I sendmsg to the same destination several times in rapid
>> succession, and at most one of them will get EHOSTUNREACH.
> 
> UDP might fail on ICMP responses. Try sending to a closed port. A few
> send calls will succeed, but eventually the send call will refuse to
> send. The cause is in the IP layer.
> 
>>>
>>>> I can think of at least three ways to improve this:
>>>>
>>>> 1. Make it so that the sequence number is genuinely only incremented
>>>> on success. This may be tedious to implement and may be nearly
>>>> impossible if there are multiple concurrent sendmsg() calls on the
>>>> same socket.
>>>
>>> Multiple concurrent sendmsg() should bring a lot of problems on user-
>>> space side. With current implementation the application has to track the
>>> value of tskey to check incoming TX timestamp later. But with parallel
>>> sendmsg() the app cannot be sure which value is assigned to which call
>>> even in case of proper track value synchronization. That brings us to
>>> the other solutions if we consider having parallel threads working with
>>> same socket. Or we can simply pretend that it's impossible and then fix
>>> error path to decrement tskey value.
>>>>
>>>> 2. Allow the user program to specify an explicit ID.  cmsg values are
>>>> variable length, so for datagram sockets, extending the
>>>> SO_TIMESTAMPING cmsg with 64 bits of sequence number to be used for
>>>> the TX timestamp on that particular packet might be a nice solution.
>>>>
>>>
>>> This option can be really useful in case of really parallel work with
>>> sockets.
>>
>> I personally like this one the best.  Some care would be needed to
>> allow programs to detect the new functionality.  Any preferred way to
>> handle it?
> 
> Regardless of whether we can fix the existing behavior, I also think
> this is a worthwhile cmsg. As timestamping is a SOL_SOCKET option, the
> cmsg should likely also be that, processed in __sock_cmsg_send.

Do you think about extending inet_cork and sockcm_cookie to provide
OPT_ID value? If yes, I can give it a try also.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
  2024-02-09  3:22       ` Vadim Fedorenko
@ 2024-02-10 16:37         ` Willem de Bruijn
  0 siblings, 0 replies; 8+ messages in thread
From: Willem de Bruijn @ 2024-02-10 16:37 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, Andy Lutomirski
  Cc: Willem de Bruijn, David S. Miller, Network Development,
	Jakub Kicinski

Vadim Fedorenko wrote:
> On 08/02/2024 21:51, Willem de Bruijn wrote:
> > Andy Lutomirski wrote:
> >> On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
> >> <vadim.fedorenko@linux.dev> wrote:
> >>>
> >>> On 08/02/2024 18:02, Andy Lutomirski wrote:
> >>>> I’ve been using OPT_ID-style timestamping for years, but for some
> >>>> reason this issue only bit me last week: if sendmsg() fails on a UDP
> >>>> or ping socket, sk_tskey is poorly.  It may or may not get incremented
> >>>> by the failed sendmsg().
> > 
> > The intent is indeed to only increment on a successful send.
> > 
> > The implementation is complicated a bit by (1) being a socket level
> > option, thus also supporting SOCK_RAW and (2) MSG_MORE using multiple
> > send calls to only produce a single datagram and (3) fragmentation
> > producing multiple skbs for a single datagram.
> > 
> > If only SOCK_DGRAM, conceivably we could move this to udp_send_skb,
> > after the skb is created and after the usual error exit paths.
> > 
> > An alternative is in error paths to decrement the counter. This is
> > what we do for MSG_ZEROCOPY references. Unfortunately, with the
> > lockless UDP path, other threads could come inbetween the inc and dec,
> > so this is not really workable.
> 
> As I've mentioned before, parallelism with OPT_ID is unpredictable by
> design, I don't believe we have real apps doing this, so I think it's
> better to decrement sk_tskey to have consistent behavior. I can make the
> patch to do it.

Great thanks. Let's see if it can be done without too much churn. This
function is already complex, from combining a lot of optional paths.

> >>> Well, there are several error paths, for sure. For the sockets you
> >>> mention the increment of tskey happens at __ip{,6}_append_data. There
> >>> are 2 different types of failures which can happen after the increment.
> >>> The first is MTU check fail, another one is memory allocation failures.
> >>> I believe we can move increment to a later position, after MTU check in
> >>> both functions to avoid first type of problem.
> >>
> >> For reasons that I still haven't deciphered, I'm sporadically getting
> >> EHOSTUNREACH after the increment.  I can't find anything in the code
> >> that would cause that, and every time I try to instrument it, it stops
> >> happening :(  I sendmsg to the same destination several times in rapid
> >> succession, and at most one of them will get EHOSTUNREACH.
> > 
> > UDP might fail on ICMP responses. Try sending to a closed port. A few
> > send calls will succeed, but eventually the send call will refuse to
> > send. The cause is in the IP layer.
> > 
> >>>
> >>>> I can think of at least three ways to improve this:
> >>>>
> >>>> 1. Make it so that the sequence number is genuinely only incremented
> >>>> on success. This may be tedious to implement and may be nearly
> >>>> impossible if there are multiple concurrent sendmsg() calls on the
> >>>> same socket.
> >>>
> >>> Multiple concurrent sendmsg() should bring a lot of problems on user-
> >>> space side. With current implementation the application has to track the
> >>> value of tskey to check incoming TX timestamp later. But with parallel
> >>> sendmsg() the app cannot be sure which value is assigned to which call
> >>> even in case of proper track value synchronization. That brings us to
> >>> the other solutions if we consider having parallel threads working with
> >>> same socket. Or we can simply pretend that it's impossible and then fix
> >>> error path to decrement tskey value.
> >>>>
> >>>> 2. Allow the user program to specify an explicit ID.  cmsg values are
> >>>> variable length, so for datagram sockets, extending the
> >>>> SO_TIMESTAMPING cmsg with 64 bits of sequence number to be used for
> >>>> the TX timestamp on that particular packet might be a nice solution.
> >>>>
> >>>
> >>> This option can be really useful in case of really parallel work with
> >>> sockets.
> >>
> >> I personally like this one the best.  Some care would be needed to
> >> allow programs to detect the new functionality.  Any preferred way to
> >> handle it?
> > 
> > Regardless of whether we can fix the existing behavior, I also think
> > this is a worthwhile cmsg. As timestamping is a SOL_SOCKET option, the
> > cmsg should likely also be that, processed in __sock_cmsg_send.
> 
> Do you think about extending inet_cork and sockcm_cookie to provide
> OPT_ID value? If yes, I can give it a try also.

If the above fixes Andy's concern, perhaps that is enough.

The only reason to add yet another interface, would be to allow
concurrent sends to use the same socket. Not sure how realistic that
need is.

As for implementation, since SO_TIMESTAMPING is SOL_SOCKET level,
ideally so is the cmsg. But, then we either have to implement for
all existing protocols that do timestamping. Or at least it is
preferable to fail the system call on protocols that do not support
it. Instead of simply returning success and ignoring the sockopt.




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-02-10 16:37 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-08 18:02 SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails Andy Lutomirski
2024-02-08 19:55 ` Vadim Fedorenko
2024-02-08 20:05   ` Andy Lutomirski
2024-02-08 20:40     ` Andy Lutomirski
2024-02-08 21:51     ` Willem de Bruijn
2024-02-08 22:40       ` Andy Lutomirski
2024-02-09  3:22       ` Vadim Fedorenko
2024-02-10 16:37         ` Willem de Bruijn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.