* [PATCH] tcp: SO_TIMESTAMP implementation for TCP
@ 2010-04-30 6:07 Tom Herbert
2010-04-30 6:39 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Tom Herbert @ 2010-04-30 6:07 UTC (permalink / raw)
To: davem, netdev
Implement SO_TIMESTAMP{NS} for TCP. When this socket option is enabled
on a TCP socket, a timestamp for received data can be returned in the
ancillary data of a recvmsg with control message type SCM_TIMESTAMP{NS}.
The timestamp chosen is that of the skb most recently received from
which data was copied. This is useful in debugging and timing
network operations.
Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a778ee0..7dbb662 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -314,6 +314,7 @@ struct tcp_sock {
u32 snd_sml; /* Last byte of the most recently transmitted small packet */
u32 rcv_tstamp; /* timestamp of last received ACK (for keepalives) */
u32 lsndtime; /* timestamp of last sent data packet (for restart window) */
+ ktime_t lrxcopytime; /* timestamp of newest data in copy to user space */
/* Data for direct copy to user */
struct {
@@ -466,6 +467,15 @@ static inline struct tcp_sock *tcp_sk(const struct sock *sk)
return (struct tcp_sock *)sk;
}
+static inline void tcp_update_lrxcopytime(struct sock *sk,
+ const struct sk_buff *skb)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (unlikely(skb->tstamp.tv64 > tp->lrxcopytime.tv64))
+ tp->lrxcopytime = skb->tstamp;
+}
+
struct tcp_timewait_sock {
struct inet_timewait_sock tw_sk;
u32 tw_rcv_nxt;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8ce2974..c7f107a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1381,6 +1381,27 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
return copied;
}
+static inline void tcp_sock_recv_timestamp(struct msghdr *msg, struct sock *sk)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+
+ if (likely(!sock_flag(sk, SOCK_RCVTSTAMP)))
+ return;
+
+ if (msg->msg_controllen >= sizeof(struct timeval) &&
+ tp->lrxcopytime.tv64) {
+ if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+ struct timespec ts = ktime_to_timespec(tp->lrxcopytime);
+ put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
+ sizeof(ts), &ts);
+ } else {
+ struct timeval tv = ktime_to_timeval(tp->lrxcopytime);
+ put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+ sizeof(tv), &tv);
+ }
+ }
+}
+
/*
* This routine copies from a sock struct into the user buffer.
*
@@ -1414,6 +1435,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
goto out;
timeo = sock_rcvtimeo(sk, nonblock);
+ tp->lrxcopytime.tv64 = 0;
/* Urgent data needs to be handled specially. */
if (flags & MSG_OOB)
@@ -1691,6 +1713,8 @@ do_prequeue:
break;
}
}
+
+ tcp_update_lrxcopytime(sk, skb);
}
*seq += used;
@@ -1758,6 +1782,8 @@ skip_copy:
* on connected socket. I was just happy when found this 8) --ANK
*/
+ tcp_sock_recv_timestamp(msg, sk);
+
/* Clean up data we have read: This will do ACK frames. */
tcp_cleanup_rbuf(sk, copied);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e82162c..b94ad16 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4397,6 +4397,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
tp->copied_seq += chunk;
eaten = (chunk == skb->len && !th->fin);
tcp_rcv_space_adjust(sk);
+ tcp_update_lrxcopytime(sk, skb);
}
local_bh_disable();
}
@@ -5061,6 +5062,7 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen)
tp->ucopy.len -= chunk;
tp->copied_seq += chunk;
tcp_rcv_space_adjust(sk);
+ tcp_update_lrxcopytime(sk, skb);
}
local_bh_disable();
@@ -5120,6 +5122,7 @@ static int tcp_dma_try_early_copy(struct sock *sk, struct sk_buff *skb,
tp->ucopy.len -= chunk;
tp->copied_seq += chunk;
tcp_rcv_space_adjust(sk);
+ tcp_update_lrxcopytime(sk, skb);
if ((tp->ucopy.len == 0) ||
(tcp_flag_word(tcp_hdr(skb)) & TCP_FLAG_PSH) ||
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] tcp: SO_TIMESTAMP implementation for TCP
2010-04-30 6:07 [PATCH] tcp: SO_TIMESTAMP implementation for TCP Tom Herbert
@ 2010-04-30 6:39 ` David Miller
2010-04-30 7:58 ` Tom Herbert
0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2010-04-30 6:39 UTC (permalink / raw)
To: therbert; +Cc: netdev
From: Tom Herbert <therbert@google.com>
Date: Thu, 29 Apr 2010 23:07:54 -0700 (PDT)
> Implement SO_TIMESTAMP{NS} for TCP. When this socket option is enabled
> on a TCP socket, a timestamp for received data can be returned in the
> ancillary data of a recvmsg with control message type SCM_TIMESTAMP{NS}.
> The timestamp chosen is that of the skb most recently received from
> which data was copied. This is useful in debugging and timing
> network operations.
>
> Signed-off-by: Tom Herbert <therbert@google.com>
That's not what you're implementing here.
You're only updating the socket timestamp if the SKB passed into
the update function has a more recent timestamp.
There is nothing that says the timestamps have to be increasing and
with retransmits and such if it were me I would want to see the real
timestamp even if it was earlier than the most recently reported
timestamp.
I don't know, I really don't like this feature at all. SO_TIMESTAMP
is really meant for datagram oriented sockets, where there is a
clearly defined "packet" whose timestamp you get. A TCP receive can
involve hundreds of tiny packets so the timestamp can lack any
meaning.
All these new checks and branches for a feature of questionable value.
If you can modify you apps to grab this information you can also probe
for the information using external probing tools.
Sorry, I don't think I'll be applying this.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tcp: SO_TIMESTAMP implementation for TCP
2010-04-30 6:39 ` David Miller
@ 2010-04-30 7:58 ` Tom Herbert
2010-04-30 23:41 ` David Miller
0 siblings, 1 reply; 9+ messages in thread
From: Tom Herbert @ 2010-04-30 7:58 UTC (permalink / raw)
To: David Miller; +Cc: netdev
> That's not what you're implementing here.
>
> You're only updating the socket timestamp if the SKB passed into
> the update function has a more recent timestamp.
>
Yes that is the intent. This provides a measure time from when all
the data in the recvmsg is present on the socket and when the
application actually consumes it. It has been quite useful for
demonstrating to apps writers when their application is being slow to
read the data, as opposed to the stack being slow.
> There is nothing that says the timestamps have to be increasing and
> with retransmits and such if it were me I would want to see the real
> timestamp even if it was earlier than the most recently reported
> timestamp.
>
> I don't know, I really don't like this feature at all. SO_TIMESTAMP
> is really meant for datagram oriented sockets, where there is a
> clearly defined "packet" whose timestamp you get. A TCP receive can
> involve hundreds of tiny packets so the timestamp can lack any
> meaning.
>
And a UDP datagram could be composed of hundreds of IP fragments, so
there's still no clear definition of a "packet"... in fact the choice
of which skb to use as the representative timestamp seems pretty
arbitrary (if the semantics of the timestamp is for when the "datagram
is received", I would think that is when reassembly is complete, i.e.
timestamp of last packet received).
> All these new checks and branches for a feature of questionable value.
> If you can modify you apps to grab this information you can also probe
> for the information using external probing tools.
>
I don't see an nice way to do that, we're profiling a significant
percentage of millions of connections over thousands of paths as part
of standard operations while incurring negligible overhead. The app
can can easily timestamp its operations, but without some mechanism
for getting timestamps out of a TCP connection, the networking portion
of servicing requests is pretty much a black box in that.
> Sorry, I don't think I'll be applying this.
>
Thanks for consideration!
Tom
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tcp: SO_TIMESTAMP implementation for TCP
2010-04-30 7:58 ` Tom Herbert
@ 2010-04-30 23:41 ` David Miller
2010-05-01 5:07 ` Bill Fink
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: David Miller @ 2010-04-30 23:41 UTC (permalink / raw)
To: therbert; +Cc: netdev
From: Tom Herbert <therbert@google.com>
Date: Fri, 30 Apr 2010 00:58:32 -0700
>> All these new checks and branches for a feature of questionable value.
>
>> If you can modify you apps to grab this information you can also probe
>> for the information using external probing tools.
>>
> I don't see an nice way to do that, we're profiling a significant
> percentage of millions of connections over thousands of paths as part
> of standard operations while incurring negligible overhead. The app
> can can easily timestamp its operations, but without some mechanism
> for getting timestamps out of a TCP connection, the networking portion
> of servicing requests is pretty much a black box in that.
If other people have an opinion about this, now would be the time
to speak up. :-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tcp: SO_TIMESTAMP implementation for TCP
2010-04-30 23:41 ` David Miller
@ 2010-05-01 5:07 ` Bill Fink
2010-05-01 5:40 ` Tom Herbert
2010-05-01 5:31 ` Tom Herbert
2010-05-01 12:06 ` Paul LeoNerd Evans
2 siblings, 1 reply; 9+ messages in thread
From: Bill Fink @ 2010-05-01 5:07 UTC (permalink / raw)
To: David Miller; +Cc: therbert, netdev
On Fri, 30 Apr 2010, David Miller wrote:
> From: Tom Herbert <therbert@google.com>
> Date: Fri, 30 Apr 2010 00:58:32 -0700
>
> >> All these new checks and branches for a feature of questionable value.
> >
> >> If you can modify you apps to grab this information you can also probe
> >> for the information using external probing tools.
> >>
> > I don't see an nice way to do that, we're profiling a significant
> > percentage of millions of connections over thousands of paths as part
> > of standard operations while incurring negligible overhead. The app
> > can can easily timestamp its operations, but without some mechanism
> > for getting timestamps out of a TCP connection, the networking portion
> > of servicing requests is pretty much a black box in that.
>
> If other people have an opinion about this, now would be the time
> to speak up. :-)
Not being a kernel hacker, I will naively ask if the kernel tracing
facility could somehow be used to provide the desired info (or could
be modified to provide it).
-Bill
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tcp: SO_TIMESTAMP implementation for TCP
2010-04-30 23:41 ` David Miller
2010-05-01 5:07 ` Bill Fink
@ 2010-05-01 5:31 ` Tom Herbert
2010-05-01 12:06 ` Paul LeoNerd Evans
2 siblings, 0 replies; 9+ messages in thread
From: Tom Herbert @ 2010-05-01 5:31 UTC (permalink / raw)
To: David Miller; +Cc: netdev
>> I don't see an nice way to do that, we're profiling a significant
>> percentage of millions of connections over thousands of paths as part
>> of standard operations while incurring negligible overhead. The app
>> can can easily timestamp its operations, but without some mechanism
>> for getting timestamps out of a TCP connection, the networking portion
>> of servicing requests is pretty much a black box in that.
>
> If other people have an opinion about this, now would be the time
> to speak up. :-)
>
The use case that motivated this patch is really the same as that of
UDP in that application is receiving messages that it wants to to time
stamp; in the case of TCP the application extracts the frames out of
the stream. The lack of a timestamp to discern when a message was
received over TCP is readily apparent when designing a message based
ULP that can dynamically select which protocol to run over.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tcp: SO_TIMESTAMP implementation for TCP
2010-05-01 5:07 ` Bill Fink
@ 2010-05-01 5:40 ` Tom Herbert
2010-05-01 6:00 ` Eric Dumazet
0 siblings, 1 reply; 9+ messages in thread
From: Tom Herbert @ 2010-05-01 5:40 UTC (permalink / raw)
To: Bill Fink; +Cc: David Miller, netdev
> Not being a kernel hacker, I will naively ask if the kernel tracing
> facility could somehow be used to provide the desired info (or could
> be modified to provide it).
>
We did consider kernel tracing (more in the context of implementing
RFC 4898). In the case of trying get per packet timestamps,
correlating a ktrace event with an application message is probably too
high to make it practical. If it weren't for the cost of
timestamp'ing every single skb being received, we'd probably have
SO_TIMESTAMP turned on permanently for many connections. For now
we're settling for a percentage of messages for sampling.
Tom
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tcp: SO_TIMESTAMP implementation for TCP
2010-05-01 5:40 ` Tom Herbert
@ 2010-05-01 6:00 ` Eric Dumazet
0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-05-01 6:00 UTC (permalink / raw)
To: Tom Herbert; +Cc: Bill Fink, David Miller, netdev
Le vendredi 30 avril 2010 à 22:40 -0700, Tom Herbert a écrit :
> > Not being a kernel hacker, I will naively ask if the kernel tracing
> > facility could somehow be used to provide the desired info (or could
> > be modified to provide it).
> >
>
> We did consider kernel tracing (more in the context of implementing
> RFC 4898). In the case of trying get per packet timestamps,
> correlating a ktrace event with an application message is probably too
> high to make it practical. If it weren't for the cost of
> timestamp'ing every single skb being received, we'd probably have
> SO_TIMESTAMP turned on permanently for many connections. For now
> we're settling for a percentage of messages for sampling.
Tom, did you tried to reuse existing skb or sk tstamps ?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] tcp: SO_TIMESTAMP implementation for TCP
2010-04-30 23:41 ` David Miller
2010-05-01 5:07 ` Bill Fink
2010-05-01 5:31 ` Tom Herbert
@ 2010-05-01 12:06 ` Paul LeoNerd Evans
2 siblings, 0 replies; 9+ messages in thread
From: Paul LeoNerd Evans @ 2010-05-01 12:06 UTC (permalink / raw)
To: David Miller, netdev; +Cc: therbert
[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]
On Fri, Apr 30, 2010 at 04:41:15PM -0700, David Miller wrote:
> If other people have an opinion about this, now would be the time
> to speak up. :-)
I have to say I agree with David.
The "receive timestamp" for a TCP recv() call is completely meaningless.
Each byte in the stream arguably could have a set of receive timestamps,
being the timestamp of the underlying IPv4 packet containing a fragment
of a TCP segment that covered that byte. One recv() call could cover
many packets, many recv() calls could be required to consume one packet.
We just don't know from userland.
The point about IPv4 fragments in UDP is a reasonable one; that because
of IPv4 fragmentation there are still potentially multiple timestamps
that could be relevant to a single UDP recv() call. But no two recv()
calls can possibly relate to the same IPv4 fragments, so I feel this is
more defined. Plus, of all the IPv4 fragments that go into a single UDP
packet, one of them is special - the first one, the one containing the
UDP header. We could easily say "the timestamp of a UDP recv() call
shall be the time at which its header was received, even if other
fragments arrived before or after it".
We cannot make any such distinction for some window in a TCP stream. All
TCP segments are indistinct in this manner.
--
Paul "LeoNerd" Evans
leonerd@leonerd.org.uk
ICQ# 4135350 | Registered Linux# 179460
http://www.leonerd.org.uk/
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-05-01 12:06 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-30 6:07 [PATCH] tcp: SO_TIMESTAMP implementation for TCP Tom Herbert
2010-04-30 6:39 ` David Miller
2010-04-30 7:58 ` Tom Herbert
2010-04-30 23:41 ` David Miller
2010-05-01 5:07 ` Bill Fink
2010-05-01 5:40 ` Tom Herbert
2010-05-01 6:00 ` Eric Dumazet
2010-05-01 5:31 ` Tom Herbert
2010-05-01 12:06 ` Paul LeoNerd Evans
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).