* [PATCH net-next-2.6] net: Consistent skb timestamping
@ 2010-05-06 12:01 Eric Dumazet
2010-05-06 15:12 ` Tom Herbert
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2010-05-06 12:01 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Tom Herbert
With RPS inclusion, skb timestamping is not consistent in RX path.
If netif_receive_skb() is used, its deferred after RPS dispatch.
If netif_rx() is used, its done before RPS dispatch.
This can give strange tcpdump timestamps results.
I think timestamping should be done as soon as possible in the receive
path, to get meaningful values (ie timestamps taken at the time packet
was delivered by NIC driver to our stack), even if NAPI already can
defer timestamping a bit (RPS can help to reduce the gap)
Remove timestamping from __netif_receive_skb, and add it to
netif_receive_skb(), before RPS.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/core/dev.c | 46 ++++++++++++++++++++++++++--------------------
1 file changed, 26 insertions(+), 20 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 36d53be..3278003 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1454,7 +1454,7 @@ void net_disable_timestamp(void)
}
EXPORT_SYMBOL(net_disable_timestamp);
-static inline void net_timestamp(struct sk_buff *skb)
+static inline void net_timestamp_set(struct sk_buff *skb)
{
if (atomic_read(&netstamp_needed))
__net_timestamp(skb);
@@ -1462,6 +1462,12 @@ static inline void net_timestamp(struct sk_buff *skb)
skb->tstamp.tv64 = 0;
}
+static inline void net_timestamp_check(struct sk_buff *skb)
+{
+ if (!skb->tstamp.tv64 && atomic_read(&netstamp_needed))
+ __net_timestamp(skb);
+}
+
/**
* dev_forward_skb - loopback an skb to another netif
*
@@ -1509,9 +1515,9 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
#ifdef CONFIG_NET_CLS_ACT
if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
- net_timestamp(skb);
+ net_timestamp_set(skb);
#else
- net_timestamp(skb);
+ net_timestamp_set(skb);
#endif
rcu_read_lock();
@@ -2458,8 +2464,7 @@ int netif_rx(struct sk_buff *skb)
if (netpoll_rx(skb))
return NET_RX_DROP;
- if (!skb->tstamp.tv64)
- net_timestamp(skb);
+ net_timestamp_check(skb);
#ifdef CONFIG_RPS
{
@@ -2780,9 +2785,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
int ret = NET_RX_DROP;
__be16 type;
- if (!skb->tstamp.tv64)
- net_timestamp(skb);
-
if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
return NET_RX_SUCCESS;
@@ -2899,23 +2901,27 @@ out:
*/
int netif_receive_skb(struct sk_buff *skb)
{
+ net_timestamp_check(skb);
+
#ifdef CONFIG_RPS
- struct rps_dev_flow voidflow, *rflow = &voidflow;
- int cpu, ret;
+ {
+ struct rps_dev_flow voidflow, *rflow = &voidflow;
+ int cpu, ret;
- rcu_read_lock();
+ rcu_read_lock();
- cpu = get_rps_cpu(skb->dev, skb, &rflow);
+ cpu = get_rps_cpu(skb->dev, skb, &rflow);
- if (cpu >= 0) {
- ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
- rcu_read_unlock();
- } else {
- rcu_read_unlock();
- ret = __netif_receive_skb(skb);
- }
+ if (cpu >= 0) {
+ ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+ rcu_read_unlock();
+ } else {
+ rcu_read_unlock();
+ ret = __netif_receive_skb(skb);
+ }
- return ret;
+ return ret;
+ }
#else
return __netif_receive_skb(skb);
#endif
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
2010-05-06 12:01 [PATCH net-next-2.6] net: Consistent skb timestamping Eric Dumazet
@ 2010-05-06 15:12 ` Tom Herbert
2010-05-06 15:37 ` Eric Dumazet
2010-05-16 6:56 ` David Miller
0 siblings, 2 replies; 7+ messages in thread
From: Tom Herbert @ 2010-05-06 15:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev
On Thu, May 6, 2010 at 5:01 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> With RPS inclusion, skb timestamping is not consistent in RX path.
>
> If netif_receive_skb() is used, its deferred after RPS dispatch.
>
> If netif_rx() is used, its done before RPS dispatch.
>
> This can give strange tcpdump timestamps results.
>
> I think timestamping should be done as soon as possible in the receive
> path, to get meaningful values (ie timestamps taken at the time packet
> was delivered by NIC driver to our stack), even if NAPI already can
> defer timestamping a bit (RPS can help to reduce the gap)
>
The counter argument to this is that it moves another thing into the
serialized path for networking which slows everyone down. I'm not
concerned about when tcpdump is running since performance will suck
anyway, but what is bad is if any single socket in the system turns on
SO_TIMESTAMP, overhead is incurred on *every* packet. This happens
regardless of whether the application ever actually gets a timestamp,
or even whether timestamps are supported by the protocol (try setting
SO_TIMESTAMP on a TCP socket ;-) ). I'm contemplating changing
SO_TIMESTAMP to not enable global timestamps, but only take the
timestamp for a packet once the socket is identified and the timestamp
flag is set (this is the technique done in FreeBSD and Solaris, so I
believe the external semantics would still be valid).
> Remove timestamping from __netif_receive_skb, and add it to
> netif_receive_skb(), before RPS.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> net/core/dev.c | 46 ++++++++++++++++++++++++++--------------------
> 1 file changed, 26 insertions(+), 20 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 36d53be..3278003 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1454,7 +1454,7 @@ void net_disable_timestamp(void)
> }
> EXPORT_SYMBOL(net_disable_timestamp);
>
> -static inline void net_timestamp(struct sk_buff *skb)
> +static inline void net_timestamp_set(struct sk_buff *skb)
> {
> if (atomic_read(&netstamp_needed))
> __net_timestamp(skb);
> @@ -1462,6 +1462,12 @@ static inline void net_timestamp(struct sk_buff *skb)
> skb->tstamp.tv64 = 0;
> }
>
> +static inline void net_timestamp_check(struct sk_buff *skb)
> +{
> + if (!skb->tstamp.tv64 && atomic_read(&netstamp_needed))
> + __net_timestamp(skb);
> +}
> +
> /**
> * dev_forward_skb - loopback an skb to another netif
> *
> @@ -1509,9 +1515,9 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
>
> #ifdef CONFIG_NET_CLS_ACT
> if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
> - net_timestamp(skb);
> + net_timestamp_set(skb);
> #else
> - net_timestamp(skb);
> + net_timestamp_set(skb);
> #endif
>
> rcu_read_lock();
> @@ -2458,8 +2464,7 @@ int netif_rx(struct sk_buff *skb)
> if (netpoll_rx(skb))
> return NET_RX_DROP;
>
> - if (!skb->tstamp.tv64)
> - net_timestamp(skb);
> + net_timestamp_check(skb);
>
> #ifdef CONFIG_RPS
> {
> @@ -2780,9 +2785,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
> int ret = NET_RX_DROP;
> __be16 type;
>
> - if (!skb->tstamp.tv64)
> - net_timestamp(skb);
> -
> if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
> return NET_RX_SUCCESS;
>
> @@ -2899,23 +2901,27 @@ out:
> */
> int netif_receive_skb(struct sk_buff *skb)
> {
> + net_timestamp_check(skb);
> +
> #ifdef CONFIG_RPS
> - struct rps_dev_flow voidflow, *rflow = &voidflow;
> - int cpu, ret;
> + {
> + struct rps_dev_flow voidflow, *rflow = &voidflow;
> + int cpu, ret;
>
> - rcu_read_lock();
> + rcu_read_lock();
>
> - cpu = get_rps_cpu(skb->dev, skb, &rflow);
> + cpu = get_rps_cpu(skb->dev, skb, &rflow);
>
> - if (cpu >= 0) {
> - ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
> - rcu_read_unlock();
> - } else {
> - rcu_read_unlock();
> - ret = __netif_receive_skb(skb);
> - }
> + if (cpu >= 0) {
> + ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
> + rcu_read_unlock();
> + } else {
> + rcu_read_unlock();
> + ret = __netif_receive_skb(skb);
> + }
>
> - return ret;
> + return ret;
> + }
> #else
> return __netif_receive_skb(skb);
> #endif
>
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
2010-05-06 15:12 ` Tom Herbert
@ 2010-05-06 15:37 ` Eric Dumazet
2010-05-06 16:14 ` Eric Dumazet
2010-05-16 6:56 ` David Miller
1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2010-05-06 15:37 UTC (permalink / raw)
To: Tom Herbert; +Cc: David Miller, netdev
Le jeudi 06 mai 2010 à 08:12 -0700, Tom Herbert a écrit :
> On Thu, May 6, 2010 at 5:01 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > With RPS inclusion, skb timestamping is not consistent in RX path.
> >
> > If netif_receive_skb() is used, its deferred after RPS dispatch.
> >
> > If netif_rx() is used, its done before RPS dispatch.
> >
> > This can give strange tcpdump timestamps results.
> >
> > I think timestamping should be done as soon as possible in the receive
> > path, to get meaningful values (ie timestamps taken at the time packet
> > was delivered by NIC driver to our stack), even if NAPI already can
> > defer timestamping a bit (RPS can help to reduce the gap)
> >
> The counter argument to this is that it moves another thing into the
> serialized path for networking which slows everyone down. I'm not
> concerned about when tcpdump is running since performance will suck
> anyway, but what is bad is if any single socket in the system turns on
> SO_TIMESTAMP, overhead is incurred on *every* packet. This happens
> regardless of whether the application ever actually gets a timestamp,
> or even whether timestamps are supported by the protocol (try setting
> SO_TIMESTAMP on a TCP socket ;-) ). I'm contemplating changing
> SO_TIMESTAMP to not enable global timestamps, but only take the
> timestamp for a packet once the socket is identified and the timestamp
> flag is set (this is the technique done in FreeBSD and Solaris, so I
> believe the external semantics would still be valid).
I agree with you, thanks for this excellent argument.
Right now, timestamping is not meant for userland pleasure, but for
sniffers and network diagnostics. (I mean with current API, not with a
new one we could add later)
Once we settle a per socket timestamping, not global, we can reconsider
the thing (or not reconsider it, since socket timestamping will be done
after RPS dispatch)
Its true our global variable to enable/disable timestamp sucks, but its
a separate issue ;)
We probably could have a sysctl to let admin chose the moment timestamp
takes place (before or after RPS)
If TSC is available, here is the "perf top" of the cpu handling
1.200.000 packets per second, while timestamping is requested :
You can hardly see something about time services :
--------------------------------------------------------------------------------------------------------------------------
PerfTop: 983 irqs/sec kernel:99.5% [1000Hz cycles], (all, cpu: 10)
--------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________________ _______
1568.00 14.9% bnx2x_rx_int vmlinux
1133.00 10.7% eth_type_trans vmlinux
798.00 7.6% kmem_cache_alloc_node vmlinux
720.00 6.8% _raw_spin_lock vmlinux
709.00 6.7% __kmalloc_node_track_caller vmlinux
547.00 5.2% __memset vmlinux
540.00 5.1% __slab_alloc vmlinux
453.00 4.3% get_rps_cpu vmlinux
402.00 3.8% _raw_spin_lock_irqsave vmlinux
295.00 2.8% enqueue_to_backlog vmlinux
271.00 2.6% default_send_IPI_mask_sequence_phys vmlinux
259.00 2.5% get_partial_node vmlinux
235.00 2.2% __alloc_skb vmlinux
227.00 2.2% vlan_gro_common vmlinux
206.00 2.0% swiotlb_dma_mapping_error vmlinux
201.00 1.9% skb_put vmlinux
118.00 1.1% getnstimeofday vmlinux
97.00 0.9% csd_lock vmlinux
96.00 0.9% swiotlb_map_page vmlinux
85.00 0.8% read_tsc vmlinux
76.00 0.7% dev_gro_receive vmlinux
75.00 0.7% __napi_complete vmlinux
74.00 0.7% bnx2x_poll vmlinux
73.00 0.7% unmap_single vmlinux
72.00 0.7% netif_receive_skb vmlinux
66.00 0.6% irq_entries_start vmlinux
65.00 0.6% net_rps_action_and_irq_enable vmlinux
62.00 0.6% __phys_addr vmlinux
If HPET or acpi_pm is used, then you can cry :)
(820.000 pps, or 570.000 pps max)
--------------------------------------------------------------------------------------------------------------------------
PerfTop: 1001 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 10)
--------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________________ _______
6488.00 48.4% read_hpet vmlinux
1214.00 9.1% bnx2x_rx_int vmlinux
820.00 6.1% eth_type_trans vmlinux
679.00 5.1% _raw_spin_lock vmlinux
678.00 5.1% kmem_cache_alloc_node vmlinux
607.00 4.5% __slab_alloc vmlinux
478.00 3.6% __kmalloc_node_track_caller vmlinux
404.00 3.0% __memset vmlinux
246.00 1.8% get_partial_node vmlinux
213.00 1.6% get_rps_cpu vmlinux
195.00 1.5% enqueue_to_backlog vmlinux
171.00 1.3% __alloc_skb vmlinux
163.00 1.2% vlan_gro_common vmlinux
135.00 1.0% swiotlb_dma_mapping_error vmlinux
118.00 0.9% skb_put vmlinux
88.00 0.7% getnstimeofday vmlinux
60.00 0.4% swiotlb_map_page vmlinux
59.00 0.4% dev_gro_receive vmlinux
--------------------------------------------------------------------------------------------------------------------------
PerfTop: 1001 irqs/sec kernel:100.0% [1000Hz cycles], (all, cpu: 10)
--------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________________ _______
2573.00 68.3% acpi_pm_read vmlinux
237.00 6.3% bnx2x_rx_int vmlinux
153.00 4.1% eth_type_trans vmlinux
101.00 2.7% kmem_cache_alloc_node vmlinux
99.00 2.6% __kmalloc_node_track_caller vmlinux
79.00 2.1% get_rps_cpu vmlinux
75.00 2.0% __memset vmlinux
72.00 1.9% _raw_spin_lock vmlinux
68.00 1.8% __slab_alloc vmlinux
40.00 1.1% enqueue_to_backlog vmlinux
39.00 1.0% __alloc_skb vmlinux
27.00 0.7% get_partial_node vmlinux
23.00 0.6% swiotlb_dma_mapping_error vmlinux
22.00 0.6% vlan_gro_common vmlinux
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
2010-05-06 15:37 ` Eric Dumazet
@ 2010-05-06 16:14 ` Eric Dumazet
2010-05-16 6:57 ` David Miller
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2010-05-06 16:14 UTC (permalink / raw)
To: Tom Herbert; +Cc: David Miller, netdev
Le jeudi 06 mai 2010 à 17:37 +0200, Eric Dumazet a écrit :
> Right now, timestamping is not meant for userland pleasure, but for
> sniffers and network diagnostics. (I mean with current API, not with a
> new one we could add later)
>
> Once we settle a per socket timestamping, not global, we can reconsider
> the thing (or not reconsider it, since socket timestamping will be done
> after RPS dispatch)
>
> Its true our global variable to enable/disable timestamp sucks, but its
> a separate issue ;)
>
> We probably could have a sysctl to let admin chose the moment timestamp
> takes place (before or after RPS)
Here is v2 of patch,
introducing /proc/sys/net/core/netdev_tstamp_prequeue
Thanks
[PATCH v2 net-next-2.6] net: Consistent skb timestamping
With RPS inclusion, skb timestamping is not consistent in RX path.
If netif_receive_skb() is used, its deferred after RPS dispatch.
If netif_rx() is used, its done before RPS dispatch.
This can give strange tcpdump timestamps results.
I think timestamping should be done as soon as possible in the receive
path, to get meaningful values (ie timestamps taken at the time packet
was delivered by NIC driver to our stack), even if NAPI already can
defer timestamping a bit (RPS can help to reduce the gap)
Tom Herbert prefer to sample timestamps after RPS dispatch. In case
sampling is expensive (HPET/acpi_pm on x86), this makes sense.
Let admins switch from one mode to another, using a new
sysctl, /proc/sys/net/core/netdev_tstamp_prequeue
Its default value (1), means timestamps are taken as soon as possible,
before backlog queueing, giving accurate timestamps.
Setting a 0 value permits to sample timestamps when processing backlog,
after RPS dispatch, to lower the load of the pre-RPS cpu.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
Documentation/sysctl/net.txt | 10 ++++++
include/linux/netdevice.h | 1
net/core/dev.c | 50 ++++++++++++++++++++-------------
net/core/sysctl_net_core.c | 7 ++++
4 files changed, 49 insertions(+), 19 deletions(-)
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index df38ef0..cbd05ff 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -84,6 +84,16 @@ netdev_max_backlog
Maximum number of packets, queued on the INPUT side, when the interface
receives packets faster than kernel can process them.
+netdev_tstamp_prequeue
+----------------------
+
+If set to 0, RX packet timestamps can be sampled after RPS processing, when
+the target CPU processes packets. It might give some delay on timestamps, but
+permit to distribute the load on several cpus.
+
+If set to 1 (default), timestamps are sampled as soon as possible, before
+queueing.
+
optmem_max
----------
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 69022d4..c1b2341 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2100,6 +2100,7 @@ extern const struct net_device_stats *dev_get_stats(struct net_device *dev);
extern void dev_txq_stats_fold(const struct net_device *dev, struct net_device_stats *stats);
extern int netdev_max_backlog;
+extern int netdev_tstamp_prequeue;
extern int weight_p;
extern int netdev_set_master(struct net_device *dev, struct net_device *master);
extern int skb_checksum_help(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 36d53be..1ca4de8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1454,7 +1454,7 @@ void net_disable_timestamp(void)
}
EXPORT_SYMBOL(net_disable_timestamp);
-static inline void net_timestamp(struct sk_buff *skb)
+static inline void net_timestamp_set(struct sk_buff *skb)
{
if (atomic_read(&netstamp_needed))
__net_timestamp(skb);
@@ -1462,6 +1462,12 @@ static inline void net_timestamp(struct sk_buff *skb)
skb->tstamp.tv64 = 0;
}
+static inline void net_timestamp_check(struct sk_buff *skb)
+{
+ if (!skb->tstamp.tv64 && atomic_read(&netstamp_needed))
+ __net_timestamp(skb);
+}
+
/**
* dev_forward_skb - loopback an skb to another netif
*
@@ -1509,9 +1515,9 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
#ifdef CONFIG_NET_CLS_ACT
if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
- net_timestamp(skb);
+ net_timestamp_set(skb);
#else
- net_timestamp(skb);
+ net_timestamp_set(skb);
#endif
rcu_read_lock();
@@ -2202,6 +2208,7 @@ EXPORT_SYMBOL(dev_queue_xmit);
=======================================================================*/
int netdev_max_backlog __read_mostly = 1000;
+int netdev_tstamp_prequeue __read_mostly = 1;
int netdev_budget __read_mostly = 300;
int weight_p __read_mostly = 64; /* old backlog weight */
@@ -2458,8 +2465,8 @@ int netif_rx(struct sk_buff *skb)
if (netpoll_rx(skb))
return NET_RX_DROP;
- if (!skb->tstamp.tv64)
- net_timestamp(skb);
+ if (netdev_tstamp_prequeue)
+ net_timestamp_check(skb);
#ifdef CONFIG_RPS
{
@@ -2780,8 +2787,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
int ret = NET_RX_DROP;
__be16 type;
- if (!skb->tstamp.tv64)
- net_timestamp(skb);
+ if (!netdev_tstamp_prequeue)
+ net_timestamp_check(skb);
if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
return NET_RX_SUCCESS;
@@ -2899,23 +2906,28 @@ out:
*/
int netif_receive_skb(struct sk_buff *skb)
{
+ if (netdev_tstamp_prequeue)
+ net_timestamp_check(skb);
+
#ifdef CONFIG_RPS
- struct rps_dev_flow voidflow, *rflow = &voidflow;
- int cpu, ret;
+ {
+ struct rps_dev_flow voidflow, *rflow = &voidflow;
+ int cpu, ret;
- rcu_read_lock();
+ rcu_read_lock();
+
+ cpu = get_rps_cpu(skb->dev, skb, &rflow);
- cpu = get_rps_cpu(skb->dev, skb, &rflow);
+ if (cpu >= 0) {
+ ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+ rcu_read_unlock();
+ } else {
+ rcu_read_unlock();
+ ret = __netif_receive_skb(skb);
+ }
- if (cpu >= 0) {
- ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
- rcu_read_unlock();
- } else {
- rcu_read_unlock();
- ret = __netif_receive_skb(skb);
+ return ret;
}
-
- return ret;
#else
return __netif_receive_skb(skb);
#endif
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index dcc7d25..01eee5d 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -122,6 +122,13 @@ static struct ctl_table net_core_table[] = {
.proc_handler = proc_dointvec
},
{
+ .procname = "netdev_tstamp_prequeue",
+ .data = &netdev_tstamp_prequeue,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec
+ },
+ {
.procname = "message_cost",
.data = &net_ratelimit_state.interval,
.maxlen = sizeof(int),
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
2010-05-06 15:12 ` Tom Herbert
2010-05-06 15:37 ` Eric Dumazet
@ 2010-05-16 6:56 ` David Miller
2010-05-16 18:30 ` Dimitris Michailidis
1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2010-05-16 6:56 UTC (permalink / raw)
To: therbert; +Cc: eric.dumazet, netdev
From: Tom Herbert <therbert@google.com>
Date: Thu, 6 May 2010 08:12:57 -0700
> I'm contemplating changing SO_TIMESTAMP to not enable global
> timestamps, but only take the timestamp for a packet once the socket
> is identified and the timestamp flag is set (this is the technique
> done in FreeBSD and Solaris, so I believe the external semantics
> would still be valid).
This is not tenable.
Users have made it clear in the past that when they ask for a timestamp
they really want the timestamp as close to the device receive handling
path as possible.
Users basically really want timestamps in two places:
1) As near the device RX handling as possible
2) The point at which recvmsg() got the data
The former is obtainable from SO_TIMESTAMP and the latter from
gettimeofday().
So putting it way down to the point where we choose the socket isn't
going to work at all.
FreeBSD and Solaris combined have a tiny sliver of the number of users
we have to cater to, so they can have all kinds of latitude with which
to break things like that. So saying they do something is like saying
"the moon was out tonight", it has no relevance on whether we are able
to do it too :-)
The real fix is to make the devices less stupid and give us timestamps
directly, and thanks to things like PTP support in hardware that's
actually more and more of a reality these days.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
2010-05-06 16:14 ` Eric Dumazet
@ 2010-05-16 6:57 ` David Miller
0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2010-05-16 6:57 UTC (permalink / raw)
To: eric.dumazet; +Cc: therbert, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 06 May 2010 18:14:48 +0200
> [PATCH v2 net-next-2.6] net: Consistent skb timestamping
...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied, thanks Eric.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net-next-2.6] net: Consistent skb timestamping
2010-05-16 6:56 ` David Miller
@ 2010-05-16 18:30 ` Dimitris Michailidis
0 siblings, 0 replies; 7+ messages in thread
From: Dimitris Michailidis @ 2010-05-16 18:30 UTC (permalink / raw)
To: David Miller; +Cc: therbert, eric.dumazet, netdev
David Miller wrote:
> The real fix is to make the devices less stupid and give us timestamps
> directly, and thanks to things like PTP support in hardware that's
> actually more and more of a reality these days.
For cxgb4 a timestamp is written into Rx descriptors for each received
packet. The value comes from a TSC-like cycle counter. The raw timestamp
is very cheap to get, its value converted to system ktime a bit less so
though not too bad. It would be nicer though if the stack could hint the
driver whether it should do the conversion at all. Maybe export
netstamp_needed and add an inline wrapper to read it?
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-05-16 18:31 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-06 12:01 [PATCH net-next-2.6] net: Consistent skb timestamping Eric Dumazet
2010-05-06 15:12 ` Tom Herbert
2010-05-06 15:37 ` Eric Dumazet
2010-05-06 16:14 ` Eric Dumazet
2010-05-16 6:57 ` David Miller
2010-05-16 6:56 ` David Miller
2010-05-16 18:30 ` Dimitris Michailidis
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).