An inconsistency/bug in ingress netem timestamps

netdev.vger.kernel.org archive mirror

* An inconsistency/bug in ingress netem timestamps
@ 2009-04-13 19:50 Alex Sidorenko
  2009-04-15 19:50 ` Jarek Poplawski
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Sidorenko @ 2009-04-13 19:50 UTC
  To: netdev

Hello,

while experimenting with 'netem' we have found some strange behaviour. It 
seemed that ingress delay as measured by 'ping' command shows up on some 
hosts but not on others.

After some investigation I have found that the problem is that skbuff->tstamp 
field value depends on whether there are any packet sniffers enabled. That 
is:

- if any ptype_all handler is registered, the tstamp field is as expected
- if there are no ptype_all handlers, the tstamp field does not show the delay

I was able to see the problem on RHEL5 (2.6.18 kernel) and Ubuntu/Jaunty 
(2.6.28 kernel).

Duplication
-----------

1. Enable ingress delay, e.g. 100ms

# modprobe ifb
# ip link set dev ifb0 up
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 parent ffff: \
   protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress \
   redirect dev ifb0
# tc qdisc add dev ifb0 root netem delay 100ms

2. Check that there are no ptype_all handlers registered (stop DHCP, tcpdump, 
vmware etc.)

3. ping any other host on the LAN, e.g.
{asid 14:54:24} ping cats
PING cats (192.168.0.33) 56(84) bytes of data.
64 bytes from cats (192.168.0.33): icmp_seq=1 ttl=64 time=0.258 ms
                                                     ^^^^^^^^^^^^^

Now start tcpdump on any interface (not necessarily eth0):

{asid 15:25:45} ping cats
PING cats (192.168.0.33) 56(84) bytes of data.
64 bytes from cats (192.168.0.33): icmp_seq=1 ttl=64 time=100 ms
                                                     ^^^^^^^^^^^

The ingress packets are really delayed as can be seen from 'ping -U', even 
without tcpdump running:

{asid 15:26:12} ping -U cats
PING cats (192.168.0.33) 56(84) bytes of data.
64 bytes from cats (192.168.0.33): icmp_seq=1 ttl=64 time=100 ms
                                                     ^^^^^^^^^^^

The problem is that modern 'ping' uses SO_TIMESTAMP facility and for some 
reason skb->tstamp is not updated. I was able to verify this with stap script 
(printing skb->tstamp.tv64 in several places).

The strange thing is that as soon as there is any ptype_all handler installed, 
skb->tstamp is updated properly. Unfortunately, my knowledge of TC internals 
is not good enough to find how exactly this happens. There are some comments 
in handle_ing()

	if (*pt_prev) {
		*ret = deliver_skb(skb, *pt_prev, orig_dev);
		*pt_prev = NULL;
	} else {
		/* Huh? Why does turning on AF_PACKET affect this? */
		skb->tc_verd = SET_TC_OK2MUNGE(skb->tc_verd);
	}

but looking at all the places where OK2MUNGE bit is used I don't see how this 
could change the behaviour.

Even though it's a minor issue (after all, the packets are delayed 
and 'ping -U' shows it), it would be nice to have a consistent behaviour 
between cases when there are ptype_all handlers and when there are none.

Regards,
Alex

-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: asid@hp.com
WTEC Linux			Hewlett-Packard (Canada)
------------------------------------------------------------------


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-13 19:50 An inconsistency/bug in ingress netem timestamps Alex Sidorenko
@ 2009-04-15 19:50 ` Jarek Poplawski
  2009-04-15 20:10   ` Alex Sidorenko
  0 siblings, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-04-15 19:50 UTC
  To: Alex Sidorenko; +Cc: netdev

Alex Sidorenko wrote, On 04/13/2009 09:50 PM:

> Hello,
> 
> while experimenting with 'netem' we have found some strange behaviour. It 
> seemed that ingress delay as measured by 'ping' command shows up on some 
> hosts but not on others.
> 
> After some investigation I have found that the problem is that skbuff->tstamp 
> field value depends on whether there are any packet sniffers enabled. That 
> is:
> 
> - if any ptype_all handler is registered, the tstamp field is as expected
> - if there are no ptype_all handlers, the tstamp field does not show the delay
> 
> I was able to see the problem on RHEL5 (2.6.18 kernel) and Ubuntu/Jaunty 
> (2.6.28 kernel).
> 
> Duplication
> -----------
> 
> 1. Enable ingress delay, e.g. 100ms
> 
> # modprobe ifb
> # ip link set dev ifb0 up
> # tc qdisc add dev eth0 ingress
> # tc filter add dev eth0 parent ffff: \
>    protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress \
>    redirect dev ifb0
> # tc qdisc add dev ifb0 root netem delay 100ms
> 
> 2. Check that there are no ptype_all handlers registered (stop DHCP, tcpdump, 
> vmware etc.)
> 
> 3. ping any other host on the LAN, e.g.
> {asid 14:54:24} ping cats
> PING cats (192.168.0.33) 56(84) bytes of data.
> 64 bytes from cats (192.168.0.33): icmp_seq=1 ttl=64 time=0.258 ms
>                                                      ^^^^^^^^^^^^^
> Now start tcpdump on any interface (not necessarily eth0)
> 
> {asid 15:25:45} ping cats
> PING cats (192.168.0.33) 56(84) bytes of data.
> 64 bytes from cats (192.168.0.33): icmp_seq=1 ttl=64 time=100 ms
>                                                      ^^^^^^^^^^^
> 
> The ingress packets are really delayed as can be seen from 'ping -U', even 
> without tcpdump running:
> 
> {asid 15:26:12} ping -U cats
> PING cats (192.168.0.33) 56(84) bytes of data.
> 64 bytes from cats (192.168.0.33): icmp_seq=1 ttl=64 time=100 ms
>                                                      ^^^^^^^^^^^

I agree there is an inconsistency, but it seems 100 ms isn't the
"right" thing to show here. It shows an internal delay added on ifb by
any packet scheduler, so probably not what a user usually expects.

> 
> The problem is that modern 'ping' uses SO_TIMESTAMP facility and for some 
> reason skb->tstamp is not updated. I was able to verify this with stap script 
> (printing skb->tstamp.tv64 in several places).
> 
> The strange thing is that as soon as there is any ptype_all handler installed, 
> skb->tstamp is updated properly. Unfortunately, my knowledge of TC internals 
> is not good enough to find how exactly this happens.

Isn't it when act_mirred calls dev_queue_xmit with dev_queue_xmit_nit?
But, as above mentioned, I doubt it's "updated properly" in this case.

Jarek P.


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-15 19:50 ` Jarek Poplawski
@ 2009-04-15 20:10   ` Alex Sidorenko
  2009-04-15 20:26     ` Jarek Poplawski
  2009-04-16 10:10     ` David Miller
  0 siblings, 2 replies; 14+ messages in thread
From: Alex Sidorenko @ 2009-04-15 20:10 UTC
  To: Jarek Poplawski; +Cc: netdev@vger.kernel.org

On April 15, 2009 03:50:22 pm Jarek Poplawski wrote:

> I agree there is an inconsistency, but it seems 100 ms isn't the
> "right" thing to show here. It shows an internal delay added on ifb by
> any packet scheduler, so probably not what a user usually expects.

Hi Jarek,

thank you for your comments. Yes, I understand that it just looked OK in this 
case even though technically the value was not quite correct.

> > The strange thing is that as soon as there is any ptype_all handler
> > installed, skb->tstamp is updated properly. Unfortunately, my knowledge
> > of TC internals is not good enough to find how exactly this happens.
>
> Isn't it when act_mirred calls dev_queue_xmit with dev_queue_xmit_nit?
> But, as above mentioned, I doubt it's "updated properly" in this case.

I can see that dev_queue_xmit_nit calls net_timestamp(skb) unconditionally. I 
agree that to fix this properly we need to update tstamp in another place 
explicitly (in ifb or netem?).

Thanks,
Alex

-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: asid@hp.com
WTEC Linux			Hewlett-Packard (Canada)
------------------------------------------------------------------


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-15 20:10   ` Alex Sidorenko
@ 2009-04-15 20:26     ` Jarek Poplawski
  2009-04-15 20:29       ` Stephen Hemminger
  2009-04-16 10:10     ` David Miller
  1 sibling, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-04-15 20:26 UTC
  To: Alex Sidorenko; +Cc: netdev@vger.kernel.org, Stephen Hemminger

On Wed, Apr 15, 2009 at 04:10:43PM -0400, Alex Sidorenko wrote:
> On April 15, 2009 03:50:22 pm Jarek Poplawski wrote:
> 
> > I agree there is an inconsistency, but it seems 100 ms isn't the
> > "right" thing to show here. It shows an internal delay added on ifb by
> > any packet scheduler, so probably not what a user usually expects.
> 
> Hi Jarek,
> 
> thank you for your comments. Yes, I understand that it just looked OK in this 
> case even though technically the value was not quite correct.
> 
> > > The strange thing is that as soon as there is any ptype_all handler
> > > installed, skb->tstamp is updated properly. Unfortunately, my knowledge
> > > of TC internals is not good enough to find how exactly this happens.
> >
> > Isn't it when act_mirred calls dev_queue_xmit with dev_queue_xmit_nit?
> > But, as above mentioned, I doubt it's "updated properly" in this case.
> 
> I can see that dev_queue_xmit_nit calls net_timestamp(skb) unconditionally. I 
> agree that to fix this properly we need to update tstamp in another place 
> explicitly (in ifb or netem?).

Hmm... I'm not sure how "popular" is netem on ifb, but we could try
Stephen's opinion (Cc-ed).

Thanks,
Jarek P.


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-15 20:26     ` Jarek Poplawski
@ 2009-04-15 20:29       ` Stephen Hemminger
  2009-04-15 21:00         ` Alex Sidorenko
  0 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2009-04-15 20:29 UTC
  To: Jarek Poplawski; +Cc: Alex Sidorenko, netdev@vger.kernel.org

On Wed, 15 Apr 2009 22:26:20 +0200
Jarek Poplawski <jarkao2@gmail.com> wrote:

> On Wed, Apr 15, 2009 at 04:10:43PM -0400, Alex Sidorenko wrote:
> > On April 15, 2009 03:50:22 pm Jarek Poplawski wrote:
> > 
> > > I agree there is an inconsistency, but it seems 100 ms isn't the
> > > "right" thing to show here. It shows an internal delay added on ifb by
> > > any packet scheduler, so probably not what a user usually expects.
> > 
> > Hi Jarek,
> > 
> > thank you for your comments. Yes, I understand that it just looked OK in this 
> > case even though technically the value was not quite correct.
> > 
> > > > The strange thing is that as soon as there is any ptype_all handler
> > > > installed, skb->tstamp is updated properly. Unfortunately, my knowledge
> > > > of TC internals is not good enough to find how exactly this happens.
> > >
> > > Isn't it when act_mirred calls dev_queue_xmit with dev_queue_xmit_nit?
> > > But, as above mentioned, I doubt it's "updated properly" in this case.
> > 
> > I can see that dev_queue_xmit_nit calls net_timestamp(skb) unconditionally. I 
> > agree that to fix this properly we need to update tstamp in another place 
> > explicitly (in ifb or netem?).
> 
> Hmm... I'm not sure how "popular" is netem on ifb, but we could try
> Stephen's opinion (Cc-ed).

If you are putting netem on ingress, the timestamps could happen before
or after the added delay. As long as it is consistent, then I have no problem
with the existing behavior; ie. it is not a bug, it just works that way.


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-15 20:29       ` Stephen Hemminger
@ 2009-04-15 21:00         ` Alex Sidorenko
  2009-04-15 23:41           ` David Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Sidorenko @ 2009-04-15 21:00 UTC
  To: Stephen Hemminger; +Cc: Jarek Poplawski, netdev@vger.kernel.org

On April 15, 2009 04:29:18 pm Stephen Hemminger wrote:
>  As long as it is consistent, then I have no problem
> with the existing behavior; ie. it is not a bug, it just works that way.

Hi Stephen,

the timestamps change depending on whether there are any ptype_all handlers 
registered. Just starting tcpdump changes the behaviour, this probably 
means 'inconsistent' ?

Regards,
Alex

-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: asid@hp.com
WTEC Linux			Hewlett-Packard (Canada)
------------------------------------------------------------------


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-15 21:00         ` Alex Sidorenko
@ 2009-04-15 23:41           ` David Miller
  0 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2009-04-15 23:41 UTC
  To: alexandre.sidorenko; +Cc: shemminger, jarkao2, netdev

From: Alex Sidorenko <alexandre.sidorenko@hp.com>
Date: Wed, 15 Apr 2009 17:00:59 -0400

> the timestamps change depending on whether there are any ptype_all handlers 
> registered. Just starting tcpdump changes the behaviour, this probably 
> means 'inconsistent' ?

It changes whether there is a "timestamp user", and packet sniffers are
currently defined as such a user.

The argument is whether the overhead of making this type of use
a "timestamp user" is warranted or not.

Turning on timestamps is heavily optimized like this because taking
the timestamp on every packet is extremely expensive, especially
on large classes of x86 systems.

Therefore if we make changes here, they have to have a very specific
and limited scope in order to avoid turning this expensive operation
on when it's not really necessary.
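
The gating described above can be illustrated with a small user-space model.
This is a sketch, not kernel code: every model_* name is hypothetical, the
clock is faked so the behaviour is deterministic, and the counter only
approximates how kernels of this era track timestamp users (assumed here to
be a simple count of sockets and sniffers that asked for timestamps).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff: only the timestamp field. */
struct model_skb {
	int64_t tstamp_ns;	/* 0 means "no timestamp taken" */
};

/* Number of active timestamp users (SO_TIMESTAMP sockets, sniffers, ...). */
static int model_stamp_users;

/* Fake clock; a real kernel would read a hardware time source here. */
static int64_t model_clock_ns = 1;

static void model_enable_timestamp(void)
{
	model_stamp_users++;
}

static void model_disable_timestamp(void)
{
	assert(model_stamp_users > 0);
	model_stamp_users--;
}

/* Read the (expensive) clock only when somebody asked for timestamps;
 * otherwise leave the field cleared so no cost is paid per packet. */
static void model_net_timestamp(struct model_skb *skb)
{
	if (model_stamp_users > 0)
		skb->tstamp_ns = model_clock_ns;
	else
		skb->tstamp_ns = 0;
}
```

With no users registered, model_net_timestamp() never touches the clock,
which is the optimization any fix has to preserve.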


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-15 20:10   ` Alex Sidorenko
  2009-04-15 20:26     ` Jarek Poplawski
@ 2009-04-16 10:10     ` David Miller
  2009-04-16 12:09       ` Alex Sidorenko
  2009-04-16 21:48       ` Jarek Poplawski
  1 sibling, 2 replies; 14+ messages in thread
From: David Miller @ 2009-04-16 10:10 UTC
  To: alexandre.sidorenko; +Cc: jarkao2, netdev

From: Alex Sidorenko <alexandre.sidorenko@hp.com>
Date: Wed, 15 Apr 2009 16:10:43 -0400

> On April 15, 2009 03:50:22 pm Jarek Poplawski wrote:
> 
>> Isn't it when act_mirred calls dev_queue_xmit with dev_queue_xmit_nit?
>> But, as above mentioned, I doubt it's "updated properly" in this case.
> 
> I can see that dev_queue_xmit_nit calls net_timestamp(skb) unconditionally. I 
> agree that to fix this properly we need to update tstamp in another place 
> explicitly (in ifb or netem?).

Since IFB completely bypasses netif_rx() and netif_receive_skb() I
think it should unconditionally set skb->tstamp.tv64 to zero and
invoke net_timestamp()

This would match the behavior of loopback and tunnels, and in my
opinion this is reasonable.  There will be virtually no overhead
added unless timestamping is enabled via ping or similar, and in
return we get what I think is correctness :-)

This also means we need to export net_timestamp() to modules.


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-16 10:10     ` David Miller
@ 2009-04-16 12:09       ` Alex Sidorenko
  2009-04-16 21:48       ` Jarek Poplawski
  1 sibling, 0 replies; 14+ messages in thread
From: Alex Sidorenko @ 2009-04-16 12:09 UTC
  To: David Miller; +Cc: jarkao2@gmail.com, netdev@vger.kernel.org

On April 16, 2009 06:10:34 am David Miller wrote:
> Since IFB completely bypasses netif_rx() and netif_receive_skb() I
> think it should unconditionally set skb->tstamp.tv64 to zero and
> invoke net_timestamp()

From ifb.c (2.6.29):

 109                if (from & AT_EGRESS) {
 110                        dp->st_rx_frm_egr++;
 111                        dev_queue_xmit(skb);
 112                } else if (from & AT_INGRESS) {
 113                        dp->st_rx_frm_ing++;
 114                        skb_pull(skb, skb->dev->hard_header_len);
 115                        netif_rx(skb);
 116                }

Adding skb->tstamp.tv64 = 0 between lines 114 and 115 made 'ping' report the 
delay as expected (tested on 2.6.28)
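
Why the one-line change has this effect can be shown with a user-space model.
It is only a sketch: the model_* names are hypothetical, timestamping is
assumed to be enabled (ping's SO_TIMESTAMP socket counts as a timestamp
user), and the receive path is assumed to stamp a packet only when its tstamp
field is still zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff: only the timestamp field. */
struct model_skb {
	int64_t tstamp_ns;	/* 0 means "not stamped yet" */
};

/* Fake clock so the result is deterministic. */
static int64_t model_now_ns;

/* Models the receive entry point: keep an existing timestamp and take a
 * new one only for packets that do not carry one. */
static void model_netif_rx(struct model_skb *skb)
{
	if (!skb->tstamp_ns)
		skb->tstamp_ns = model_now_ns;
}

/* Models ifb re-injecting an ingress packet after the qdisc delay.
 * clear_tstamp selects whether "skb->tstamp.tv64 = 0" was added.
 * Returns the timestamp a socket would finally see. */
static int64_t model_ifb_reinject(int64_t first_stamp_ns, int64_t delay_ns,
				  int clear_tstamp)
{
	struct model_skb skb = { first_stamp_ns };

	assert(first_stamp_ns > 0 && delay_ns >= 0);

	model_now_ns = first_stamp_ns + delay_ns;	/* time after netem */
	if (clear_tstamp)
		skb.tstamp_ns = 0;
	model_netif_rx(&skb);
	return skb.tstamp_ns;
}
```

Without the change the packet keeps the stamp from its first arrival, so the
delay is invisible to SO_TIMESTAMP readers; with it the stamp is taken again
after the delay.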

Alex


-- 
------------------------------------------------------------------
Alexandre Sidorenko             email: asid@hp.com
WTEC Linux			Hewlett-Packard (Canada)
------------------------------------------------------------------


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-16 10:10     ` David Miller
  2009-04-16 12:09       ` Alex Sidorenko
@ 2009-04-16 21:48       ` Jarek Poplawski
  2009-04-17 12:04         ` David Miller
  1 sibling, 1 reply; 14+ messages in thread
From: Jarek Poplawski @ 2009-04-16 21:48 UTC
  To: David Miller; +Cc: alexandre.sidorenko, netdev, Stephen Hemminger

David Miller wrote, On 04/16/2009 12:10 PM:

> From: Alex Sidorenko <alexandre.sidorenko@hp.com>
> Date: Wed, 15 Apr 2009 16:10:43 -0400
> 
>> On April 15, 2009 03:50:22 pm Jarek Poplawski wrote:
>>
>>> Isn't it when act_mirred calls dev_queue_xmit with dev_queue_xmit_nit?
>>> But, as above mentioned, I doubt it's "updated properly" in this case.
>> I can see that dev_queue_xmit_nit calls net_timestamp(skb) unconditionally. I 
>> agree that to fix this properly we need to update tstamp in another place 
>> explicitly (in ifb or netem?).
> 
> Since IFB completely bypasses netif_rx() and netif_receive_skb() I
> think it should unconditionally set skb->tstamp.tv64 to zero and
> invoke net_timestamp()

IFB calls netif_rx(), and I don't understand why we need to update
tstamp again except for this netem case.

> This would match the behavior of loopback and tunnels, and in my
> opinion this is reasonable.  There will be virtually no overhead
> added unless timestamping is enabled via ping or similar, and in
> return we get what I think is correctness :-)

I think we need some consistency in counting or not counting packet
scheduling delays into timestamps. Anyway we should avoid unnecessary
updates like now, so I'm proposing something different (for testing).

Jarek P.
---

 net/core/dev.c        |    5 +++++
 net/sched/sch_netem.c |    8 ++++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 91d792d..ca740c0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1336,7 +1336,12 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct packet_type *ptype;
 
+#ifdef CONFIG_NET_CLS_ACT
+	if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
+		net_timestamp(skb);
+#else
 	net_timestamp(skb);
+#endif
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index d876b87..2b88295 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -280,6 +280,14 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 			if (unlikely(!skb))
 				return NULL;
 
+#ifdef CONFIG_NET_CLS_ACT
+			/*
+			 * If it's at ingress let's pretend the delay is
+			 * from the network (tstamp will be updated).
+			 */
+			if (G_TC_FROM(skb->tc_verd) & AT_INGRESS)
+				skb->tstamp.tv64 = 0;
+#endif
 			pr_debug("netem_dequeue: return skb=%p\n", skb);
 			sch->q.qlen--;
 			return skb;
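
The combined effect of the two hunks can be checked with a user-space model
of the ingress path. This is a sketch under stated assumptions: the model_*
names are hypothetical, timestamping is assumed to be enabled, and the order
of events (netem dequeue, then dev_queue_xmit_nit() only if sniffers exist,
then netif_rx() stamping only unstamped packets) is one reading of the
discussion in this thread rather than a trace of the real code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
	int64_t tstamp_ns;	/* 0 means "not stamped" */
	int from_ingress;	/* models G_TC_FROM(tc_verd) & AT_INGRESS */
};

/* Returns the timestamp a socket sees for a packet first received at
 * t0_ns and held by netem for delay_ns. */
static int64_t model_ingress_stamp(int64_t t0_ns, int64_t delay_ns,
				   int sniffer_active, int patched)
{
	struct model_skb skb = { t0_ns, 1 };	/* stamped on first receive */
	int64_t now_ns = t0_ns + delay_ns;	/* clock after the delay */

	assert(t0_ns > 0 && delay_ns >= 0);

	/* netem_dequeue(): the patch clears tstamp for ingress packets. */
	if (patched && skb.from_ingress)
		skb.tstamp_ns = 0;

	/* dev_queue_xmit_nit(): runs only when ptype_all handlers exist.
	 * Unpatched it always restamps; patched it skips ingress packets
	 * that already carry a timestamp. */
	if (sniffer_active) {
		if (!patched || !(skb.tstamp_ns && skb.from_ingress))
			skb.tstamp_ns = now_ns;
	}

	/* netif_rx() after ifb: stamps only if no timestamp is present. */
	if (!skb.tstamp_ns)
		skb.tstamp_ns = now_ns;

	return skb.tstamp_ns;
}
```

In this model the unpatched result depends on sniffer_active, while the
patched result is the post-delay time in both cases.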


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-16 21:48       ` Jarek Poplawski
@ 2009-04-17 12:04         ` David Miller
  2009-04-17 16:50           ` Alex Sidorenko
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2009-04-17 12:04 UTC
  To: jarkao2; +Cc: alexandre.sidorenko, netdev, shemminger

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Thu, 16 Apr 2009 23:48:46 +0200

> David Miller wrote, On 04/16/2009 12:10 PM:
> 
>> Since IFB completely bypasses netif_rx() and netif_receive_skb() I
>> think it should unconditionally set skb->tstamp.tv64 to zero and
>> invoke net_timestamp()
> 
> IFB calls netif_rx(), and I don't understand why we need to update
> tstamp again except for this netem case.
> 
>> This would match the behavior of loopback and tunnels, and in my
>> opinion this is reasonable.  There will be virtually no overhead
>> added unless timestamping is enabled via ping or similar, and in
>> return we get what I think is correctness :-)
> 
> I think we need some consistency in counting or not counting packet
> scheduling delays into timestamps. Anyway we should avoid unnecessary
> updates like now, so I'm proposing something different (for testing).

Ok, now I understand this situation even more clearly, thanks
Jarek.

I think your patch is the most palatable solution I've seen
so far, but I want to consider it some more.

Meanwhile, Alexandre can you test Jarek's patch for your case?

Thanks!


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-17 12:04         ` David Miller
@ 2009-04-17 16:50           ` Alex Sidorenko
  2009-04-17 20:08             ` [PATCH] " Jarek Poplawski
  2009-04-20  9:15             ` David Miller
  0 siblings, 2 replies; 14+ messages in thread
From: Alex Sidorenko @ 2009-04-17 16:50 UTC
  To: David Miller
  Cc: jarkao2@gmail.com, netdev@vger.kernel.org, shemminger@vyatta.com

On April 17, 2009 08:04:22 am David Miller wrote:
> Meanwhile, Alexandre can you test Jarek's patch for your case?

I have applied Jarek's patch to 2.6.29.1 and tested with 'ping'. Everything 
works fine.

Alex


* [PATCH] Re: An inconsistency/bug in ingress netem timestamps
  2009-04-17 16:50           ` Alex Sidorenko
@ 2009-04-17 20:08             ` Jarek Poplawski
  2009-04-20  9:15             ` David Miller
  1 sibling, 0 replies; 14+ messages in thread
From: Jarek Poplawski @ 2009-04-17 20:08 UTC
  To: Alex Sidorenko
  Cc: David Miller, netdev@vger.kernel.org, shemminger@vyatta.com

On Fri, Apr 17, 2009 at 12:50:02PM -0400, Alex Sidorenko wrote:
> On April 17, 2009 08:04:22 am David Miller wrote:
> > Meanwhile, Alexandre can you test Jarek's patch for your case?
> 
> I have applied Jarek's patch to 2.6.29.1 and tested with 'ping'. Everything 
> works fine.
> 
> Alex

Thanks,
Jarek P.
-------------------->
net: sch_netem: Fix an inconsistency in ingress netem timestamps.

Alex Sidorenko reported:

"while experimenting with 'netem' we have found some strange behaviour. It 
seemed that ingress delay as measured by 'ping' command shows up on some 
hosts but not on others.

After some investigation I have found that the problem is that skbuff->tstamp 
field value depends on whether there are any packet sniffers enabled. That 
is:

- if any ptype_all handler is registered, the tstamp field is as expected
- if there are no ptype_all handlers, the tstamp field does not show the delay"

This patch prevents an unnecessary update of tstamp in dev_queue_xmit_nit()
on the ingress path (with act_mirred) by adding a check. The overhead on
the fast path is minimal and is incurred only when sniffers etc. are active.

Since netem at ingress seems to logically emulate a network before a host,
tstamp is zeroed to trigger the update and pretend delays are from the
outside.

Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Tested-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/core/dev.c        |    5 +++++
 net/sched/sch_netem.c |    8 ++++++++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 91d792d..ca740c0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1336,7 +1336,12 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct packet_type *ptype;
 
+#ifdef CONFIG_NET_CLS_ACT
+	if (!(skb->tstamp.tv64 && (G_TC_FROM(skb->tc_verd) & AT_INGRESS)))
+		net_timestamp(skb);
+#else
 	net_timestamp(skb);
+#endif
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(ptype, &ptype_all, list) {
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index d876b87..2b88295 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -280,6 +280,14 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
 			if (unlikely(!skb))
 				return NULL;
 
+#ifdef CONFIG_NET_CLS_ACT
+			/*
+			 * If it's at ingress let's pretend the delay is
+			 * from the network (tstamp will be updated).
+			 */
+			if (G_TC_FROM(skb->tc_verd) & AT_INGRESS)
+				skb->tstamp.tv64 = 0;
+#endif
 			pr_debug("netem_dequeue: return skb=%p\n", skb);
 			sch->q.qlen--;
 			return skb;


* Re: An inconsistency/bug in ingress netem timestamps
  2009-04-17 16:50           ` Alex Sidorenko
  2009-04-17 20:08             ` [PATCH] " Jarek Poplawski
@ 2009-04-20  9:15             ` David Miller
  1 sibling, 0 replies; 14+ messages in thread
From: David Miller @ 2009-04-20  9:15 UTC
  To: alexandre.sidorenko; +Cc: jarkao2, netdev, shemminger

From: Alex Sidorenko <alexandre.sidorenko@hp.com>
Date: Fri, 17 Apr 2009 12:50:02 -0400

> On April 17, 2009 08:04:22 am David Miller wrote:
>> Meanwhile, Alexandre can you test Jarek's patch for your case?
> 
> I have applied Jarek's patch to 2.6.29.1 and tested with 'ping'. Everything 
> works fine.

Ok I'll add Jarek's patch to the tree, thanks for testing.

