Re: packet re-ordering on SMP machines.

netdev.vger.kernel.org archive mirror

* Re: packet re-ordering on SMP machines.
       [not found] <Pine.GSO.4.30.0208251149320.29461-100000@shell.cyberus.ca>
@ 2002-08-25 18:32 ` Ben Greear
  2002-08-26  0:52   ` jamal
  0 siblings, 1 reply; 19+ messages in thread
From: Ben Greear @ 2002-08-25 18:32 UTC (permalink / raw)
  To: jamal; +Cc: netdev

jamal wrote:
> 
> 
> 
> NAPI fixes packet reordering problems.

It does indeed.  I just patched the e1000 with the latest NAPI patch
I could find (from Aug 15 or so), and the re-ordering problems went away.

The amount of packets dropped decreased too, but I still see about 1 out of
1000 packets dropped due to rx-FIFO or rx-dropped.  This is when trying to run
60,000 pps of 1514 byte packets from one port to the other on the same dual-port e1000
NIC (copper).  It will generate up to about 72,000 pps without dropping too many
more...

I will do some more tests on two single-port NICs soon to see if that
performs better.

Also, I see the hard_start_xmit call failing 5876 times out of 2719493
calls (for example).  The code that calls the method looks like this:

                         spin_lock_bh(&odev->xmit_lock);
                         if (!netif_queue_stopped(odev)) {
                                 if (odev->hard_start_xmit(next->skb, odev)) {
                                         if (net_ratelimit()) {
                                                 printk(KERN_INFO "Hard xmit error\n");
                                         }
                                         next->errors++;
                                         next->last_ok = 0;
                                 }
                                 else {
                                         next->last_ok = 1;
                                         next->sofar++;
                                         next->tx_bytes += (next->cur_pkt_size + 4); /* count csum */
                                 }

                                 next->next_tx_ns = getRelativeCurNs() + next->ipg;
                         }
                         else {  /* Re-try it next time */
                                 next->last_ok = 0;
                         }

                         spin_unlock_bh(&odev->xmit_lock);

I have not seen hard_start_xmit fail on other drivers, even when over-driving them
well beyond their capabilities.  Any ideas what causes the hard_start_xmit errors?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: packet re-ordering on SMP machines.
  2002-08-25 18:32 ` packet re-ordering on SMP machines Ben Greear
@ 2002-08-26  0:52   ` jamal
  2002-08-26  4:34     ` Ben Greear
  0 siblings, 1 reply; 19+ messages in thread
From: jamal @ 2002-08-26  0:52 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev

On Sun, 25 Aug 2002, Ben Greear wrote:

> jamal wrote:
> >
> >
> >
> > NAPI fixes packet reordering problems.
>
> It does indeed.  I just patched the e1000 with the latest NAPI patch
> I could find (from Aug 15 or so), and the re-ordering problems went away.
>
> The amount of packets dropped decreased too, but I still see about 1 out of
> 1000 packets dropped due to rx-FIFO or rx-dropped.  This is when trying to run
> 60,000 pps of 1514 byte packets from one port to the other on the same dual-port e1000
> NIC (copper).  It will generate up to about 72,000 pps without dropping too many
> more...
>

That doesn't sound impressive at all. I know it's about .8 of wire rate
but you should be able to exceed that.
Robert was generating in the range of 800Kpps with that NIC, if I recall
correctly.

> I will do some more tests on two single-port NICs soon to see if that
> performs better.

You should see better numbers.
Also if you have SMP, tie each onto a CPU.
Additionally, get the skb recycler patch from Robert; it should improve
things even more.

>
> Also, I see the hard_start_xmit call failing 5876 times out of 2719493
> calls (for example).  The code that calls the method looks like this:
>

I don't have access to that NIC. But a stoopid question: Have you tried
increasing the transmit queue via ifconfig? 1000 packets is reasonable
for gige.

cheers,
jamal


* Re: packet re-ordering on SMP machines.
  2002-08-26  0:52   ` jamal
@ 2002-08-26  4:34     ` Ben Greear
  2002-08-26 11:20       ` jamal
  2002-08-26 23:03       ` Xiaoliang (David) Wei
  0 siblings, 2 replies; 19+ messages in thread
From: Ben Greear @ 2002-08-26  4:34 UTC (permalink / raw)
  To: jamal; +Cc: netdev

jamal wrote:

> That doesn't sound impressive at all. I know it's about .8 of wire rate
> but you should be able to exceed that.
> Robert was generating in the range of 800Kpps with that NIC, if I recall
> correctly.

I had only tested 1514 byte pkts, so I was getting around 880Mbps,
which is pretty good as far as I know.
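
For reference, the wire-rate ceiling is simple arithmetic. A rough sketch, assuming each frame also costs 4 bytes of FCS, 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire (the function name is made up for illustration):

```c
#include <stdint.h>

/* Per-frame cost on the wire beyond the bytes handed to the driver:
 * 4-byte FCS + 8-byte preamble/SFD + 12-byte inter-frame gap. */
enum { ETH_WIRE_OVERHEAD_BYTES = 4 + 8 + 12 };

/* Maximum frames per second for a link speed in bits/s and a frame
 * length in bytes. frame_len excludes the FCS (1514 for a full frame). */
static uint64_t wire_rate_pps(uint64_t link_bps, uint32_t frame_len)
{
    uint64_t bits_per_frame =
        ((uint64_t)frame_len + ETH_WIRE_OVERHEAD_BYTES) * 8;

    return link_bps / bits_per_frame;
}
```

Under those assumptions, 1514-byte frames on a 1 Gbps link top out at about 81,274 pps, so 72,000 pps is roughly 0.89 of wire rate. For minimum-size frames (60 bytes before the FCS) the ceiling is about 1,488,095 pps.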

I see about 255 kpps when sending 64 byte pkts to myself.  Still
dropping about 1 in 4000 packets at this speed.  I think most of Robert's
tests didn't involve actually doing something with the received packet
though, and I am inspecting it for latency, sequence number, etc.

I'm even doing a __get_timeofday() call to calculate the latency...need
to find a faster way to do that...

If I only allocate/scan 1 per 100 packets (ie alloc one packet and send it 100 times),
then I get a more respectable 365kpps.  Robert's patch should definitely help!

> Also if you have SMP, tie each onto a CPU.

That's with the irq affinity thing in proc, right?

> Additionally, get the skb recycler patch from Robert; it should improve
> things even more.

Do you happen to have a URL for this?

Actually, the various network tweaks are relatively hard to find
(at least to find the most up-to-date copies).  It would be great if
there was a place where they were all concentrated.

> 
> 
>>Also, I see the hard_start_xmit call failing 5876 times out of 2719493
>>calls (for example).  The code that calls the method looks like this:
>>
> 
> 
> I don't have access to that NIC. But a stoopid question: Have you tried
> increasing the transmit queue via ifconfig? 1000 packets is reasonable
> for gige.

I upped it, but it didn't stop the errors.  The NIC is still performing,
so it may not be a real problem...

Thanks for the info,
Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: packet re-ordering on SMP machines.
  2002-08-26  4:34     ` Ben Greear
@ 2002-08-26 11:20       ` jamal
  2002-08-26 23:03       ` Xiaoliang (David) Wei
  1 sibling, 0 replies; 19+ messages in thread
From: jamal @ 2002-08-26 11:20 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev




On Sun, 25 Aug 2002, Ben Greear wrote:

> jamal wrote:
>
> > That doesn't sound impressive at all. I know it's about .8 of wire rate
> > but you should be able to exceed that.
> > Robert was generating in the range of 800Kpps with that NIC, if I recall
> > correctly.
>
> I had only tested 1514 byte pkts, so I was getting around 880Mbps,
> which is pretty good as far as I know.

There's no reason you shouldn't be able to do wire rate.

>
> I see about 255 kpps when sending 64 byte pkts to myself.  Still
> dropping about 1 in 4000 packets at this speed.  I think most of Robert's
> tests didn't involve actually doing something with the received packet
> though, and I am inspecting it for latency, sequence number, etc.
>
> I'm even doing a __get_timeofday() call to calculate the latency...need
> to find a faster way to do that...
>

ouch.
For latency or sequencing you don't really need to check all packets. Read
academic papers on the subject. You probably need about 5% of the total
packets. Also you don't have to do the checks at runtime; you can do them
once the run is complete (which you should be able to tell since you
control both send and receive).
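
A rough sketch of the sampling idea, assuming each test packet carries a 32-bit sequence number and that inspecting 1 packet in 20 (about 5%) is acceptable. The helper names are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: inspect one packet out of every `interval`,
 * chosen by sequence number so the decision is cheap and repeatable. */
static bool should_inspect(uint32_t seq, uint32_t interval)
{
    if (interval == 0)
        return true;            /* treat 0 as "inspect everything" */
    return seq % interval == 0;
}

/* Count how many of the first `total` packets would be inspected. */
static uint32_t inspected_count(uint32_t total, uint32_t interval)
{
    uint32_t n = 0;

    for (uint32_t seq = 0; seq < total; seq++)
        if (should_inspect(seq, interval))
            n++;
    return n;
}
```

The receive path would then only do the latency and sequence checks when should_inspect() says so, and skip the per-packet time lookup for everything else.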

> If I only allocate/scan 1 per 100 packets (ie alloc one packet and send it 100 times),
> then I get a more respectable 365kpps.  Robert's patch should definitely help!
>

Yes, clearly you will benefit.

> > Also if you have SMP, tie each onto a CPU.
>
> That's with the irq affinity thing in proc, right?

yes.

>
> > Additionally, get the skb recycler patch from Robert; it should improve
> > things even more.
>
> Do you happen to have a URL for this?
>
> Actually, the various network tweaks are relatively hard to find
> (at least to find the most up-to-date copies).  It would be great if
> there was a place where they were all concentrated.

Robert's site is the main repository; it may have READMEs with URLs
pointing to various locations.
ftp://130.238.98.12/pub/Linux/net-development/
and look at the recycling and NAPI sub-directories.

>
> >
> >
> >>Also, I see the hard_start_xmit call failing 5876 times out of 2719493
> >>calls (for example).  The code that calls the method looks like this:
> >>
> >
> >
> > I don't have access to that NIC. But a stoopid question: Have you tried
> > increasing the transmit queue via ifconfig? 1000 packets is reasonable
> > for gige.
>
> I upped it, but it didn't stop the errors.  The NIC is still performing,
> so it may not be a real problem...
>

I don't have this NIC. When Robert shows up he may be able to explain this.

cheers,
jamal


* Re: packet re-ordering on SMP machines.
  2002-08-26  4:34     ` Ben Greear
  2002-08-26 11:20       ` jamal
@ 2002-08-26 23:03       ` Xiaoliang (David) Wei
  2002-08-26 23:20         ` Ben Greear
  2002-08-27 10:59         ` jamal
  1 sibling, 2 replies; 19+ messages in thread
From: Xiaoliang (David) Wei @ 2002-08-26 23:03 UTC (permalink / raw)
  To: Ben Greear, jamal, Cheng Jin, Cheng Hu, Steven Low; +Cc: netdev


Hi Ben and Jamal,
       Are you guys sure that gettimeofday per packet is a big overhead on
a Gbps connection?
       Did you compare the performance with gettimeofday per packet and
without? I guess RFC 1323 specifies that each packet should have a timestamp
(although not from gettimeofday).
       Also, what's your testbed's configuration, Ben? (I guess we can
use faster hardware to overcome this effect...)
      Thank you:)

     ps: I am working on some high speed TCP experiment and may want to call
gettimeofday for every packet...

-David
Xiaoliang (David) Wei             Graduate Student in CS@Caltech
http://www.cs.caltech.edu/~weixl

* Re: packet re-ordering on SMP machines.
  2002-08-26 23:03       ` Xiaoliang (David) Wei
@ 2002-08-26 23:20         ` Ben Greear
  2002-08-27 10:59         ` jamal
  1 sibling, 0 replies; 19+ messages in thread
From: Ben Greear @ 2002-08-26 23:20 UTC (permalink / raw)
  To: Xiaoliang (David) Wei; +Cc: jamal, Cheng Jin, Cheng Hu, Steven Low, netdev


Xiaoliang (David) Wei wrote:
> Hi Ben and Jamal,
>        Are you guys sure that gettimeofday per packet is a big overhead on
> a Gbps connection?
>        Did you compare the performance with gettimeofday per packet and
> without? I guess RFC 1323 specifies that each packet should have a timestamp
> (although not from gettimeofday).
>        Also, what's your testbed's configuration, Ben? (I guess we can
> use faster hardware to overcome this effect...)
>       Thank you:)
> 
>      ps: I am working on some high speed TCP experiment and may want to call
> gettimeofday for every packet...

Actually, now that I think back, I believe the generic ethernet code timestamps
each skb when it's received anyway....  So, my hit probably comes mostly
from allocating new buffers and potentially the gettimeofday that is done then.

I have not benchmarked the kernel gettimeofday call in any sort of
isolated case.

It does not appear that the CPU is what is limiting my particular test, I think
it's either the NIC or the driver, or more likely, the way I'm driving it...

Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear


* Re: packet re-ordering on SMP machines.
  2002-08-26 23:03       ` Xiaoliang (David) Wei
  2002-08-26 23:20         ` Ben Greear
@ 2002-08-27 10:59         ` jamal
  2002-08-27 11:12           ` Andi Kleen
  1 sibling, 1 reply; 19+ messages in thread
From: jamal @ 2002-08-27 10:59 UTC (permalink / raw)
  To: Xiaoliang (David) Wei; +Cc: Ben Greear, Cheng Jin, Cheng Hu, Steven Low, netdev




On Mon, 26 Aug 2002, Xiaoliang (David) Wei wrote:

> Hi Ben and Jamal,
>        Are you guys sure that gettimeofday per packet is a big overhead on
> a Gbps connection?

We may be talking about different things;
I am talking about do_gettimeofday -- which is very expensive.
Anyone who has time could look at improving that. It is run per incoming
packet.

>        Did you compare the performance with gettimeofday per packet and
> without?

I think it would be pretty noticeable if you got rid of the
per-incoming-packet calls to do_gettimeofday

> I guess RFC 1323 specifies that each packet should have a timestamp
> (although not from gettimeofday).

In Linux, this is cleverly based on the system clock (jiffies).

cheers,
jamal


* Re: packet re-ordering on SMP machines.
  2002-08-27 10:59         ` jamal
@ 2002-08-27 11:12           ` Andi Kleen
  2002-08-27 12:05             ` jamal
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2002-08-27 11:12 UTC (permalink / raw)
  To: jamal
  Cc: Xiaoliang (David) Wei, Ben Greear, Cheng Jin, Cheng Hu,
	Steven Low, netdev


On Tue, Aug 27, 2002 at 06:59:33AM -0400, jamal wrote:
> 
> 
> 
> On Mon, 26 Aug 2002, Xiaoliang (David) Wei wrote:
> 
> > Hi Ben and Jamal,
> >        Are you guys sure that gettimeofday per packet is a big overhead on
> > a Gbps connection?
> 
> We may be talking about different things;
> I am talking about do_gettimeofday -- which is very expensive.
> Anyone who has time could look at improving that. It is run per incoming
> packet.

That is because of the lock it takes. Locks are always slow.

Older kernels used gettimeoffset which ran without lock, but that was
changed because in some very obscure cases it could cause non-monotonic
timestamps when the user turns on timestamp receiving to user space
(kernel protocols do not care).

Possibilities: 

- Ignore the problem and switch back to gettimeoffset again
- Switch to gettimeoffset but add some correction step for the unlikely
case that someone wants the timestamp from user space
(would be my preferred solution)
- Implement lockless gettimeofday like x86-64 or sparc
(good one too, but likely slower than last) 
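
One common way to get a lockless reader is a sequence counter: the writer bumps a counter before and after updating the time, and a reader retries if it saw an odd value or the counter changed underneath it. A minimal userspace sketch with C11 atomics, assuming a single writer. The struct and function names are made up here, and this is not the x86-64 or sparc code:

```c
#include <stdatomic.h>

/* Time snapshot guarded by a sequence counter.
 * seq even: contents stable; seq odd: a write is in progress. */
struct clock_snapshot {
    atomic_uint seq;
    atomic_long sec;
    atomic_long usec;
};

static void clock_init(struct clock_snapshot *c)
{
    atomic_init(&c->seq, 0);
    atomic_init(&c->sec, 0);
    atomic_init(&c->usec, 0);
}

/* Single writer only (think: the timer tick). */
static void clock_write(struct clock_snapshot *c, long sec, long usec)
{
    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    /* Mark the snapshot as being updated (odd value). */
    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&c->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&c->usec, usec, memory_order_relaxed);

    /* Publish: even value again, ordered after the data stores. */
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Readers never block the writer; they retry on a torn read. */
static void clock_read(struct clock_snapshot *c, long *sec, long *usec)
{
    unsigned before, after;

    do {
        before = atomic_load_explicit(&c->seq, memory_order_acquire);
        *sec = atomic_load_explicit(&c->sec, memory_order_relaxed);
        *usec = atomic_load_explicit(&c->usec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((before & 1) || before != after);
}
```

The reader pays for two extra loads and an occasional retry instead of a lock acquisition, which is the trade-off being weighed here.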


-Andi


* Re: packet re-ordering on SMP machines.
  2002-08-27 11:12           ` Andi Kleen
@ 2002-08-27 12:05             ` jamal
  2002-08-27 12:20               ` Andi Kleen
  2002-08-27 19:43               ` Xiaoliang (David) Wei
  0 siblings, 2 replies; 19+ messages in thread
From: jamal @ 2002-08-27 12:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Xiaoliang (David) Wei, Ben Greear, Cheng Jin, Cheng Hu,
	Steven Low, netdev





On Tue, 27 Aug 2002, Andi Kleen wrote:

>
> That is because of the lock it takes. Locks are always slow.

xtime_lock?

>
> Older kernels used gettimeoffset which ran without lock, but that was
> changed because in some very obscure cases it could cause non-monotonic
> timestamps when the user turns on timestamp receiving to user space
> (kernel protocols do not care)
>
> Possibilities:
>
> - Ignore the problem and switch back to gettimeoffset again

Is it safe to call gettimeoffset without the lock?

> - Switch to gettimeoffset but add some correction step for the unlikely
> case that someone wants the timestamp from user space
> (would be my preferred solution)
> - Implement lockless gettimeofday like x86-64 or sparc
> (good one too, but likely slower than last)


ia64 seems to also have the lock.

cheers,
jamal


* Re: packet re-ordering on SMP machines.
  2002-08-27 12:05             ` jamal
@ 2002-08-27 12:20               ` Andi Kleen
  2002-08-27 13:06                 ` kuznet
  2002-08-27 17:22                 ` Cheng Jin
  2002-08-27 19:43               ` Xiaoliang (David) Wei
  1 sibling, 2 replies; 19+ messages in thread
From: Andi Kleen @ 2002-08-27 12:20 UTC (permalink / raw)
  To: jamal
  Cc: Andi Kleen, Xiaoliang (David) Wei, Ben Greear, Cheng Jin,
	Cheng Hu, Steven Low, netdev


On Tue, Aug 27, 2002 at 08:05:04AM -0400, jamal wrote:
> 
> 
> 
> On Tue, 27 Aug 2002, Andi Kleen wrote:
> 
> >
> > That is because of the lock it takes. Locks are always slow.
> 
> xtime_lock?

Yes.

It also has some other overhead.

> 
> >
> > Older kernels used gettimeoffset which ran without lock, but that was
> > changed because in some very obscure cases it could cause non-monotonic
> > timestamps when the user turns on timestamp receiving to user space
> > (kernel protocols do not care)
> >
> > Possibilities:
> >
> > - Ignore the problem and switch back to gettimeoffset again
> 
> Is it safe to call gettimeoffset without the lock?


Of course. The only problem is that the clock can be non-monotonic
sometimes and not be in sync with gettimeofday, but at least the kernel 
users of packet timestamps do not care.
The only problem is the socket option, but it is obscure enough that I 
would not worry too much about it.
> 
> > - Switch to gettimeoffset but add some correction step for the unlikely
> > case that someone wants the timestamp from user space
> > (would be my preferred solution)
> > - Implement lockless gettimeofday like x86-64 or sparc
> > (good one too, but likely slower than last)
> 
> 
> ia64 seems to also have the lock.

Quick fix is to just use gettimeoffset in netif_rx again. Should 
be fine for you.

-Andi


* Re: packet re-ordering on SMP machines.
  2002-08-27 12:20               ` Andi Kleen
@ 2002-08-27 13:06                 ` kuznet
  2002-08-27 13:13                   ` Andi Kleen
  2002-08-27 17:22                 ` Cheng Jin
  1 sibling, 1 reply; 19+ messages in thread
From: kuznet @ 2002-08-27 13:06 UTC (permalink / raw)
  To: Andi Kleen; +Cc: netdev

Hello!

> Of course. The only problem is that the clock can be non-monotonic
> sometimes and not be in sync with gettimeofday, but at least the kernel 
> users of packet timestamps do not care.

What kernel users? Where did you find them? :-)

> The only problem is the socket option, but it is obscure enough that I 
> would not worry too much about it.

I am very sorry, but passing timestamp to user level is the only purpose
of timestamping and it _MUST_ be monotonic and synchronous to time of day,
otherwise it is completely useless.

In short, this timestamp must be synchronous to time of day.

> > > - Implement lockless

You talk about this for ages. :-)

Actually, the problem is solved very easily. Deprecate SIOCGSTAMP,
and either count users of SO_TIMESTAMP and enable timestamping only
when it is required, or, alternatively, move timestamp retrieval to the socket
level.

Alexey


* Re: packet re-ordering on SMP machines.
  2002-08-27 13:06                 ` kuznet
@ 2002-08-27 13:13                   ` Andi Kleen
  2002-08-27 13:24                     ` kuznet
  2002-09-15  8:42                     ` Harald Welte
  0 siblings, 2 replies; 19+ messages in thread
From: Andi Kleen @ 2002-08-27 13:13 UTC (permalink / raw)
  To: kuznet; +Cc: Andi Kleen, netdev


On Tue, Aug 27, 2002 at 05:06:30PM +0400, A.N.Kuznetsov wrote:
> Hello!
> 
> > Of course. The only problem is that the clock can be non-monotonic
> > sometimes and not be in sync with gettimeofday, but at least the kernel 
> > users of packet timestamps do not care.
> 
> What kernel users? Where did you find them? :-)

Hmm, I thought TCP used it, but it seems to use jiffies directly.

Ok, no kernel users then. Not sure about sunrpc and out of tree stuff
like SCTP.


> > The only problem is the socket option, but it is obscure enough that I 
> > would not worry too much about it.
> 
> I am very sorry, but passing timestamp to user level is the only purpose
> of timestamping and it _MUST_ be monotonic and synchronous to time of day,
> otherwise it is completely useless.

That make-monotonic step doesn't need to be in netif_rx. My old proposal
was to move it to socket layer. Then it would be only done when needed.
Unfortunately it could get somewhat inaccurate when the queueing delay
is too long.

> > > > - Implement lockless
> 
> You talk about this for ages. :-)

It is nearly there for x86-64 ;)  (code is in for vsyscalls, just kernel 
do_gettimeofday doesn't use it yet)

> 
> 
> Actually, the problem is solved very easily. Deprecate SIOCGSTAMP,
> and either count users of SO_TIMESTAMP and enable timestamping only
> when it is required, or, alternatively, move timestamp retrieval to the socket
> level.

Moving it later may make it useless for RTT purposes when the queueing 
delays are too long.

But if no kernel users exist then just making it a global refcnt could work
nicely. Then most people would not eat the overhead when count == 0.
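
A userspace sketch of the refcount idea, under the assumption that sockets bump a global counter when they ask for timestamps and drop it when they stop. All names are hypothetical and gettimeofday() stands in for the kernel's do_gettimeofday:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <sys/time.h>

/* Number of sockets that currently want receive timestamps. */
static atomic_int timestamp_users;

struct pkt {
    struct timeval stamp;
    int has_stamp;          /* 1 if stamp was filled in on receive */
};

/* Called when a socket turns timestamping on or off. */
static void timestamp_users_inc(void)
{
    atomic_fetch_add(&timestamp_users, 1);
}

static void timestamp_users_dec(void)
{
    atomic_fetch_sub(&timestamp_users, 1);
}

/* Receive path: only pay for the clock read when somebody cares. */
static void pkt_receive(struct pkt *p)
{
    if (atomic_load(&timestamp_users) > 0) {
        gettimeofday(&p->stamp, NULL);
        p->has_stamp = 1;
    } else {
        p->has_stamp = 0;
    }
}
```

With the count at zero the per-packet cost is a single load and a branch. One thing the sketch leaves open is what to report for packets that were already queued before the first user showed up.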


-Andi


* Re: packet re-ordering on SMP machines.
  2002-08-27 13:13                   ` Andi Kleen
@ 2002-08-27 13:24                     ` kuznet
  2002-09-15  8:42                     ` Harald Welte
  1 sibling, 0 replies; 19+ messages in thread
From: kuznet @ 2002-08-27 13:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: ak, netdev

Hello!

> Moving it later may make it useless for RTT purposes when the queueing 
> delays are too long.

Absolutely wrong. RTT is always calculated end-to-end, otherwise it is
some meaningless quantity, be it sctp, rpc or something.

The only place where precision of the timestamp is more or less interesting
is tcpdump. But not enough to make it not monotonic. :-)

Alexey


* Re: packet re-ordering on SMP machines.
  2002-08-27 13:13                   ` Andi Kleen
  2002-08-27 13:24                     ` kuznet
@ 2002-09-15  8:42                     ` Harald Welte
  2002-09-15 21:55                       ` Alexey Kuznetsov
  1 sibling, 1 reply; 19+ messages in thread
From: Harald Welte @ 2002-09-15  8:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: kuznet, netdev

On Tue, Aug 27, 2002 at 03:13:17PM +0200, Andi Kleen wrote:
> 
> On Tue, Aug 27, 2002 at 05:06:30PM +0400, A.N.Kuznetsov wrote:
> > Hello!
> > 
> > > Of course. The only problem is that the clock can be non-monotonic
> > > sometimes and not be in sync with gettimeofday, but at least the kernel 
> > > users of packet timestamps do not care.
> > 
> > What kernel users? Where did you find them? :-)
> 
> Hmm, I thought TCP used it, but it seems to use jiffies directly.
> 
> Ok, no kernel users then. Not sure about sunrpc and out of tree stuff
> like SCTP.

The iptables ULOG target passes the skb receive timestamp to userspace, where
it is (depending on local ulogd configuration) written in logging/accounting
databases. (ULOG is in the kernel tree).  

The issue is that ULOG is batching multiple packets (or parts of packets) into
one netlink message sent to userspace.  If userspace would make a timestamp,
it would be very inaccurate.

There is at least one more iptables extension (out of the kernel tree) using
it - but I wouldn't consider this as important.

> -Andi

-- 
Live long and prosper
- Harald Welte / laforge@gnumonks.org               http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M+ 
V-- PS++ PE-- Y++ PGP++ t+ 5-- !X !R tv-- b+++ !DI !D G+ e* h--- r++ y+(*)


* Re: packet re-ordering on SMP machines.
  2002-09-15  8:42                     ` Harald Welte
@ 2002-09-15 21:55                       ` Alexey Kuznetsov
  0 siblings, 0 replies; 19+ messages in thread
From: Alexey Kuznetsov @ 2002-09-15 21:55 UTC (permalink / raw)
  To: Harald Welte; +Cc: ak, netdev

Hello!

> The iptables ULOG target passes the skb receive timestamp to userspace,

No different from a packet socket.

> one netlink message sent to userspace.  If userspace would make a timestamp,

Nobody proposed doing this in userspace. This would not even be "inaccuracy";
the userspace time of read() is not correlated with the real one at all.
E.g. if userspace is going to do some DNS, times will differ unpredictably.

Alexey


* Re: packet re-ordering on SMP machines.
  2002-08-27 12:20               ` Andi Kleen
  2002-08-27 13:06                 ` kuznet
@ 2002-08-27 17:22                 ` Cheng Jin
  2002-08-27 17:33                   ` Andi Kleen
  1 sibling, 1 reply; 19+ messages in thread
From: Cheng Jin @ 2002-08-27 17:22 UTC (permalink / raw)
  To: Andi Kleen
  Cc: jamal, Xiaoliang (David) Wei, Ben Greear, Cheng Hu, Steven Low,
	netdev@oss.sgi.com

Hi, Andi,

> Quick fix is to just use gettimeoffset in netif_rx again. Should
> be fine for you.

There doesn't appear to be a function called gettimeoffset in 2.4.18
anymore.  The closest I found was do_fast_gettimeoffset in
"arch/i386/kernel/time.c"  This appears to be the unlocked version that
you are referring to, except I can't tell why the higher 32 bits (edx) of
the timestamp isn't used.  (maybe the asm code takes care of it, but it seems
that the result is stored in edx so)

What you said about a light-weight gettime function makes sense.  For our
purpose of timing RTTs, any gettime function with a resolution higher than
1 ms will probably be enough.  The time doesn't need to be exactly in sync
with the one obtained from the locking version of the gettime function.

Thanks,

Cheng


* Re: packet re-ordering on SMP machines.
  2002-08-27 17:22                 ` Cheng Jin
@ 2002-08-27 17:33                   ` Andi Kleen
  0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2002-08-27 17:33 UTC (permalink / raw)
  To: Cheng Jin
  Cc: Andi Kleen, jamal, Xiaoliang (David) Wei, Ben Greear, Cheng Hu,
	Steven Low, netdev@oss.sgi.com


On Tue, Aug 27, 2002 at 10:22:13AM -0700, Cheng Jin wrote:
> Hi, Andi,
> 
> > Quick fix is to just use gettimeoffset in netif_rx again. Should
> > be fine for you.
> 
> There doesn't appear to be a function called gettimeoffset in 2.4.18
> anymore.  The closest I found was do_fast_gettimeoffset in
> "arch/i386/kernel/time.c"  This appears to be the unlocked version that

Yes, I mean do_fast_gettimeoffset.
> you are referring to, except I can't tell why the higher 32 bits (edx) of
> the timestamp isn't used.  (maybe the asm code takes care of it, but it seems
> that the result is stored in edx so)

32-bit precision is probably enough for this.
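
On the edx question, a sketch of why the low 32 bits can be enough, assuming the offset is only ever measured from the last timer tick, so fewer than 2^32 cycles pass between the two readings. Unsigned subtraction then gives the right delta even across a wrap of the low word, and a 32x32->64 multiply by a precomputed scale converts cycles to microseconds with the answer in the high word. Names and the scale factor are illustrative, not the 2.4.18 source:

```c
#include <stdint.h>

/* Cycles elapsed between two readings of the low 32 bits of a cycle
 * counter. Correct modulo 2^32, so a wrap of the low word is harmless
 * as long as fewer than 2^32 cycles really elapsed. */
static uint32_t cycles_since(uint32_t now_low, uint32_t last_low)
{
    return now_low - last_low;
}

/* Precomputed scale: 2^32 / cycles_per_usec.
 * Assumes cycles_per_usec > 1 so the result fits in 32 bits. */
static uint32_t usec_scale(uint32_t cycles_per_usec)
{
    if (cycles_per_usec <= 1)
        return UINT32_MAX;
    return (uint32_t)((UINT64_C(1) << 32) / cycles_per_usec);
}

/* Convert a cycle delta to microseconds: multiply to 64 bits and keep
 * the high 32 bits, i.e. divide by 2^32 without a division. */
static uint32_t cycles_to_usec(uint32_t delta, uint32_t scale)
{
    return (uint32_t)(((uint64_t)delta * scale) >> 32);
}
```

At 1 GHz the low word wraps about every 4.3 seconds, far longer than a timer tick, which is why dropping the high word costs nothing for this purpose.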

> 
> What you said about a light-weight gettime function makes sense.  For our
> purpose of timing RTTs, any gettime function with a resolution higher than
> 1 ms will probably be enough.  The time doesn't need to be exactly in sync
> with the one obtained from the locking version of the gettime function.

TSC should be fine then.

-Andi


* Re: packet re-ordering on SMP machines.
  2002-08-27 12:05             ` jamal
  2002-08-27 12:20               ` Andi Kleen
@ 2002-08-27 19:43               ` Xiaoliang (David) Wei
  1 sibling, 0 replies; 19+ messages in thread
From: Xiaoliang (David) Wei @ 2002-08-27 19:43 UTC (permalink / raw)
  To: jamal, Andi Kleen; +Cc: Ben Greear, Cheng Jin, Cheng Hu, Steven Low, netdev


> >
> > That is because of the lock it takes. Locks are always slow.
>
> xtime_lock?
I guess so, after looking at do_gettimeofday.

>
> > Possibilities:
> >
> > - Ignore the problem and switch back to gettimeoffset again
>
> Is it safe to call gettimeoffset without the lock?
What's the possible danger of ignoring the lock? Can I read xtime
directly?

>
> > - Switch to gettimeoffset but add some correction step for the unlikely
> > case that someone wants the timestamp from user space
> > (would be my preferred solution)
> > - Implement lockless gettimeofday like x86-64 or sparc
> > (good one too, but likely slower than last)
>
>
> ia64 seems to also have the lock.


* Re: packet re-ordering on SMP machines.
@ 2002-08-25 15:56 jamal
  0 siblings, 0 replies; 19+ messages in thread
From: jamal @ 2002-08-25 15:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

NAPI fixes packet reordering problems.
Could people please post network-related questions to netdev?
I think it even says so in the FAQ Richard Gooch maintains.
Believe it or not, quite a few people are not subscribed to lk.

cheers,
jamal
