bk16 changes to cbq

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* bk16 changes to cbq
@ 2004-07-03 13:36 jamal
  2004-07-03 17:16 ` David S. Miller
  2004-07-05 20:27 ` Alexey Kuznetsov
  0 siblings, 2 replies; 9+ messages in thread
From: jamal @ 2004-07-03 13:36 UTC (permalink / raw)
  To: Alexey, David S. Miller; +Cc: Stephen Hemminger, netdev

I am noticing this in bk16; it may be one of the changes that came 
from Stephen recently..

--- a/net/sched/sch_cbq.c       2004-02-20 18:37:25 -08:00
+++ b/net/sched/sch_cbq.c       2004-06-18 13:51:18 -07:00
@@ -1054,7 +1054,7 @@

        if (sch->q.qlen) {
                sch->stats.overlimits++;
-               if (q->wd_expires && !netif_queue_stopped(sch->dev)) {
+               if (q->wd_expires) {
                        long delay = PSCHED_US2JIFFIE(q->wd_expires);
                        if (delay <= 0)
                                delay = 1;

What i remember is this (4-5 years back) used to cure a bug - cant
remember the details unfortunately, but Alexey may remember.
I am hoping removal of the above line implies that those conditions dont
exist anymore? 

At the expense of repeating a discussion that may have already happened,
what was the logic for removing this line?

cheers,
jamal

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bk16 changes to cbq
  2004-07-03 13:36 bk16 changes to cbq jamal
@ 2004-07-03 17:16 ` David S. Miller
  2004-07-04  0:03   ` jamal
  2004-07-04  1:21   ` jamal
  2004-07-05 20:27 ` Alexey Kuznetsov
  1 sibling, 2 replies; 9+ messages in thread
From: David S. Miller @ 2004-07-03 17:16 UTC (permalink / raw)
  To: hadi; +Cc: kuznet, shemminger, netdev

On 03 Jul 2004 09:36:50 -0400
jamal <hadi@cyberus.ca> wrote:

> -               if (q->wd_expires && !netif_queue_stopped(sch->dev)) {
> +               if (q->wd_expires) {
>                         long delay = PSCHED_US2JIFFIE(q->wd_expires);
>                         if (delay <= 0)
>                                 delay = 1;
> 
> What i remember is this (4-5 years back) used to cure a bug - cant
> remember the details unfortunately, but Alexey may remember.
> I am hoping removal of the above line implies that those conditions dont
> exist anymore? 

The test is racy with drivers, on the very next line the
driver could take a TX completion interrupt and unplug the
queue invalidating the test entirely.

If the test proves wrong, that's OK because we'll try again
at the top level of packet queue dispatch.

There was a good explaination of Stephen's patch
on netdev when he posted it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bk16 changes to cbq
  2004-07-03 17:16 ` David S. Miller
@ 2004-07-04  0:03   ` jamal
  2004-07-04  1:21   ` jamal
  1 sibling, 0 replies; 9+ messages in thread
From: jamal @ 2004-07-04  0:03 UTC (permalink / raw)
  To: David S. Miller; +Cc: Alexey, shemminger, netdev

On Sat, 2004-07-03 at 13:16, David S. Miller wrote:
> On 03 Jul 2004 09:36:50 -0400

> The test is racy with drivers, on the very next line the
> driver could take a TX completion interrupt and unplug the
> queue invalidating the test entirely.

I could be wrong:
The only thing the tx complete can do is open up the device for
more packets to tx into the device... i.e !netif_queue_stopped(sch->dev)
is true in that specific case. 
Which would mean theres no race.

> If the test proves wrong, that's OK because we'll try again
> at the top level of packet queue dispatch.
> 
> There was a good explaination of Stephen's patch
> on netdev when he posted it.

Let me dig into the emails and get back.

cheers,
jamal

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bk16 changes to cbq
  2004-07-03 17:16 ` David S. Miller
  2004-07-04  0:03   ` jamal
@ 2004-07-04  1:21   ` jamal
  1 sibling, 0 replies; 9+ messages in thread
From: jamal @ 2004-07-04  1:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: Alexey, shemminger, netdev

Ok, I looked at Stephens email; i found it here:
http://marc.theaimsgroup.com/?l=linux-netdev&m=108751387802077&w=2
Basically similar to what you said.
My suggestion is we back out this change - i think there may be a
problem elsewhere, somehow we end up pointing here. It is also possible
there was a hidden bug, but if this stuff was problematic we would have
had complaints in the last few years related.
The other thing is you can leave it in there and wait for people to
complain.

Lets say there was a race condition:
The worst that could happen in such a case (on TX completion) is that
the cbq watchdog is started (which later reschedules the device) and the
top layer reschedules the device as well(because the device was stopped;
stopped devices get descheduled). In such a case there will be two
netif_schedules() with one being unnecessary. No biggie, just a few
extra lines.

Lets say there was no race condition:
The cbq watchdog gets scheduled only when the netdevice is not stopped.
This is optimization is now gone.

To reiterate what i said earlier.
The only thing the tx complete can do is open up the device for
more packets to tx into the device... i.e !netif_queue_stopped(sch->dev)
is true in that specific case. 

Actually while looking at this i noticed some odd things; it may
be time to remind people of the return code rules on
dev->hard_start_xmit(). 

I will make another posting.

cheers,
jamal

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bk16 changes to cbq
  2004-07-03 13:36 bk16 changes to cbq jamal
  2004-07-03 17:16 ` David S. Miller
@ 2004-07-05 20:27 ` Alexey Kuznetsov
  2004-07-06  1:28   ` jamal
  2004-07-07  5:13   ` David S. Miller
  1 sibling, 2 replies; 9+ messages in thread
From: Alexey Kuznetsov @ 2004-07-05 20:27 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, Stephen Hemminger, netdev

Hello!

> What i remember is this (4-5 years back) used to cure a bug - cant
> remember the details unfortunately, but Alexey may remember.

Actually, I do not. It looks like an optimization: if the device is
throttled we just do not need to add timer, qdisc will be woken up
by unthrottle. I do not see any race condition here, but also I do not
see why this check could be important.

> At the expense of repeating a discussion that may have already happened,

To be honest, I wish to be reminded. If this check cured something,
it would be interesting to know what was this.

Alexey

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bk16 changes to cbq
  2004-07-05 20:27 ` Alexey Kuznetsov
@ 2004-07-06  1:28   ` jamal
  2004-07-07  5:13   ` David S. Miller
  1 sibling, 0 replies; 9+ messages in thread
From: jamal @ 2004-07-06  1:28 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: David S. Miller, Stephen Hemminger, netdev

On Mon, 2004-07-05 at 16:27, Alexey Kuznetsov wrote:
> Hello!
> 
> > What i remember is this (4-5 years back) used to cure a bug - cant
> > remember the details unfortunately, but Alexey may remember.
> 
> Actually, I do not. It looks like an optimization: if the device is
> throttled we just do not need to add timer, qdisc will be woken up
> by unthrottle. I do not see any race condition here, but also I do not
> see why this check could be important.

Indeed it looks like an optimization; instead of two events (queue
watchdog and probably EOL )or device wdog) trying to reschedule device,
only one does.
 
> > At the expense of repeating a discussion that may have already happened,
> 
> To be honest, I wish to be reminded. If this check cured something,
> it would be interesting to know what was this.

I searched the archives couldnt find anything relevant. 

cheers,
jamal

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bk16 changes to cbq
  2004-07-05 20:27 ` Alexey Kuznetsov
  2004-07-06  1:28   ` jamal
@ 2004-07-07  5:13   ` David S. Miller
  2004-07-07 15:59     ` Alexey Kuznetsov
  1 sibling, 1 reply; 9+ messages in thread
From: David S. Miller @ 2004-07-07  5:13 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: hadi, shemminger, netdev

On Tue, 6 Jul 2004 00:27:27 +0400
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:

> > What i remember is this (4-5 years back) used to cure a bug - cant
> > remember the details unfortunately, but Alexey may remember.
> 
> Actually, I do not. It looks like an optimization: if the device is
> throttled we just do not need to add timer, qdisc will be woken up
> by unthrottle. I do not see any race condition here, but also I do not
> see why this check could be important.
> 
> > At the expense of repeating a discussion that may have already happened,
> 
> To be honest, I wish to be reminded. If this check cured something,
> it would be interesting to know what was this.

e1000 would hang the delay scheduler sometimes

Let me look for the exact email from Stephen.  Here it is:

http://mailman.ds9a.nl/pipermail/lartc/2004q2/012736.html

Jamal read this, and that is what prompted him to say that if
the delay scheduler was made classful instead of "pretending"
to be, the bug mentioned in Stephen's email would not occur.

I do not understand things fully.  I think it would help if, for example,
each of the qdisc and classification operations had some comments
describing the exact semantics of each ops->method.   For example,
what does requeue do and what is it indicating with each of the
possible return values.  What is expected of it?  What kind of state
is it allowed to leave the queue in?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bk16 changes to cbq
  2004-07-07  5:13   ` David S. Miller
@ 2004-07-07 15:59     ` Alexey Kuznetsov
  2004-07-07 18:11       ` jamal
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Kuznetsov @ 2004-07-07 15:59 UTC (permalink / raw)
  To: David S. Miller; +Cc: hadi, shemminger, netdev

Hello!

> e1000 would hang the delay scheduler sometimes
> 
> Let me look for the exact email from Stephen.  Here it is:
> 
> http://mailman.ds9a.nl/pipermail/lartc/2004q2/012736.html

Well, that test was really meaningless. So, Stephen's patch is right.

The problem is that Stephen's argument just does not look right
and forces to suspect some worse bug somewhere. If netif_queue_stopped(),
it will be awaken, so no loss of wakeups can happen. If a device
sets stopped flag and forgets to wake up after that, it is really
badly broken and this should be repaired, it inevitably will result
in stalls.

> Jamal read this, and that is what prompted him to say that if
> the delay scheduler was made classful instead of "pretending"
> to be, the bug mentioned in Stephen's email would not occur.

It is not related issue. sch_delay allocates private fifo qdisc,
which is quite pointless (it is not accessible for tuning anyway)
and complicates things a lot. F.e. if it copied tbf structure,
it would not have to use ->requeue.

And if it is planned to be tunable and replacable with something else,
it should be classful.

> I do not understand things fully.  I think it would help if, for example,
> each of the qdisc and classification operations had some comments
> describing the exact semantics of each ops->method.   For example,
> what does requeue do and what is it indicating with each of the
> possible return values.  What is expected of it?  What kind of state
> is it allowed to leave the queue in?

Ideally, ->requeue() should return queue to the state before ->dequeue(),
undoing all the state. When it fails (which is possible only due to
some fault sort of failed memory allocation. In the case of sch_delay,
which uses requeue of fifo it is absolutely impossible), it drops
skb and returns NET_XMIT_DROP or NET_XMIT_CN.

Stephen's statement:

> Also, if requeuing the packet fails, it is probably because the queue became full
> by a racing enqueue.

is not true. Everything between dequeue() and requeue() happens under
dev->queue_lock, so there is no place for races there.
Simple qdiscs never fail. cbq cannot fail too, the place where it returns
NET_XMIT_CN is rather for safety when some user drops lock somewhere
between dequeue and requeue, which is not allowed actually.

>  So the right thing to do is to go back and try again

In theory he is right. sch_delay just have no choice. If it just returns,
when ->requeue() failed by some unknown reason, sch_delay would stall.

But, again, sch_delay uses fifo, which never fails in any case.
So, it does not matter. :-)

Alexey

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bk16 changes to cbq
  2004-07-07 15:59     ` Alexey Kuznetsov
@ 2004-07-07 18:11       ` jamal
  0 siblings, 0 replies; 9+ messages in thread
From: jamal @ 2004-07-07 18:11 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: David S. Miller, shemminger, netdev

On Wed, 2004-07-07 at 11:59, Alexey Kuznetsov wrote:
> Hello!
> 
> > e1000 would hang the delay scheduler sometimes
> > 
> > Let me look for the exact email from Stephen.  Here it is:
> > 
> > http://mailman.ds9a.nl/pipermail/lartc/2004q2/012736.html
> 
> Well, that test was really meaningless. So, Stephen's patch is right.

Not totaly Alexey. Its like you said before, the code there was an
optimization.
Heres a scenario especialy with drivers returning 0 on everything
in their txmit() (discussion going elsewhere with Jeff Garzik):

Driver does netif_queue_stop() because of full ring and returns 0.
qdisc gets called and it decides to throttle. In that case if it starts
watchdog (because it doesnt look at device state). 
That would be two possible future reschedules of device.
One because of qdisc throttle timer and another because of netif_stop.
Its not much code difference or cpu cycles, but still an optimization
for that case.
OTOH, this would not happen if the driver returned a 1 along with skb
(the way e1000 does).

cheers,
jamal

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-07-07 18:11 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-07-03 13:36 bk16 changes to cbq jamal
2004-07-03 17:16 ` David S. Miller
2004-07-04  0:03   ` jamal
2004-07-04  1:21   ` jamal
2004-07-05 20:27 ` Alexey Kuznetsov
2004-07-06  1:28   ` jamal
2004-07-07  5:13   ` David S. Miller
2004-07-07 15:59     ` Alexey Kuznetsov
2004-07-07 18:11       ` jamal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).