NAPI eepro100 bug fixed

public inbox for linux-kernel@vger.kernel.org

* NAPI eepro100 bug fixed
       [not found]   ` <15624.63280.379421.369909@robur.slu.se>
@ 2002-06-18  3:25     ` Zhang Fuxin
  2002-06-18  4:43       ` kuznet
  2002-06-18 17:07       ` Robert Olsson
  0 siblings, 2 replies; 4+ messages in thread
From: Zhang Fuxin @ 2002-06-18  3:25 UTC (permalink / raw)
  To: Robert Olsson; +Cc: Paul, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2303 bytes --]

Hi all,
      My first NAPI eepro100 patch contained a subtle but fatal race that leads
to a lockup (of the whole machine here, but only of the Ethernet interface for
Paul). This version should be OK. Paul, would you like to give it a try? I've
tested it on my PCs and it seems very stable now; even 50 kpps of traffic
doesn't cause any problems here.
      The bug is explained in the comment below. I think other NAPI driver
writers will probably run into it, so I am listing it here.
        /* Disabling interrupts here is necessary!
         * We need to ensure Rx/RxNoBuf ints are disabled whenever the
         * in-poll flag is set. If an interrupt comes between netif_rx_complete
         * and enable_rx_and_rxnobuf_ints, the following will happen:
         *         netif_rx_complete --> clear RX_SCHED flag
         *           -> ints (e.g. TxDone)
         *                  speedo_interrupt
         *                       if (netif_rx_schedule_prep(dev))
         *                          disable_rx_and_rxnobuf_ints
         *                  return
         *           <-
         *         enable_rx_and_rxnobuf_ints
         *  Then we will have Rx & RxNoBuf ints enabled while polling!
         *  This may lead to endless interrupts and an effective lockup of
         *  the whole machine.
         */
        spin_lock_irqsave(&sp->lock,flags);
        netif_rx_complete(dev);
        enable_rx_and_rxnobuf_ints(dev);
        spin_unlock_irqrestore(&sp->lock,flags);
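
The interleaving described in the comment can be replayed in a small
user-space model. This is only a sketch, not driver code: the struct and the
model_* functions are hypothetical stand-ins for the kernel helpers named
above, and an "interrupt" is simply a function call placed at the point where
it would fire.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. All names here are hypothetical, not kernel APIs. */
struct nic_model {
    bool rx_sched;    /* stands in for the RX_SCHED ("in poll") flag */
    bool rx_ints_on;  /* stands in for the Rx/RxNoBuf interrupt enable */
};

/* Models speedo_interrupt(): any interrupt tries to schedule a poll. */
static void model_interrupt(struct nic_model *n)
{
    if (!n->rx_sched) {          /* netif_rx_schedule_prep() succeeds */
        n->rx_sched = true;      /* the poller now owns the device */
        n->rx_ints_on = false;   /* disable_rx_and_rxnobuf_ints() */
    }
}

/* Unprotected end of poll: an interrupt may land between the two steps. */
static void model_poll_done(struct nic_model *n, bool irq_in_window)
{
    n->rx_sched = false;         /* netif_rx_complete() */
    if (irq_in_window)
        model_interrupt(n);      /* e.g. TxDone arrives right here */
    n->rx_ints_on = true;        /* enable_rx_and_rxnobuf_ints() */
}

/* Protected end of poll: a pending interrupt is delivered only after
 * both steps have finished, as if the critical section held it off. */
static void model_poll_done_locked(struct nic_model *n, bool irq_pending)
{
    n->rx_sched = false;
    n->rx_ints_on = true;
    if (irq_pending)
        model_interrupt(n);
}

/* The bad state: scheduled for polling while Rx ints are enabled. */
static bool model_is_broken(const struct nic_model *n)
{
    return n->rx_sched && n->rx_ints_on;
}

/* Replays both orderings starting from the "currently polling" state. */
static void race_demo(void)
{
    struct nic_model racy = { true, false };
    struct nic_model safe = { true, false };

    model_poll_done(&racy, true);
    assert(model_is_broken(&racy));

    model_poll_done_locked(&safe, true);
    assert(!model_is_broken(&safe));
}
```

Calling race_demo() runs both orderings: the unprotected one ends up
scheduled for polling with Rx interrupts enabled, the protected one does not.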
 
  Sorry for the delay; it is all the World Cup's fault :)

Robert Olsson wrote:

>Paul writes:
> > Man well I tried 2.4.17 kernel
> > eepro100.c driver patched with NAPI and as soon as 
> > I route traffic to it it destroys eth0 and eth1 which are the two
> > interfaces that take the traffic.. they just die.. nothing in logs
> > nothing in dmesg, no errors, just all of a sudden no traffic
> > can go in or out those interfaces... sigh.. 
> > I then took the driver from kernel 2.5.21 and put it in 2.4.17 and compiled
> > after patching with NAPI and had the same problem..
> > 
> > You have any idea what the deal is?
> > 
> > It just dies instantly..
>
> Honestly no, just uploaded the eeproo napi patch from Zhang Fuxin he might 
> have some ideas. 
>
> I'm struggling with napi variant of the D-LINK sundance driver for 4-port 
> board myself.
>
> Cheers.
>
>						--ro
>
>



[-- Attachment #2: eepro100-napi.tar.gz --]
[-- Type: application/gzip, Size: 7900 bytes --]


* Re: NAPI eepro100 bug fixed
  2002-06-18  3:25     ` NAPI eepro100 bug fixed Zhang Fuxin
@ 2002-06-18  4:43       ` kuznet
  2002-06-18 17:07       ` Robert Olsson
  1 sibling, 0 replies; 4+ messages in thread
From: kuznet @ 2002-06-18  4:43 UTC (permalink / raw)
  To: Zhang Fuxin; +Cc: linux-kernel

Hello!

>      /* disable interrupts here is necessary!
>          * We need to ensure Rx/RxNobuf ints are disabled if in poll
>          * flag is set. If interrupt comes bwteen netif_rx_complete
>          * and enable_rx_and_rxnobuf_ints, the following will happen:
>          *         netif_rx_complete --> clear RX_SCHED flag
>          *           -> ints(e.g. TxDone)
>          *                  speedo_interrupt
>          *                       if (netif_rx_schedule_prep(dev))
>          *                          disable_rx_and_rxnobuf_ints
>          *                  return
>          *           <-
>          *         enable_rx_and_rxnobuf_ints
>          *  then we will have Rx&RxNoBuf ints enable while in polling!
>          *  it may lead to endless interrupts and effective lockup of
>          *  the whole machine.
>          */
>         spin_lock_irqsave(&sp->lock,flags);
>         netif_rx_complete(dev);
>         enable_rx_and_rxnobuf_ints(dev);
>         spin_unlock_irqrestore(&sp->lock,flags);

You mixed two different driver models; that is the reason for the lockup.

You must ACK the irq in the interrupt handler in some way.
Tulip really does the trick of deferring the ack to the poll routine,
but it pays for this by _masking_ the irq on each interrupt instead, which
also drops the irq line. See?
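
A rough way to picture the two models is a level-triggered line that stays
asserted while any pending status bit is also unmasked. The sketch below
assumes exactly that; the register layout, the RX_BIT value and all function
names are made up for illustration and do not describe the real eepro100 or
tulip hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_BIT 0x1u   /* hypothetical "Rx pending" bit */

/* Toy model of a level-triggered interrupt source (an assumption). */
struct irq_line_model {
    uint32_t status;  /* pending event bits */
    uint32_t mask;    /* bits allowed to raise the line */
};

typedef void (*irq_handler_fn)(struct irq_line_model *, uint32_t);

/* The line is asserted while some pending bit is unmasked. */
static bool irq_line_asserted(const struct irq_line_model *m)
{
    return (m->status & m->mask) != 0;
}

/* Model 1: ack in the handler, clearing the pending bits. */
static void handler_ack(struct irq_line_model *m, uint32_t bits)
{
    m->status &= ~bits;
}

/* Model 2: leave the bits pending for the poll routine, but mask them. */
static void handler_mask(struct irq_line_model *m, uint32_t bits)
{
    m->mask &= ~bits;
}

/* Broken mix: neither ack nor mask, so the line never drops. */
static void handler_none(struct irq_line_model *m, uint32_t bits)
{
    (void)m;
    (void)bits;
}

/* Counts handler entries until the line drops or `limit` is reached. */
static int count_handler_entries(struct irq_line_model *m,
                                 irq_handler_fn handler,
                                 uint32_t bits, int limit)
{
    int entries = 0;

    while (irq_line_asserted(m) && entries < limit) {
        handler(m, bits);
        entries++;
    }
    return entries;
}

/* Either acking or masking drops the line after a single entry. */
static void line_demo(void)
{
    struct irq_line_model a = { RX_BIT, RX_BIT };
    struct irq_line_model b = { RX_BIT, RX_BIT };
    struct irq_line_model c = { RX_BIT, RX_BIT };

    assert(count_handler_entries(&a, handler_ack, RX_BIT, 1000) == 1);
    assert(count_handler_entries(&b, handler_mask, RX_BIT, 1000) == 1);
    assert(count_handler_entries(&c, handler_none, RX_BIT, 1000) == 1000);
}
```

In this model the handler that neither acks nor masks is re-entered until
the artificial limit stops it, which is the "endless interrupts" symptom.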

Alexey


* NAPI eepro100 bug fixed
  2002-06-18  3:25     ` NAPI eepro100 bug fixed Zhang Fuxin
  2002-06-18  4:43       ` kuznet
@ 2002-06-18 17:07       ` Robert Olsson
  1 sibling, 0 replies; 4+ messages in thread
From: Robert Olsson @ 2002-06-18 17:07 UTC (permalink / raw)
  To: Zhang Fuxin; +Cc: Robert Olsson, Paul, linux-kernel


Zhang Fuxin writes:

 >       My first NAPI eepro100 contains a subtle but fatal race,which will 
 > lead
 > to lockup(of the whole machine here,but of ether interface for paul). This
 > version should be ok, Paul, would you like to have a try? I've tested it 
 > in my


 > will meet it,so it is listed here.
 >      /* disable interrupts here is necessary!
 >          * We need to ensure Rx/RxNobuf ints are disabled if in poll
 >          * flag is set. If interrupt comes bwteen netif_rx_complete
 >          * and enable_rx_and_rxnobuf_ints, the following will happen:
 >          *         netif_rx_complete --> clear RX_SCHED flag
 >          *           -> ints(e.g. TxDone)
 >          *                  speedo_interrupt
 >          *                       if (netif_rx_schedule_prep(dev))
 >          *                          disable_rx_and_rxnobuf_ints
 >          *                  return
 >          *           <-
 >          *         enable_rx_and_rxnobuf_ints
 >          *  then we will have Rx&RxNoBuf ints enable while in polling!
 >          *  it may lead to endless interrupts and effective lockup of
 >          *  the whole machine.
 >          */
 >         spin_lock_irqsave(&sp->lock,flags);
 >         netif_rx_complete(dev);
 >         enable_rx_and_rxnobuf_ints(dev);
 >         spin_unlock_irqrestore(&sp->lock,flags);

 Thanks!

 Yes, as far as I can see this is correct... This race and others are
 mentioned in NAPI_HOWTO.txt, and yes, the spinlock can help for drivers that
 use this type of interrupt acking. Tulip is a candidate for this as well.
 Let's see if it solves Paul's problem to start with.

 Cheers.
						--ro



* Re: NAPI eepro100 bug fixed
       [not found] <3D0EF872.7020007@ict.ac.cn>
@ 2002-06-18 17:54 ` kuznet
  0 siblings, 0 replies; 4+ messages in thread
From: kuznet @ 2002-06-18 17:54 UTC (permalink / raw)
  To: Zhang Fuxin; +Cc: linux-kernel

Hello!

>     By disabling irq in speedo_poll,we can be sure this won't happen. 
...
>  could you find a flaw?

It will still happen just fine when the irq arrives on another CPU.
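
The point can be sketched with a deterministic two-CPU model: disabling
interrupts is a per-CPU action, so it holds off the handler only on the CPU
that did it, while a shared lock helps only if the handler takes it too.
Everything below is hypothetical illustration; whether the real handler takes
sp->lock on this path is not stated in the thread and is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 2

/* Toy SMP model. Names are hypothetical, not kernel APIs. */
struct smp_model {
    bool irqs_off[NCPUS]; /* per-CPU: local interrupts disabled */
    int  lock_owner;      /* CPU holding the shared lock, -1 if free */
};

/* The poll routine on `cpu` enters its critical section. */
static void poll_enter_critical(struct smp_model *s, int cpu, bool take_lock)
{
    s->irqs_off[cpu] = true;      /* local irq disable affects this CPU only */
    if (take_lock)
        s->lock_owner = cpu;      /* models the spin_lock part of irqsave */
}

/* Can the interrupt handler make progress on `cpu` right now? */
static bool handler_can_run(const struct smp_model *s, int cpu,
                            bool handler_takes_lock)
{
    if (s->irqs_off[cpu])
        return false;             /* interrupt is held off on this CPU */
    if (handler_takes_lock && s->lock_owner != -1)
        return false;             /* handler would spin on the lock */
    return true;
}

/* CPU 0 polls; the interrupt is delivered to CPU 1. */
static void smp_demo(void)
{
    struct smp_model irq_only = { { false, false }, -1 };
    struct smp_model locked   = { { false, false }, -1 };

    poll_enter_critical(&irq_only, 0, false);
    assert(!handler_can_run(&irq_only, 0, false)); /* same CPU: blocked */
    assert(handler_can_run(&irq_only, 1, false));  /* other CPU: races */

    poll_enter_critical(&locked, 0, true);
    assert(!handler_can_run(&locked, 1, true));    /* excluded by the lock */
    assert(handler_can_run(&locked, 1, false));    /* lock ignored: races */
}
```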

Alexey
