public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] IBM Lanstreamer bugfixes
@ 2002-01-30 19:27 Kent E Yoder
  2002-01-30 19:48 ` Jeff Garzik
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Kent E Yoder @ 2002-01-30 19:27 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeff Garzik, linux-kernel

  I think the delays in the driver *were* just working around PCI posting 
effects.  I tested by removing all the delays and instead putting 
something like:
        writew(val, addr);
        (void) read(addr);

instead, to flush the PCI cache.  Things seem to be happy. 

Is this the best way to make sure the PCI cache is flushed for writes that 
need to happen immediately?  I don't see many other drivers doing it...

Kent



> >   BTW, I don't know what PCI posting effects are...
> 
> Ok given
> 
>                  writel(foo, dev->reg);
>                  udelay(5);
>                  writel(bar, dev->reg);
> 
> The pci bridge is at liberty to delay the first write until the second or 
a
> read from that device comes along (and wants to do so to merge bursts). 
It
> tends to bite people
> 
>                  -               When they do a write to clear the IRQ 
status and don't do
>                                  a read so they keep handling lots of 
phantom level triggered
>                                  interrupts.
> 
>                  -               When there is a delay (reset is common) 
that has to be observed
> 
>                  -               At the end of a DMA transfer when people 
unmap stuff early
>                                  and the "stop the DMA" command got 
delayed
> 
> Alan




^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH] IBM Lanstreamer bugfixes
@ 2002-02-01 20:53 Kent E Yoder
  2002-02-01 23:38 ` Alan Cox
  2002-02-01 23:40 ` Alan Cox
  0 siblings, 2 replies; 23+ messages in thread
From: Kent E Yoder @ 2002-02-01 20:53 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeff Garzik, linux-kernel

> > effects.  I tested by removing all the delays and instead putting 
> > something like:
> >         writew(val, addr);
> >         (void) readw(addr);

  Ok, now I'm curious about something...

  If the readw() above is needed here (it definitely fixes *something*), 
what purpose does the volatile below serve? 

io.h:122:#define writew(b,addr) (*(volatile unsigned short *) 
__io_virt(addr) = (b))X

Is this a sort of "go do this now" command to flush it from the CPU to the 
PCI bus, while the readw() makes sure its flushed out of the PCI cache? 

> > 
> > instead, to flush the PCI cache.  Things seem to be happy. 
> > 
> > Is this the best way to make sure the PCI cache is flushed for writes 
that 
> > need to happen immediately?  I don't see many other drivers doing 
it...

 Another question is, if the PCI bus is caching like this, how does it 
handle adapters which write to one address and read from another for the 
same variable?  I'm guessing it flushes all writes on a read?  This is 
exactly what lanstreamer does, and I'm thinking this may have caused 
problems before.

Thanks,
Kent


^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH] IBM Lanstreamer bugfixes
@ 2002-01-21 16:44 Kent E Yoder
  2002-01-21 17:46 ` Mike Phillips
  0 siblings, 1 reply; 23+ messages in thread
From: Kent E Yoder @ 2002-01-21 16:44 UTC (permalink / raw)
  To: Mike Phillips; +Cc: Alan Cox, Jeff Garzik, linux-kernel, Mike Phillips

Mike,
        Did you tweak the card's PCI config area to fix this problem, or 
elsewhere?

Kent



> Kent,
> 
> We had this on olympic for certain high end IBM boxen. Spent forever
> trying to trap it as I couldn't emulate the behaviour on my test
> boxes. We weren't getting correct values from pci reads/write and we
> were running out of buffers as they weren't getting flushed. The
> machine wouldn't lock but the adapter would stop tx/rx.
> 
> Turned out the pci bridge on the machine itself was causing the
> problems. Tweaking the pci bus fixed the problem. 






^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH] IBM Lanstreamer bugfixes
@ 2002-01-18 23:02 Kent E Yoder
  2002-01-18 23:19 ` Alan Cox
  2002-01-19 17:14 ` Mike Phillips
  0 siblings, 2 replies; 23+ messages in thread
From: Kent E Yoder @ 2002-01-18 23:02 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeff Garzik, linux-kernel

>
>>   For #6, the udelay(1) had more to do with the following write() than 
>> with spin_lock().  When that delay was not there, the write failed 
>> randomly. The same with the udelay(10) at the end of the interrupt 
>> function...
>
>That smells of covering up a race rather than fixing something. Another
>thing you may be doing perhaps is hiding PCI posting effects ?

  Ok, I thought of one thing that might make things clearer here: when I 
say "the write failed", I mean that we saw the write go out on the PCI bus 
and then the box locked up.  We were looking at it on a PCI bus analyzer. 
That, and it wasn't just this write, or just writes in general, it really 
seemed random.

  BTW, I don't know what PCI posting effects are...

Kent



^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH] IBM Lanstreamer bugfixes
@ 2002-01-18 22:39 Kent E Yoder
  0 siblings, 0 replies; 23+ messages in thread
From: Kent E Yoder @ 2002-01-18 22:39 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeff Garzik, linux-kernel

> >   actually I should be using spin_lock_irqsave() in open() and close()
> > since the lock is taken inside the interrupt function, no?
> 
> Correct - which might explain some of your other delays curing lockups

  Hmmm..  i wish it did.  Unfortunately the only non irq saved calls to 
spin_lock are in the open and close functions, my lock call in xmit is irq 
saved.  These should be changed obviously, but I haven't seen the box lock 
up during open or close...

Kent


^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH] IBM Lanstreamer bugfixes
@ 2002-01-18 22:32 Kent E Yoder
  2002-01-18 22:47 ` Alan Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Kent E Yoder @ 2002-01-18 22:32 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

  Sorry, this mail was sent acidentally...  but since its out here...

> 5 & 11) Nope, I had not read Doc/networking/netdevices.txt, so I have a
question.  What does being inside rtnl_lock() imply other than the sleep
> issues?
>
>   The calls to cli() and save_flags() were wrong from the beginning.
They were imported by the last maintainer since this driver is a modified
version  > of the olympic token ring driver.  The current spin_lock() and
spin_unlock() calls protect the srb_queued variable.  If we were to set it
to one and then  > get interrupted before we actually write() to the srb,
the interrupt function will try to service whatever junk happens to be in
the srb.  If going into the
> interrupt function is covered by rtnl_lock(), this would be covered, but
I guess its not (?)....

  actually I should be using spin_lock_irqsave() in open() and close()
since the lock is taken inside the interrupt function, no?

Kent


^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH] IBM Lanstreamer bugfixes
@ 2002-01-18 20:52 Kent E Yoder
  2002-01-18 21:35 ` Alan Cox
  0 siblings, 1 reply; 23+ messages in thread
From: Kent E Yoder @ 2002-01-18 20:52 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, marcelo

2) Fixed.

3) Yeah, this was bad...  Fixed.

4, 6, 7, 9 & 10) These fall into the category of hardware workaround.  The 
lanstreamer hardware is essentially broken (with respect to its hardware 
specification) and these were a few of the things which made it work. 

  For #6, the udelay(1) had more to do with the following write() than 
with spin_lock().  When that delay was not there, the write failed 
randomly. The same with the udelay(10) at the end of the interrupt 
function...

  For #7, 9 & 10, zeroing out interrupts in the interrupt function and in 
the xmit function and the delay have no software function at all, but it 
made the hardware happy.  Without these, the card would lock up the box 
under heavy load.

5 & 11) Nope, I had not read Doc/networking/netdevices.txt, so I have a 
question.  What does being inside rtnl_lock() imply other than the sleep 
issues? 

  The calls to cli() and save_flags() were wrong from the beginning.  They 
were imported by the last maintainer since this driver is a modified 
version of the olympic token ring driver.  The current spin_lock() and 
spin_unlock() calls protect the srb_queued variable.  If we were to set it 
to one and then get interrupted before we actually write() to the srb, the 
interrupt function will try to service whatever junk happens to be in the 
srb.  If going into the interrupt function is covered by rtnl_lock(), this 
would be covered, but I guess its not (?)....

8) Yeah, we didn't see any problems with this in test (probably due to 
lanstreamer's fairly low bandwidth), but you are right.  Fixed.

12) Fixed.

13)  Hmm..  Guess I'll need to learn about that... eventually :^).  Fixed, 
for now...



Kent <yoder1@us.ibm.com>




Kent E Yoder wrote:
> This patch fixes several bugs and works around known hardware problems
> which conspired to lock up the system randomly.  Its somewhat large,
> therefore available at:

http://oss.software.ibm.com/developer/opensource/linux/patches/tr/lanstreamer-0.5.1.patch.gz
> 
> * Interrupt function rearranged
> * PCI Configuration changed
> * Tx descriptors had to be reduced to 1 (!)
> * Send/Receive operations are nearly serialized



Marcelo, please do not apply this patch...



Sorry for the delay, review:

1) (in code, not in your patch) prefer kernel-standard types like u32 or
u16 to __u32 and __u16

2) poor formatting:

> +#if STREAMER_IOCTL
> +        dev->do_ioctl = &streamer_ioctl;
> +#else
>                         dev->do_ioctl = NULL;
> +#endif

3) I don't like this playing around with magic numbers.  pci_set_master
and pci_enable_device and pci_disable_device twiddle PCI_COMMAND bits,
too.  Use constants from pci.h make it clear what bits you are clearing,
and what bits you are setting.

> +       pci_write_config_byte (pdev, PCI_CACHE_LINE_SIZE, cls);
> +       pci_read_config_word (pdev, PCI_COMMAND, &pcr);
> +
> +       /* Turn off Fast B2B enable */
> +       pcr &= 0xfdff;
> +       /* Turn on SERR# enable and others */
> +       pcr |= 0x0157;
> +
> +       pci_write_config_word (pdev, PCI_COMMAND, pcr);
> +       pci_read_config_word (pdev, PCI_COMMAND, &pcr);

4) Your code appears to -always- set cache line size to zero.  Is that a
hardware bug?  Look at acenic.c to see a better example of setting PCI
cache line size.

5) what is the purpose of the spin_lock in streamer_open, if open is
serialized?  it worries me that the previous streamer_open code disabled
interrupts and the new one does not, but replaces with a non-irq-saving
lock that appears superfluous.

6) udelay(1) after brand new spin_lock in streamer_interrupt is
suspicious

7) disabling interrupts by zeroing NIC intr mask, in interrupt handler,
is general not needed.  why was this added?  interrupt handlers are not
re-entered so this is not a worry.

8) the while loop in the interrupt looks like it could go on for quite a
while under heavy load, starving out a lot of other kernel code.  it
needs a work limit at the very least...

9) disabling interrupts at the beginning of each TX is wrong.  you
probably want spin_lock_irqsave at critical parts of the xmit.

10) udelay(100) is likely wrong and a sign of a race (perhaps #9, above,
fixes this)

11) replacing save_flags/cli with normal spin_lock in streamer_close is
suspicious and likely wrong.  See issue #5 about streamer_open
serialization.  Have you read Documentation/networking/netdevices.txt ?

12) formatting of streamer_ioctl is grossly different from the rest of
the code

13) SIOCDEVPRIVATE ioctls are going away in 2.5.  (you can implement
include/linux/ethtool.h SIOCETHTOOL interface for lot of the
functionality, though)

-- 
Jeff Garzik      | Alternate titles for LOTR:
Building 1024    | Fast Times at Uruk-Hai
MandrakeSoft     | The Took, the Elf, His Daughter and Her Lover
                 | Samwise Gamgee: International Hobbit of Mystery




^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH] IBM Lanstreamer bugfixes
@ 2002-01-17 15:29 Kent E Yoder
  0 siblings, 0 replies; 23+ messages in thread
From: Kent E Yoder @ 2002-01-17 15:29 UTC (permalink / raw)
  To: jgarzik; +Cc: linux-kernel, marcelo

Thanks for your points, I will take a look at these when I can...

Kent



Kent E Yoder wrote:
> This patch fixes several bugs and works around known hardware problems
> which conspired to lock up the system randomly.  Its somewhat large,
> therefore available at:

http://oss.software.ibm.com/developer/opensource/linux/patches/tr/lanstreamer-0.5.1.patch.gz
> 
> * Interrupt function rearranged
> * PCI Configuration changed
> * Tx descriptors had to be reduced to 1 (!)
> * Send/Receive operations are nearly serialized



Marcelo, please do not apply this patch...



Sorry for the delay, review:

1) (in code, not in your patch) prefer kernel-standard types like u32 or
u16 to __u32 and __u16

2) poor formatting:

> +#if STREAMER_IOCTL
> +        dev->do_ioctl = &streamer_ioctl;
> +#else
>                         dev->do_ioctl = NULL;
> +#endif

3) I don't like this playing around with magic numbers.  pci_set_master
and pci_enable_device and pci_disable_device twiddle PCI_COMMAND bits,
too.  Use constants from pci.h make it clear what bits you are clearing,
and what bits you are setting.

> +       pci_write_config_byte (pdev, PCI_CACHE_LINE_SIZE, cls);
> +       pci_read_config_word (pdev, PCI_COMMAND, &pcr);
> +
> +       /* Turn off Fast B2B enable */
> +       pcr &= 0xfdff;
> +       /* Turn on SERR# enable and others */
> +       pcr |= 0x0157;
> +
> +       pci_write_config_word (pdev, PCI_COMMAND, pcr);
> +       pci_read_config_word (pdev, PCI_COMMAND, &pcr);

4) Your code appears to -always- set cache line size to zero.  Is that a
hardware bug?  Look at acenic.c to see a better example of setting PCI
cache line size.

5) what is the purpose of the spin_lock in streamer_open, if open is
serialized?  it worries me that the previous streamer_open code disabled
interrupts and the new one does not, but replaces with a non-irq-saving
lock that appears superfluous.

6) udelay(1) after brand new spin_lock in streamer_interrupt is
suspicious

7) disabling interrupts by zeroing NIC intr mask, in interrupt handler,
is general not needed.  why was this added?  interrupt handlers are not
re-entered so this is not a worry.

8) the while loop in the interrupt looks like it could go on for quite a
while under heavy load, starving out a lot of other kernel code.  it
needs a work limit at the very least...

9) disabling interrupts at the beginning of each TX is wrong.  you
probably want spin_lock_irqsave at critical parts of the xmit.

10) udelay(100) is likely wrong and a sign of a race (perhaps #9, above,
fixes this)

11) replacing save_flags/cli with normal spin_lock in streamer_close is
suspicious and likely wrong.  See issue #5 about streamer_open
serialization.  Have you read Documentation/networking/netdevices.txt ?

12) formatting of streamer_ioctl is grossly different from the rest of
the code

13) SIOCDEVPRIVATE ioctls are going away in 2.5.  (you can implement
include/linux/ethtool.h SIOCETHTOOL interface for lot of the
functionality, though)

-- 
Jeff Garzik      | Alternate titles for LOTR:
Building 1024    | Fast Times at Uruk-Hai
MandrakeSoft     | The Took, the Elf, His Daughter and Her Lover
                 | Samwise Gamgee: International Hobbit of Mystery




^ permalink raw reply	[flat|nested] 23+ messages in thread
* [PATCH] IBM Lanstreamer bugfixes
@ 2002-01-16 19:42 Kent E Yoder
  2002-01-17 10:57 ` Jeff Garzik
  0 siblings, 1 reply; 23+ messages in thread
From: Kent E Yoder @ 2002-01-16 19:42 UTC (permalink / raw)
  To: marcelo; +Cc: linux-kernel, jgarzik

This patch fixes several bugs and works around known hardware problems 
which conspired to lock up the system randomly.  Its somewhat large, 
therefore available at:

http://oss.software.ibm.com/developer/opensource/linux/patches/tr/lanstreamer-0.5.1.patch.gz

* Interrupt function rearranged
* PCI Configuration changed
* Tx descriptors had to be reduced to 1 (!)
* Send/Receive operations are nearly serialized

Please apply.

Thanks,
Kent <yoder1@us.ibm.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2002-02-01 23:28 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-01-30 19:27 [PATCH] IBM Lanstreamer bugfixes Kent E Yoder
2002-01-30 19:48 ` Jeff Garzik
2002-01-30 19:58 ` David Weinehall
2002-01-30 20:43   ` Alan Cox
2002-01-30 20:04 ` Mike Phillips
2002-01-30 20:47 ` Alan Cox
  -- strict thread matches above, loose matches on Subject: below --
2002-02-01 20:53 Kent E Yoder
2002-02-01 23:38 ` Alan Cox
2002-02-01 23:40 ` Alan Cox
2002-01-21 16:44 Kent E Yoder
2002-01-21 17:46 ` Mike Phillips
2002-01-18 23:02 Kent E Yoder
2002-01-18 23:19 ` Alan Cox
2002-01-18 23:52   ` Gérard Roudier
2002-01-19 17:14 ` Mike Phillips
2002-01-18 22:39 Kent E Yoder
2002-01-18 22:32 Kent E Yoder
2002-01-18 22:47 ` Alan Cox
2002-01-18 20:52 Kent E Yoder
2002-01-18 21:35 ` Alan Cox
2002-01-17 15:29 Kent E Yoder
2002-01-16 19:42 Kent E Yoder
2002-01-17 10:57 ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox