public inbox for linux-kernel@vger.kernel.org
* Re: Via-Rhine stalls - transmit errors
@ 2002-04-07  6:43 Ivan G.
  2002-04-10 16:51 ` Urban Widmark
  0 siblings, 1 reply; 15+ messages in thread
From: Ivan G. @ 2002-04-07  6:43 UTC (permalink / raw)
  To: Urban Widmark; +Cc: LKML

Regarding issue #6 (or whichever one that was):
Ownership bits, tx rings and other fun stuff:
Here's a bunch of logs I generated which clearly show a problem
with perhaps missed interrupts? mishandled ownership bits??
I do not know the cause but here's the evidence.

Info: 
Logs are generated using a modified kernel driver.
Major changes in operation include abort handling from linuxfet driver.
However, you'll notice the problem I'm talking about does not occur
after either Abort or Aborted interrupt. In fact, I think I have previously 
detected the same problem with the original driver.

More Info:
These are sections of a dmesg -c >> log of an scp transfer
between laptop and desktop. The desktop stubbornly refused to stall this time
(but it stalls other times!), however, the laptop stalled every once in a 
while so it generated the timeout messages I was looking for. The transfer 
has to be INITIATED from the laptop - didn't stall otherwise (but I'm not 
sure about any hardware tests - I'd prefer to look at logs)

So, here are the logs with commentary:

At the beginning, a normal? log
//--------------------------------------------------------
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 00000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 00000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 00000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 00000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 00000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 00000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 00000000, addr: 00000000, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 00000000, addr: 00000000, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 00000000, addr: 00000000, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 00000000, addr: 00000000, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 00000000, addr: 00000000, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 00000000, addr: 00000000, next_desc: 1352a100
//-----------------------------------------------------//
The frame number is evidence that my frame-1 fix is working.
This log seems normal, except:
1) Are the addresses supposed to be initialized? The rx addresses are ...
2) What exactly do addr and next_desc point to? How can I check those
addresses?

------------------------------------------------------
Anyway, here's the abnormal piece causing the problems:
Look at tx_status: one 0002 interrupt (tx done) clears 2 ownership
bits (slots 4 and 5), the next 0002 interrupt clears none, transmit stops
soon after, and frames keep being queued until the timeout. In another log,
I recorded many exit_status interrupts between the ownership lock
and the NETDEV timeout. After the timeout, addr fields are marked bad.
Here's the log:

Descriptor messages PRECEDE the interrupt message.
(The interrupt has occurred, but you get the message after the ownership logs.)

Notice the Cur Tx and Dirty Tx slots reported after the timeout.

0002 is the transmit interrupt, 0001 is the receive one
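
For reading the status words in these logs, the value can be decoded bit by bit. A minimal user-space sketch follows, using only the four bit values this thread itself identifies (0x0001 Rx done, 0x0002 Tx done, 0x0008 Tx abort, 0x0010 Tx underrun). The enum names mirror the driver's, but `decode_intr_status` is a hypothetical helper and not driver code.

```c
#include <stddef.h>
#include <string.h>

/* Bit values as identified in this thread; all other bits stay undecoded. */
enum {
    IntrRxDone     = 0x0001,
    IntrTxDone     = 0x0002,
    IntrTxAbort    = 0x0008,
    IntrTxUnderrun = 0x0010,
};

/* Hypothetical helper: write a readable list of the known bits into buf. */
static const char *decode_intr_status(unsigned status, char *buf, size_t len)
{
    static const struct { unsigned bit; const char *name; } known[] = {
        { IntrRxDone, "RxDone" },   { IntrTxDone, "TxDone" },
        { IntrTxAbort, "TxAbort" }, { IntrTxUnderrun, "TxUnderrun" },
    };
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (status & known[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, known[i].name, len - strlen(buf) - 1);
            status &= ~known[i].bit;
        }
    }
    if (status) /* bits this sketch does not know about */
        strncat(buf, buf[0] ? "|other" : "other", len - strlen(buf) - 1);
    return buf;
}
```

With a 64-byte buffer, status 0002 decodes to `TxDone` and status 001a to `TxDone|TxAbort|TxUnderrun`.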

Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 80000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 80000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 80000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 00000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 00000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 00000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 00000000, addr: 13353c00, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 00000000, addr: 13354200, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 00000000, addr: 13354800, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 00000000, addr: 13354e00, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 00000000, addr: 13355400, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 00000000, addr: 13355a00, next_desc: 1352a100
eth0: Interrupt, status 0002.
eth0: exiting interrupt, status=0x0000.
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 00000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 00000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 80000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 00000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 00000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 00000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 00000000, addr: 13353c00, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 00000000, addr: 13354200, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 00000000, addr: 13354800, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 00000000, addr: 13354e00, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 00000000, addr: 13355400, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 00000000, addr: 13355a00, next_desc: 1352a100
eth0: Interrupt, status 0002.
eth0: exiting interrupt, status=0x0000.
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 00000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 00000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 80000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 00000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 00000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 00000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 00000000, addr: 13353c00, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 00000000, addr: 13354200, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 00000000, addr: 13354800, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 00000000, addr: 13354e00, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 00000000, addr: 13355400, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 00000000, addr: 13355a00, next_desc: 1352a100
eth0: Interrupt, status 0001.
 In via_rhine_rx(), entry 14 status 00468f00.
  via_rhine_rx() status is 00468f00.
eth0: exiting interrupt, status=0x0000.
eth0: Transmit frame #6807 queued in slot 7.
eth0: Transmit frame #6808 queued in slot 8.
eth0: Transmit frame #6809 queued in slot 9.
eth0: Transmit frame #6810 queued in slot 10.
eth0: Transmit frame #6811 queued in slot 11.
eth0: Transmit frame #6812 queued in slot 12.
eth0: Transmit frame #6813 queued in slot 13.
eth0: Transmit frame #6814 queued in slot 14.
eth0: Transmit frame #6815 queued in slot 15.
NETDEV WATCHDOG: eth0: transmit timed out
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: 13350600, next_desc: 1352a120
Tx descriptor slot 2: tx_status: 00000000, addr: 13350c00, next_desc: 1352a130
Tx descriptor slot 3: tx_status: 00000000, addr: 13351200, next_desc: 1352a140
Tx descriptor slot 4: tx_status: 00000000, addr: 13351800, next_desc: 1352a150
Tx descriptor slot 5: tx_status: 00000000, addr: 13351e00, next_desc: 1352a160
Tx descriptor slot 6: tx_status: 80000000, addr: 13352400, next_desc: 1352a170
Tx descriptor slot 7: tx_status: 80000000, addr: 13352a00, next_desc: 1352a180
Tx descriptor slot 8: tx_status: 80000000, addr: 13353000, next_desc: 1352a190
Tx descriptor slot 9: tx_status: 80000000, addr: 13353600, next_desc: 1352a1a0
Tx descriptor slot 10: tx_status: 80000000, addr: 13353c00, next_desc: 1352a1b0
Tx descriptor slot 11: tx_status: 80000000, addr: 13354200, next_desc: 1352a1c0
Tx descriptor slot 12: tx_status: 80000000, addr: 13354800, next_desc: 1352a1d0
Tx descriptor slot 13: tx_status: 80000000, addr: 13354e00, next_desc: 1352a1e0
Tx descriptor slot 14: tx_status: 80000000, addr: 13355400, next_desc: 1352a1f0
Tx descriptor slot 15: tx_status: 80000000, addr: 13355a00, next_desc: 1352a100
Cur Tx points to slot:   0
Dirty Tx points to slot: 6
eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
wait for reset, chip_id: 2
eth0: reset finished after 5 microseconds.
eth0: Transmit frame #0 queued in slot 0.
Tx descriptor slot 0: tx_status: 00000000, addr: 13350000, next_desc: 1352a110
Tx descriptor slot 1: tx_status: 00000000, addr: badf00d0, next_desc: 1352a120


...and so on...
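
The timeout dump above can be checked mechanically: walk the ring from the Dirty Tx slot to the Cur Tx slot and count descriptors whose ownership bit is still set. This is a minimal sketch, assuming the top bit of tx_status (0x80000000, as printed in the dumps) is the ownership bit and that the ring has 16 slots as the logs show. `count_chip_owned` is a hypothetical helper, not part of the driver.

```c
#include <stdint.h>

#define TX_RING_SIZE 16
#define DescOwn 0x80000000u /* assumed: top bit = descriptor still owned by the chip */

/*
 * Hypothetical helper: walk the ring from dirty_slot up to (not including)
 * cur_slot and count descriptors the chip has not handed back yet.
 * dirty_slot == cur_slot is treated as a completely full ring here; telling
 * full from empty would need the free-running cur_tx/dirty_tx counters.
 */
static int count_chip_owned(const uint32_t tx_status[TX_RING_SIZE],
                            int dirty_slot, int cur_slot)
{
    int owned = 0;
    int slot = dirty_slot;

    do {
        if (tx_status[slot] & DescOwn)
            owned++;
        slot = (slot + 1) % TX_RING_SIZE;
    } while (slot != cur_slot);
    return owned;
}
```

Fed the values from the timeout dump (slots 6 to 15 set to 80000000, Dirty Tx at slot 6, Cur Tx at slot 0), this counts 10 descriptors: everything queued is still waiting on the chip, beginning with slot 6, whose ownership bit was never cleared.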

Code used to generate the logs (see the CHANGE tags for additions).
This is in the interrupt function; I have more in the timeout function.

      /*CHANGE*/
        int i;
        struct netdev_private *np = dev->priv;


        ioaddr = dev->base_addr;

        while ((intr_status = readw(ioaddr + IntrStatus))) {
                /* Acknowledge all of the current interrupt sources ASAP. */
                writew(intr_status & 0xffff, ioaddr + IntrStatus);

                /*CHANGE*/
                /* Dump every Tx descriptor before the interrupt is handled. */
                for (i = 0; i < TX_RING_SIZE; i++) {
                        printk(KERN_INFO "Tx descriptor slot %i: tx_status: %8.8x, "
                               "addr: %8.8x, next_desc: %8.8x\n", i,
                               le32_to_cpu(np->tx_ring[i].tx_status),
                               le32_to_cpu(np->tx_ring[i].addr),
                               le32_to_cpu(np->tx_ring[i].next_desc));
                }
                /* ... rest of the interrupt handler unchanged ... */



* Re: Via-Rhine stalls - transmit errors
@ 2002-04-05  5:47 Ivan G.
  0 siblings, 0 replies; 15+ messages in thread
From: Ivan G. @ 2002-04-05  5:47 UTC (permalink / raw)
  To: Urban Widmark; +Cc: LKML

> (A week is probably a personal record ... just hasn't been
> time to think about funny rhine problems)

Well, thanks for even bothering with my problems.
Appreciate it.

Inconsistent hardware tests are driving me insane.
I am beginning to think that I am wasting both my and your
time by doing such testing. My card is stalling again and
I have no idea why. I think I should start concentrating more
on software code and what it does, rather than judging by
the way hardware "seems" to work. For some reason
I never know what to expect when I reboot
that computer and try again.

Because of those problems, I think that I should list all
the things that I have added or consider adding to my
version of the driver and, with your help (if you have time
for this stuff), decide which ones to keep and integrate into
a patch, and which ones to abandon. This way, at least I would
know that my conclusions/confusions are based on the cleanest
code possible at the moment. Also, at least some improvement
will come out of this, while fixing my card transmit is an
unsure thing.

So:
Here we go...
Enumerated list of all issues I'm concerned about...
Please approve or disapprove changes if you have time :)

1) Type of chip

Attempting to fix any device clearly requires knowing what
kind it is. I obtained my ethernet card from a friend and was
unsure about the model - I know that it has a davicom 9101f
chip on it. So far, I thought my card was a VT86C100A, because
that's what /proc/pci says. However, I see now the driver
and via-diag identify the card as VT3043 Rhine. What could
cause the inconsistency and how can I check for sure?


Related code:
In the meantime, trying to check that
I ran into the following issue:
init one: chip_id: 2
wait for reset, chip_id: 0
wait for reset, chip_id: 2
(these are my own debug messages)

Cause of the problem: The first time wait_for_reset is called
(in init_one), np->chip_id is not initialized yet.
Effect of the problem: The delay code for the 3043
and VT86C100A will also affect the Rhine-II
the first time wait_for_reset is run, since chip_id
is always 0 at that point. Fix: Unsure of the best way to fix.
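
One possible direction, sketched in user-space C under stated assumptions: hand the chip id to the reset wait explicitly, or store it in the private struct before the first reset. The enum ordering is assumed from the debug output above (the 3043 prints as 2, the uninitialized value prints as 0), and every name ending in `_sketch` is hypothetical.

```c
/* Assumed ordering, consistent with the debug output (VT3043 prints as 2). */
enum rhine_chip { VT86C100A = 0, VT6102, VT3043 };

struct netdev_private_sketch {
    int chip_id;
};

/* Does this chip get the extra delay after reset? */
static int needs_reset_delay(int chip_id)
{
    return chip_id == VT86C100A || chip_id == VT3043;
}

/* Option A: take chip_id as an argument so no stale field is consulted. */
static int wait_for_reset_sketch(int chip_id)
{
    return needs_reset_delay(chip_id); /* 1 = delay applied, 0 = skipped */
}

/* Option B: record chip_id in the private struct before the first reset. */
static int init_one_sketch(struct netdev_private_sketch *np, int chip_id)
{
    np->chip_id = chip_id;
    return wait_for_reset_sketch(np->chip_id);
}
```

A zero-filled private struct reads as VT86C100A, which is why the first call applies the delay even on a Rhine-II; either option removes that dependency on initialization order.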

Related code:
I am also concerned about this:
In via_rhine_error there is code
related to link change that includes HasDavicomPhy.
HasDavicomPhy is not included in the chip_info
structure of any of the three chips supported.
According to the other drivers I've seen
it should be. Fix: Should I add that flag
to the 3043 and the VT86C100A? if those
are the right cards.


2) Full duplex.
So far I've been ignoring the fact that my transmit,
whenever it works, however it works, is very very slow.
I use two cards capable of full duplex over a crossover
cable. The driver and diagnostic programs of both
cards report full duplex speed. However, my transfer speed,
even if not stalling does not exceed 1.5mb/s.
I assumed transmit errors were responsible,
however: while I was changing the driver
I ran into a situation where my card was actually
transmitting at the speed it is supposed to - 7Mb/s+
I couldn't reproduce the situation. Do you think
this is related to the transmit errors or it is a duplex
negotiation issue?

3) The missing interrupts

I added those. There's 3 or 4 of them.

4) Queue debug message

   printk(KERN_DEBUG "%s: Transmit frame #%d queued in slot %d.\n",
                           dev->name, np->cur_tx-1, entry);

   Changed np->cur_tx to np->cur_tx-1.
   I thought the frame was one off...
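
A small sketch of why the number is one off, assuming the slot is taken as cur_tx modulo the ring size and cur_tx is incremented before the debug message is printed. That assumption fits the logs, where frame #6807 lands in slot 7 of a 16-slot ring. The struct and function names are hypothetical.

```c
#define TX_RING_SIZE 16

struct tx_counters {
    unsigned int cur_tx; /* free-running count of frames queued so far */
};

/* Queue one frame: pick the slot from cur_tx, then advance the counter. */
static unsigned int queue_frame(struct tx_counters *np)
{
    unsigned int entry = np->cur_tx % TX_RING_SIZE;

    np->cur_tx++;
    return entry;
}

/* Frame number that matches the slot just used, as in the changed printk. */
static unsigned int last_queued_frame(const struct tx_counters *np)
{
    return np->cur_tx - 1;
}
```

Starting from cur_tx = 6807, `queue_frame` returns slot 7 and leaves cur_tx at 6808, so printing cur_tx itself would report the next frame rather than the one just queued.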

5) The abort handling in linuxfet vs. kernel driver
The underflow handling in linuxfet vs. kernel driver

You say that the descriptor status
is only used to log errors in the kernel.
Is this correct handling? Why does the linuxfet
driver set the ownership bit and send
CmdTxOn, CmdTxDemand in cases of abort and underflow?
How would the hardware react to such handling?
What's a good resource to check on those commands?
The datasheets are not very verbose.

I'll do tests on ownership bit
and descriptor status myself.
I would just like to clear those other
issues since they might interfere...

6) Rx Threshold

...defaults to 0x60 in kernel driver
and 0xE0 in another one I've seen -
either linuxfet or Becker. Which is correct?

7) Those "Wicked" messages

Ok. I understand why you could have a case
to handle any weird interrupt combinations
combining error interrupts and others
(such as Abort/Done).
However, how do you explain Becker's driver...

  if ((intr_status & ~(IntrLinkChange | IntrMIIChange | IntrStatsMax |
                  IntrTxAbort | IntrTxAborted | IntrNormalSummary)))

it excludes IntrNormalSummary from those messages.
TxAbort/TxDone would be excluded in this case.
Exactly what kind of error should this message trap?
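
To see what that mask lets through, it can be applied to the status words from the logs. In this sketch the bit values are assumptions: 0x0002, 0x0008 and 0x0010 match the statuses decoded in this thread, while the remaining values should be checked against the driver's enum before relying on them. `wicked_bits_becker` is a hypothetical name.

```c
/* Assumed bit values; verify against the driver source. */
enum {
    IntrNormalSummary = 0x0003, /* RxDone | TxDone */
    IntrTxAbort       = 0x0008,
    IntrStatsMax      = 0x0080,
    IntrMIIChange     = 0x0200,
    IntrTxAborted     = 0x2000,
    IntrLinkChange    = 0x4000,
};

/* Bits that survive the mask; non-zero means the message would be printed. */
static unsigned wicked_bits_becker(unsigned intr_status)
{
    return intr_status & ~(IntrLinkChange | IntrMIIChange | IntrStatsMax |
                           IntrTxAbort | IntrTxAborted | IntrNormalSummary);
}
```

Under these assumptions 0008 and 000a leave nothing behind, while 001a leaves 0x0010 (the underrun bit), so with this mask only an event outside the excluded set triggers the message.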

8) Some other information:
cat /proc/net/dev
on desktop (the via card) shows transmit errors.
cat /proc/net/dev
on laptop (the netgear opp. card) shows
large amounts of packets under FRAME.

-----

Ok, this is all for now.
I apologize for the long message
and any confusion I'm creating.
Just trying to help. I would really like to fix
this driver but I don't seem to be very good at this.
I will do additional testing with ownership bits
and try changing thresholds, etc.


* Re: Via-Rhine stalls - transmit errors
@ 2002-03-28  8:50 Ivan Gurdiev
  2002-04-04 22:10 ` Urban Widmark
  0 siblings, 1 reply; 15+ messages in thread
From: Ivan Gurdiev @ 2002-03-28  8:50 UTC (permalink / raw)
  To: Urban Widmark; +Cc: LKML

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 4677 bytes --]

> (sysrq to get stacktraces and run through ksymoops, or kernel debugger or
>  some deadlock detection thing, if no one gives you more specific
>  directions search for ikd)

I don't have much experience with traces.
I'd rather mess with the mainstream driver
and try to merge code (read on..)

> You need to follow the instructions on those pages, there are some
> additional header files for backwards compatibility. I don't know if it
> compiles under 2.4 but I think the idea is that it should.

I don't need backwards compatibility.
I have kernel 2.4.19-pre3.
And it doesn't compile.

> I believe the "something wicked" is for an error/uncommon event that isn't
> handled and so the known events are filtered out.

     if (intr_status & (IntrPCIErr | IntrLinkChange | IntrMIIChange |
                        IntrStatsMax | IntrTxAbort | IntrTxUnderrun))
             via_rhine_error(dev, intr_status);

Which ones aren't handled?
The only one I see is PCI Error....

> If you get a IntrTxAbort
> by itself then a message isn't printed, but if you get a IntrTxAbort and
> IntrTxDone you get some output (000a).

This is actually a great example of the
'redundancies' I was talking about.
You get both Abort and Done...
Abort handler sends command CmdTxDemand.
"Wicked" message handler excludes Abort
but not Done, so it will: (1) print message,
(2) send CmdTxDemand (once again)

> Possibly those flags are never set without also setting
> another flag, like IntrRxErr,

Why? Each interrupt should have its own bit
in the bitmask.

>the driver doesn't want to do anything
> special on a "IntrRxNoBuf" error anyway. 

IntrRxEarly is not utilised in this driver
IntrRxWakeUp is used to call rx function
IntrRxNoBuf is used to call rx function
IntrTxAborted is used to call tx function
and NOT used to call error function(??)
while it is used inside the error function
for exclusion from "Wicked" messages.

___________
Ok.. on to other major issues:

I tried merging some code from the linuxfet
driver, and I managed to solve the stall problem
and get some interesting speed results.

I know what I did, but I am not certain
exactly how it helps the problems. My lack
of knowledge regarding the operation
of the hardware is beginning to be frustrating.
Perhaps you could help locate the problem...

The summary:
Tx aborted and Tx Underrun
are handled differently in linuxfet.

The kernel driver
simply increases stats for aborts
and ignores underruns in via_rhine_tx,
and later sends CmdTxDemand for aborts,
and increases threshold for underrun
in via_rhine_error.

The linuxfet driver uses the following
code to handle both aborts and underruns
inside the interrupt handler
(tx sequence...separated as a function
in kernel driver)
/*----------------------------------------------*/
np->tx_ring[entry].tx_status = cpu_to_le32(DescOwn);

writel(virt_to_bus(&np->tx_ring[entry]), ioaddr + TxRingPtr);
/* Turn on Tx On*/
writew(CmdTxOn | np->chip_cmd, dev->base_addr + ChipCmd);
/* Stats counted in Tx-done handler, just restart Tx. */
writew(CmdTxDemand | np->chip_cmd, dev->base_addr + ChipCmd);
/*----------------------------------------------*/
I am particularly curious about the ownership bits
and the TxRingPtr save... no such thing
in kernel via_rhine_tx...I don't think.

Then the linuxfet driver doesn't do 
anything for abort in the error handler
and increases threshold for underrun.
______________________________________
I moved the code above to the kernel driver
and made it handle aborts and underruns
the same way. I disabled the CmdTxDemand
for aborts in the error handler since it's now
done in the tx function.

This way, I fixed stalls on the Desktop.
However, my laptop was still slow and stalling.
(desktop->laptop transmit, laptop initiates.)

I commented the underrun code in via_rhine_tx
and it fixed all stalls, but speed decreased.

I enabled underrun code in via_rhine_tx
and commented underrun code in via_rhine_error-
same results - decreased speed, no stalls.

I wish I could explain all of this,
but I have little knowledge of hardware operation.
Apparently the handling of aborts is the cause
of stalling. Previously I had logged ownership
bits, and stalls always occurred when
an ownership bit was set to 0 but transmit
stopped (result of abort, wrong handling?)
The underrun code has effect on speed
and on transmits initiated from my laptop..
but I am too confused to comment about it.
I am sure this is a horribly ignorant question,
but what exactly is an underrun :)

Thank you for all your help.



__________________________________________________
Do You Yahoo!?
Yahoo! Movies - coverage of the 74th Academy Awards®
http://movies.yahoo.com/

* Re: Via-Rhine stalls - transmit errors
@ 2002-03-26  1:52 Ivan Gurdiev
  2002-03-26 21:19 ` Urban Widmark
  0 siblings, 1 reply; 15+ messages in thread
From: Ivan Gurdiev @ 2002-03-26  1:52 UTC (permalink / raw)
  To: Urban Widmark; +Cc: linux-kernel



> That patch was an attempt to fix failed transmission
> and it contains all sorts of junk. It worked for  
> Andy, but not for one of the others with that problem
> (and that motherboard, a Dragon+ with on-board eth
> chip).
>
> You probably shouldn't use that patch at all ...

Ok, I'll follow your advice and start from scratch...
or the kernel driver that is :) Interestingly, however,
I was able to somehow get consistent stall-free
transfers for a while using your patch and some other
fixes of my own...but then I changed things and was
unable to reproduce the same situation... This is
happening again with the kernel driver. Sometimes my
card works great, sometimes it works ok, and sometimes
it doesn't. I am losing faith in all hardware...
now I don't trust my Netgear card on the other side
either. The thing is, this Via-rhine card works fine
under Win 98 (I have a dual-boot) or under light load
(ping or ping flood). It only breaks under heavy
transmit.


> VIA has drivers for VT86c100A, the VT6102 and the 
> VT6105, available here:
> http://www.viaarena.com/?PageID=71

tried that one...it seemed to fix transmit when
initiated from my desktop computer, but it freezes
everything when I initiate the transmit in the same
direction from the laptop. 

....just like the time I decided to divide by 0
in the kernel :)

> There is also the Donald Becker driver at 
> http://www.scyld.com/

This one won't compile. Lots of errors.
Entire include files are missing.



> There is an explanation of common "something wicked"
> errors on
> http://www.scyld.com/network/ethercard.html...

> So if you trust the explanations of the errors it 

I do, the errors are correct. However for some of
those errors you can't even get the "something wicked"
message the way that the code is written.
Other errors are handled elsewhere. The whole
thing is complicated and may cause redundancies 
and problems. Error handling needs improvement.

> There were ideas on changed meaning
> of an interrupt bit (0x0200) and the "fix" for that
> is probably causing
> this.

Isn't this your patch?
It adds a mii/underflow interrupt switch scheme.

By the way,

/* Enable interrupts by setting the interrupt mask. */
 writew(IntrRxDone | IntrRxErr | IntrRxEmpty | IntrRxOverflow |
        IntrRxDropped | IntrTxDone | IntrTxAbort | IntrTxUnderrun |
        IntrPCIErr | IntrStatsMax | IntrLinkChange | IntrMIIChange,
        ioaddr + IntrEnable);


Where's IntrRxEarly? IntrRxNoBuf? IntrRxWakeUp?
IntrTxAborted? .... I added those to my version.



* Re: Via-Rhine stalls - transmit errors
@ 2002-03-22  2:33 Ivan Gurdiev
  0 siblings, 0 replies; 15+ messages in thread
From: Ivan Gurdiev @ 2002-03-22  2:33 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Andy Carlson, linux-kernel


Okay...let's see.
I've been playing around with Urban's patch
and now I'll risk making a fool of myself 
sharing some of my 'ideas' - heh.

ISSUE 1: Those "Something Wicked" messages
Present in original and patch too.

> if ((intr_status & ~( IntrLinkChange | IntrStatsMax |
>  IntrTxAborted ))) {

Richard,
Isn't this bitwise AND with a complement...
Meaning negation?
The way I understood it:
If this is an interrupt that is NOT IntrLinkChange,
IntrStatsMax or IntrTxAborted, we don't know what's
going on but it can't be good so print error message
and reset the chip.....a block designed to trap 
all other problems at the end of the error function.

I still don't see the point behind this. 
The function via_rhine_error is called only once like this:
 if (intr_status & (IntrPCIErr | IntrLinkChange | IntrMIIChange |
                    IntrStatsMax | IntrTxAbort | IntrTxUnderrun))
     via_rhine_error(dev, intr_status);

so if none of those interrupts are present,
the error function won't even be called.
So why check for anything else?

Inside error function:
     if (intr_status & (mii | IntrLinkChange)) {
takes care of IntrLinkChange and IntrMIIChange

     if (intr_status & IntrStatsMax) {
takes care of IntrStatsMax
    
     if (intr_status & (underflow | IntrTxAbort)) {
takes care of IntrTxUnderflow and IntrTxAbort

     if (intr_status & IntrTxUnderrun) {
takes care of IntrTxUnderrun

only IntrPCIErr is missing....that could have
triggered this function call...
so why not just add:
        if (intr_status & IntrPCIErr) {
		-do error message
		-reset chip
and get rid of the "Wicked" checks...
They print misleading error messages...
Maybe I'm missing something.
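
The proposal can be sketched as one branch per error source, with a return value that reports whatever bits no branch claimed. This is a user-space illustration only: the bit values are assumptions to be checked against the driver's enum, the branch bodies are reduced to bookkeeping, and `unhandled_error_bits` is a hypothetical name.

```c
/* Assumed bit values; verify against the driver source. */
enum {
    IntrTxAbort    = 0x0008,
    IntrTxUnderrun = 0x0010,
    IntrPCIErr     = 0x0040,
    IntrStatsMax   = 0x0080,
    IntrMIIChange  = 0x0200,
    IntrLinkChange = 0x4000,
};

/*
 * Each error source gets its own branch. Returns the status bits that no
 * branch handled; *need_reset is set only for a PCI error.
 */
static unsigned unhandled_error_bits(unsigned intr_status, int *need_reset)
{
    unsigned handled = 0;

    *need_reset = 0;
    if (intr_status & (IntrMIIChange | IntrLinkChange))
        handled |= intr_status & (IntrMIIChange | IntrLinkChange);
    if (intr_status & IntrStatsMax)
        handled |= IntrStatsMax;
    if (intr_status & IntrTxAbort)
        handled |= IntrTxAbort;
    if (intr_status & IntrTxUnderrun)
        handled |= IntrTxUnderrun;
    if (intr_status & IntrPCIErr) { /* the proposed explicit branch */
        *need_reset = 1;
        handled |= IntrPCIErr;
    }
    return intr_status & ~handled;
}
```

For status 001a the leftover is 0x0002, which is only the Tx done bit that the normal transmit path already services, so no catch-all message is needed for it.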

   
ISSUE 2: Repetitive negotiation of full duplex
Created by the Urban patch.

Andy,
Um...
I did some printouts and ended up with duplex being
256....probably because of:  duplex = mii_reg & 0x100;
Also: np->mii_if.full_duplex = duplex; refused
to set full_duplex to 256 (?)

I am not sure but I believe, based on other code,
that duplex is supposed to be 0 or 1.

changed to 
duplex = (mii_reg & 0x100)? 1:0;
and it's working fine now - negotiates only once.
full_duplex actually changes...
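
One plausible explanation for full_duplex refusing the value 256, sketched under the assumption that full_duplex is a one-bit bitfield: storing 0x100 keeps only bit 0, so the field reads back as 0 and never matches the freshly computed duplex value. The struct and function names here are hypothetical.

```c
struct mii_if_sketch {
    unsigned int full_duplex : 1; /* assumed: one-bit field */
};

/* Original form: duplex is 0 or 0x100, and the bitfield drops bit 8. */
static unsigned int store_duplex_raw(unsigned int mii_reg)
{
    struct mii_if_sketch m = { 0 };
    unsigned int duplex = mii_reg & 0x100;

    m.full_duplex = duplex; /* 0x100 is reduced modulo 2, giving 0 */
    return m.full_duplex;
}

/* Fixed form: normalize to 0 or 1 before storing. */
static unsigned int store_duplex_fixed(unsigned int mii_reg)
{
    struct mii_if_sketch m = { 0 };

    m.full_duplex = (mii_reg & 0x100) ? 1 : 0;
    return m.full_duplex;
}
```

If the driver compares the stored field against the raw 256 on every check, the two never agree, which would fit the repeated "Setting full-duplex" messages.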

ISSUE 3:  Well, my card is still stalling.
But I should probably leave this to somebody
who actually has a clue about those things.
The log looks a lot cleaner now, though:

.....
Mar 21 19:04:10 cobra kernel: eth0: Transmitter underflow?, status 001a.
Mar 21 19:04:10 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 40.
Mar 21 19:04:15 cobra kernel: eth0: Transmitter underflow?, status 0008.
Mar 21 19:04:15 cobra kernel: eth0: Transmitter underflow?, status 0008.
Mar 21 19:04:19 cobra kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 21 19:04:19 cobra kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
Mar 21 19:04:19 cobra kernel: eth0: reset finished after 5 microseconds.
Mar 21 19:04:24 cobra kernel: eth0: Transmitter underflow?, status 001a.
Mar 21 19:04:24 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 40.
Mar 21 19:04:24 cobra kernel: eth0: Transmitter underflow?, status 000a.
Mar 21 19:04:29 cobra kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 21 19:04:29 cobra kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
Mar 21 19:04:29 cobra kernel: eth0: reset finished after 5 microseconds.
Mar 21 19:04:36 cobra kernel: eth0: Transmitter underflow?, status 0008.
Mar 21 19:04:36 cobra last message repeated 2 times
Mar 21 19:04:39 cobra kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 21 19:04:39 cobra kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
Mar 21 19:04:39 cobra kernel: eth0: reset finished after 5 microseconds.
...........
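
The threshold messages suggest a simple stepping scheme, sketched here with assumptions stated: the printed value is taken to be hexadecimal, the starting value 0x20 is inferred from the first bump printing 40, the step 0x20 from logs elsewhere in this thread that go from 40 to 60, and the ceiling 0xE0 is assumed rather than read from the logs. `bump_tx_thresh` is a hypothetical name.

```c
#define TX_THRESH_START 0x20 /* inferred: first bump after a reset prints 40 */
#define TX_THRESH_STEP  0x20 /* inferred: 40 is followed by 60 */
#define TX_THRESH_MAX   0xE0 /* assumed ceiling */

/* Raise the Tx threshold by one step on underrun, up to the ceiling. */
static unsigned int bump_tx_thresh(unsigned int thresh)
{
    if (thresh < TX_THRESH_MAX)
        thresh += TX_THRESH_STEP;
    return thresh;
}
```

Because each reset in the log is followed by "setting to 40" again, the threshold appears to fall back to its starting value on reset, so whatever the underrun handling had learned is lost at every timeout.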

So?
Is any of the above correct?
Or am I really close to frying my ethernet controller?
:)
Either way, changing stuff in the kernel's been fun.
I'll investigate some more.

Also: an off-topic question...
How do I reply to a particular message..
So that my messages appear in thread format
with more than 1 level...
In-Reply To rather than Maybe In Reply-To
This thread is starting to grow and I'd like to know.



* Re: Via-Rhine stalls - transmit errors
@ 2002-03-21 20:49 Ivan Gurdiev
  2002-03-21 21:56 ` Richard B. Johnson
  0 siblings, 1 reply; 15+ messages in thread
From: Ivan Gurdiev @ 2002-03-21 20:49 UTC (permalink / raw)
  To: Andy Carlson; +Cc: linux-kernel


if ((intr_status & ~( IntrLinkChange | IntrStatsMax |
                      IntrTxAborted ))) {
        if (debug > 1)
                printk(KERN_ERR "%s: Something Wicked happened! %4.4x.\n",
                       dev->name, intr_status);
        /* Recovery for other fault sources not known. */
        writew(CmdTxDemand | np->chip_cmd, dev->base_addr + ChipCmd);
}

What's classified as "Something Wicked" ?

Mar 20 21:52:00 cobra kernel: eth0: Something Wicked happened! 0008.

This is tx abort isn't it?

Mar 20 21:51:59 cobra kernel: eth0: Something Wicked happened! 001a.

...and this should be : tx underrun, tx abort, tx done

are those supposed to be logged as "Wicked"?
Those interrupts are handled earlier aren't they? 
        if (intr_status & (underflow | IntrTxAbort))
	...
	if (intr_status & IntrTxUnderrun) {
	...


I'm quite ignorant of all this, but I'm trying to
learn. I apologize if this is a stupid question.





* Re: Via-Rhine stalls - transmit errors
@ 2002-03-21  5:20 Ivan Gurdiev
  2002-03-24 12:40 ` Urban Widmark
  0 siblings, 1 reply; 15+ messages in thread
From: Ivan Gurdiev @ 2002-03-21  5:20 UTC (permalink / raw)
  To: Andy Carlson; +Cc: linux-kernel


> Here is a patch the Urban Widmark originally came up
> with for 2.4.17, 
> and I retrofitted to 2.4.18.  I do not know if it 
> will patch vs 2.4.19-pre3:

It does. However the problem persists.
I changed the default debug value to 7 again.

Here's more information:
Unlike the old driver this one repeatedly logs:
Mar 20 21:47:50 cobra kernel: eth0: Setting full-duplex based on MII #1 link partner capability of 3100.
Mar 20 21:48:30 cobra last message repeated 4 times

...when inactive...

The opposite side repeatedly logs:
eth0: lost link beat
eth0: found link beat
eth0: autonegotiation complete: 100BaseT-FD selected

As for the scp transfer.... same problem except 
now I get "Transmitter underflow?" errors
and status of 0008.

Here's a section of an example log:
--------------------------------------------------
Mar 20 21:51:51 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 20 21:51:51 cobra kernel: eth0: Transmitter underflow?, status 001a.
Mar 20 21:51:51 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 60.
Mar 20 21:51:51 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 20 21:51:52 cobra kernel: eth0: Transmitter underflow?, status 0008.
Mar 20 21:51:52 cobra kernel: eth0: Something Wicked happened! 0008.
Mar 20 21:51:56 cobra kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 20 21:51:56 cobra kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
Mar 20 21:51:56 cobra kernel: eth0: reset finished after 5 microseconds.
Mar 20 21:51:56 cobra kernel: eth0: Setting full-duplex based on MII #1 link partner capability of 3100.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter underflow?, status 001a.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 40.
Mar 20 21:51:59 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter underflow?, status 000a.
Mar 20 21:51:59 cobra kernel: eth0: Something Wicked happened! 000a.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter underflow?, status 0008.
Mar 20 21:51:59 cobra kernel: eth0: Something Wicked happened! 0008.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter underflow?, status 001a.
Mar 20 21:51:59 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 60.
Mar 20 21:51:59 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 20 21:52:00 cobra kernel: eth0: Transmitter underflow?, status 0008.
Mar 20 21:52:00 cobra kernel: eth0: Something Wicked happened! 0008.
Mar 20 21:52:00 cobra kernel: eth0: Transmitter underflow?, status 0008.
Mar 20 21:52:00 cobra kernel: eth0: Something Wicked happened! 0008.
Mar 20 21:52:00 cobra kernel: eth0: Transmitter underflow?, status 0008.
Mar 20 21:52:00 cobra kernel: eth0: Something Wicked happened! 0008.
Mar 20 21:52:00 cobra kernel: eth0: Setting full-duplex based on MII #1 link partner capability of 3100.

------------------------
Let me know how I can help.
Thanks for your assistance.




* Via-Rhine stalls - transmit errors
@ 2002-03-20  7:27 Ivan Gurdiev
  2002-03-20 15:34 ` Andy Carlson
  0 siblings, 1 reply; 15+ messages in thread
From: Ivan Gurdiev @ 2002-03-20  7:27 UTC (permalink / raw)
  To: linux-kernel

Hello, 

I was unsure about the maintainer of the via-rhine
driver so I'm mailing the bug report to the kernel
list. Please cc to ivangurdiev@yahoo.com.

Problem:
My ethernet card

/proc/pci:
    Ethernet controller: VIA Technologies, Inc. VT86C100A [Rhine 10/100] (rev 6).
      IRQ 11.

is stalling during a large scp transfer.
The card freezes for a long time before continuing
the transfer. I receive transmit timeout messages
and "Something Wicked happened".
The connection is negotiated to 100BaseTx-FD.


System Information:
AMD Athlon XP 1600+, Matsonic MS8127C+ board,
VIA Apollo KT133A/VT82C686B,
Ethernet Controller - listed above,
Kernel 2.4.19-pre3, using the via-rhine driver.


System on the opposite end:
Sony Vaio Laptop - Pentium III Coppermine,
ethernet card: Netgear FA410TX Pcmcia
Kernel: 2.4.19-pre3 using pcnet_cs driver

Appended are sections of the log of an scp transfer
using via-rhine debug level 7, from /var/log/messages.

----------------

Mar 19 23:08:53 cobra kernel: eth0: Setting full-duplex based on MII #1 link partner capability of 41e1.

....


Mar 19 23:12:15 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 40.
Mar 19 23:12:15 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:12:15 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 60.
Mar 19 23:12:15 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:12:16 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 80.
Mar 19 23:12:16 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:12:21 cobra kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 19 23:12:21 cobra kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...
Mar 19 23:12:58 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 40.
Mar 19 23:12:58 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:13:02 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 60.
Mar 19 23:13:02 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:13:04 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to 80.
Mar 19 23:13:04 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:14:09 cobra kernel: NETDEV WATCHDOG: eth0: transmit timed out
Mar 19 23:14:09 cobra kernel: eth0: Transmit timed out, status 0000, PHY status 782d, resetting...

....

Mar 19 23:15:09 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:15:09 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to a0.
Mar 19 23:15:09 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:15:12 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to c0.
Mar 19 23:15:12 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:15:13 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to e0.
Mar 19 23:15:13 cobra kernel: eth0: Something Wicked happened! 001a.
Mar 19 23:15:13 cobra kernel: s 0002.
Mar 19 23:15:13 cobra kernel: eth0: Transmitter underrun, increasing Tx threshold setting to e0.
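
Judging only from the values in this log (40, 60, 80, a0, c0, e0, then e0 again), the threshold appears to go up by 0x20 per underrun and stop at 0xe0, starting over at 40 after a reset. A small sketch of that stepping follows. The step, the cap, and the name bump_tx_thresh() are inferred from the log for illustration and are not taken from the driver source.

```c
/* Step and cap inferred from the logged values above; both are
 * assumptions, not constants quoted from the driver. */
#define TX_THRESH_STEP 0x20
#define TX_THRESH_MAX  0xe0

/* Hypothetical helper: return the next Tx threshold after an underrun,
 * clamped so it never exceeds TX_THRESH_MAX. */
static unsigned int bump_tx_thresh(unsigned int thresh)
{
	unsigned int next = thresh + TX_THRESH_STEP;

	return next > TX_THRESH_MAX ? TX_THRESH_MAX : next;
}
```

If that reading is right, the last underrun message in the log means the threshold had already reached the cap, so raising it further could not help.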

_______________________________________
What could be the problem?
Thank you for your help in advance.









end of thread, other threads:[~2002-04-10 22:51 UTC | newest]

Thread overview: 15+ messages
2002-04-07  6:43 Via-Rhine stalls - transmit errors Ivan G.
2002-04-10 16:51 ` Urban Widmark
2002-04-10 22:46   ` Ivan G.
  -- strict thread matches above, loose matches on Subject: below --
2002-04-05  5:47 Ivan G.
2002-03-28  8:50 Ivan Gurdiev
2002-04-04 22:10 ` Urban Widmark
2002-03-26  1:52 Ivan Gurdiev
2002-03-26 21:19 ` Urban Widmark
2002-03-22  2:33 Ivan Gurdiev
2002-03-21 20:49 Ivan Gurdiev
2002-03-21 21:56 ` Richard B. Johnson
2002-03-21  5:20 Ivan Gurdiev
2002-03-24 12:40 ` Urban Widmark
2002-03-20  7:27 Ivan Gurdiev
2002-03-20 15:34 ` Andy Carlson
