RE: [PATCH] #2 VIA Rhine stalls: TxAbort handling

public inbox for linux-kernel@vger.kernel.org

* RE: [PATCH] #2 VIA Rhine stalls: TxAbort handling
@ 2002-05-16 10:03 Shing Chuang
  2002-05-16 18:03 ` 'Roger Luethi'
  0 siblings, 1 reply; 10+ messages in thread
From: Shing Chuang @ 2002-05-16 10:03 UTC (permalink / raw)
  To: 'Roger Luethi', Urban Widmark, Ivan G., Jeff Garzik
  Cc: linux-kernel, AJ Jiang

Hi, 

     When any of the following three error conditions occurs, the VT6102 and
VT86C100A chips are designed to shut down TX so that the driver can examine
the error frame.

     1. Tx fifo underrun.                   

     2. Tx Abort (Too many collisions occurred).

     3. TxDescriptors status write back  error. (Only on VT6102 chip)

     All three conditions cause the TXON bit of CR1 to go off. The driver
must wait a little while until the bit goes off, reset the pointer to the
current Tx descriptor, and then turn the TXON bit of CR1 on again. These may
be the reasons why via-rhine sometimes hangs or hits the watchdog timeout.
     
     The following code from our new driver handles those errors.
     (The new driver is being tested now and will be released very soon.)

     1. For Tx fifo underrun.

          Increase_tx_threshold();

          do {} while (BYTE_REG_BITS_IS_ON(CR0_TXON,&pMacRegs->byCR0));
          writel(cpu_to_le32(pTD->pInfo->curr_desc),
                 &pMacRegs->CurrTxDescAddr);
          //Re-send the packet
          pTD->tdesc0.f1Owner=OWNED_BY_NIC;
          BYTE_REG_BITS_ON(CR0_TXON,&pMacRegs->byCR0);
          BYTE_REG_BITS_ON(CR1_TDMD1,&pMacRegs->byCR1);

     2. For Tx Abort (Too many collisions occurred).

          do {} while (BYTE_REG_BITS_IS_ON(CR0_TXON,&pMacRegs->byCR0));

          //Drop the frame
          pCurrTD=pCurrTD->next;
          writel(cpu_to_le32(pCurrTD->pInfo->curr_desc),
                 &pMacRegs->CurrTxDescAddr);

          BYTE_REG_BITS_ON(CR0_TXON,&pMacRegs->byCR0);

          BYTE_REG_BITS_ON(CR1_TDMD1,&pMacRegs->byCR1);

     3. For Tx descriptor status write-back error.

          do {} while (BYTE_REG_BITS_IS_ON(CR0_TXON,&pMacRegs->byCR0));

          // When a Tx descriptor status write-back error occurs, the status
          // of the transmitted Tx descriptors is incorrect,
          // so all frames must be dropped.
          drop_all_transmited_frame();

          BYTE_REG_BITS_ON(CR0_TXON,&pMacRegs->byCR0);

          BYTE_REG_BITS_ON(CR1_TDMD1,&pMacRegs->byCR1);

         

Shing,

> -----Original Message-----
> From: Roger Luethi [mailto:rl@hellgate.ch]
> Sent: Thursday, May 16, 2002 11:14 AM
> To: Urban Widmark; Ivan G.; Jeff Garzik
> Cc: linux-kernel@vger.kernel.org
> Subject: [PATCH] #2 VIA Rhine stalls: TxAbort handling
> 
> 
> This patch is against 2.4.19-pre8.
> 
> Patch description (changes over previous patch marked *):
> * Recover gracefully from TxAbort (the actual fix, new version)
> * Be more quiet about aborts, no need to fill log files or scare people
> - Explicitly pick a backoff algorithm (alternative "fix")
> * Rename backoff bits
> - Remove full_duplex, duplex_lock, and advertising from netdev_private
> - Make use of MII register names somewhat more consistent
> - Update comment regarding config information at 0x78
> - Move comment on *_desc_status where it belongs
> * Remove DescEndPacket, DescIntr; unused and hardly correct
> - More comment details
> * Add TXDESC plus comments for desc_length
> * Reverted comment change "Tune configuration???". I am genuinely puzzled
>   by that line. It sets "store & forward" in a way that according to the
>   documentation is bound to be overridden by the threshold values set
>   elsewhere.
> 
> Note that the abort recovery piece is down to one additional line of code
> compared to vanilla 2.4.19-pre8.
> 
> The summary of what happens: when an abort occurs at frame n, the TxRingPtr
> has already been upped to n+2. Frame n will have a status indicating an
> abort, whereas frame n+1 and following are still owned by the NIC. The
> problem is that the NIC forgets about that. When the driver issues a
> TxDemand after an abort, the NIC won't go back to update the status for
> frame n+1. It will happily continue and send all the remaining frames
> starting with n+2. The driver will receive a bunch of interrupts for sent
> frames, but it will never again scavenge another slot because the chip
> skipped one. Until a time out resets the chip, that is.
> 
> With the new patch, we don't break out to retransmit an aborted frame.
> Instead we scavenge all finished slots after the aborted one (not that I
> think there will be any, but it doesn't hurt to be safe). So once we enter
> the error handler, the aborted frame is reaped, and we _know_ what the next
> frame we need to have sent is -- we just failed to scavenge it because it's
> still owned by the NIC. And that's what we hammer into TxRingPtr. Now
> either the NIC was right (hypothetically speaking, it seems to be wrong
> always), then the writel() is a nop. Or the NIC was confused, then it's
> back on track again.
> 
> While TxAbort is the only error I have encountered frequently, it is still
> tempting to think that the same problem hits us with other errors as well,
> TxUnderrun being the most obvious candidate.
> 
> If this patch brings no improvement for the VT86C100A, you may want to
> watch the state of the rx/tx descriptors and the associated pointers.
> 
> The numbers haven't changed, by the way: I am still seeing about 20% higher
> throughput with what is now called BackModify, despite the aborts it
> produces. Abort handling and resends are cheap compared to the benefits of
> flooding the network with traffic, it seems. YMMV, as always.
> 
> Roger
> 
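
The stall described in the quoted message can be reproduced with a toy ring
model. The sketch below is only an illustration: the struct, the function
names, and the non-wrapping eight-slot ring are assumptions made for the
example, not code from either driver. It models a NIC whose pointer jumps to
n+2 on an abort and compares driver progress with and without pointing the
NIC back at the first unreclaimed slot.

```c
#include <stdbool.h>

#define RING 8   /* toy ring size; indices do not wrap in this sketch */

struct ring {
    bool nic_owns[RING]; /* true while the NIC still has to send the slot */
    int  nic_ptr;        /* slot the NIC will process next */
    int  dirty_tx;       /* oldest slot the driver has not reclaimed yet */
};

/* Queue `count` frames (count <= RING) starting at slot 0. */
void ring_fill(struct ring *r, int count)
{
    for (int i = 0; i < RING; i++)
        r->nic_owns[i] = i < count;
    r->nic_ptr = 0;
    r->dirty_tx = 0;
}

/* The NIC sends frames in order, then aborts on slot `abort_at`.
 * It hands the aborted slot back, but its pointer ends up at abort_at + 2. */
void nic_run_until_abort(struct ring *r, int abort_at)
{
    while (r->nic_ptr < abort_at)
        r->nic_owns[r->nic_ptr++] = false;
    r->nic_owns[abort_at] = false;   /* status written back: aborted */
    r->nic_ptr = abort_at + 2;       /* the skip described in the message */
}

/* TxDemand: the NIC sends every slot it owns from nic_ptr onwards. */
void nic_tx_demand(struct ring *r)
{
    while (r->nic_ptr < RING && r->nic_owns[r->nic_ptr])
        r->nic_owns[r->nic_ptr++] = false;
}

/* The driver reclaims finished slots in order and stops at the first
 * slot the NIC still owns. Returns the new dirty_tx. */
int driver_scavenge(struct ring *r, int queued)
{
    while (r->dirty_tx < queued && !r->nic_owns[r->dirty_tx])
        r->dirty_tx++;
    return r->dirty_tx;
}

/* Run one abort scenario; returns how many of `queued` frames the
 * driver manages to reclaim. */
int simulate(int queued, int abort_at, bool resync_ptr)
{
    struct ring r;

    ring_fill(&r, queued);
    nic_run_until_abort(&r, abort_at);
    driver_scavenge(&r, queued);     /* reaps up to the aborted slot */
    if (resync_ptr)
        r.nic_ptr = r.dirty_tx;      /* point the NIC at the next unsent slot */
    nic_tx_demand(&r);
    return driver_scavenge(&r, queued);
}
```

With six queued frames and an abort at slot 2, simulate(6, 2, false) stops
at 3 reclaimed slots because slot 3 is never handed back, while
simulate(6, 2, true) reclaims all 6.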

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] #2 VIA Rhine stalls: TxAbort handling
  2002-05-16 10:03 [PATCH] #2 VIA Rhine stalls: TxAbort handling Shing Chuang
@ 2002-05-16 18:03 ` 'Roger Luethi'
  2002-05-16 18:25   ` Richard B. Johnson
  0 siblings, 1 reply; 10+ messages in thread
From: 'Roger Luethi' @ 2002-05-16 18:03 UTC (permalink / raw)
  To: Shing Chuang; +Cc: Urban Widmark, Ivan G., Jeff Garzik, linux-kernel, AJ Jiang

On Thu, 16 May 2002 18:03:25 +0800, Shing Chuang wrote:
>      As following three error conditions occurred,   the VT6102 & VT86C100A
> chip are designed to shutdown TX for driver to examine the error frame.
> 
>      1. Tx fifo underrun.                   
> 
>      2. Tx Abort (Too many collisions occurred).
> 
>      3. TxDescriptors status write back  error. (Only on VT6102 chip)

Hey, thanks! That's exactly the piece of information I've been looking for.

>      All the three conditions caused the TXON bit of CR1 went off. the
> driver must wait  a little while until the bit go off, reset the pointer of
> [...]
>           do {} while (BYTE_REG_BITS_IS_ON(CR0_TXON,&pMacRegs->byCR0));

The driver "waits a little" in the interrupt handler? How long can that
take, worst case? I don't know of many places where the kernel stops to
wait for an external device to change some value.

I have no numbers on the expected number of iterations, but I'd rather drop
out of the handler after a few failed checks and try again later (or just
reset the chip and log an error, if dropping out is rare enough :-)).
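
A bounded version of that wait could look roughly like the sketch below. It
is a stand-alone model, not driver code: the register is read through a
callback so the loop can be exercised without hardware, and the names
(wait_tx_off, fake_chip, fake_read_cr0) as well as the 0x10 mask chosen for
TXON are assumptions made for the example.

```c
#include <stdint.h>

#define CR0_TXON 0x10u   /* assumed mask for the TXON bit, for illustration */

typedef uint8_t (*read_reg_fn)(void *ctx);

/* Poll until TXON clears or max_tries reads have been used up.
 * Returns the number of reads that still saw TXON set (0 means the
 * engine was already off), or -1 if the bit never cleared. */
int wait_tx_off(read_reg_fn read_cr0, void *ctx, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (!(read_cr0(ctx) & CR0_TXON))
            return i;
    }
    return -1;   /* caller can retry later or reset the chip */
}

/* Fake chip for the example: TXON stays set for `busy_reads` reads. */
struct fake_chip {
    int busy_reads;
};

uint8_t fake_read_cr0(void *ctx)
{
    struct fake_chip *c = ctx;

    if (c->busy_reads > 0) {
        c->busy_reads--;
        return CR0_TXON;
    }
    return 0;
}
```

A caller would treat a return of -1 as "not ready yet" and leave the handler
instead of spinning; the iteration count from a successful wait is the kind
of number that could be logged to learn how long the chip really takes.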

> current Tx descriptor, and  then turn on TXON bit of CR1 again. Those may be

ITYM the TXON bit of CR0. TDMD1 is the one you are setting in CR1. Which
takes me to the next question:

According to my docs, internal registers are like this:

VT86C100A
Byte        Bit  Name
0x08 (CR0)  5    TDMD
0x08 (CR0)  6    RDMD
0x09 (CR1)  5    Reserved
0x09 (CR1)  6    Reserved

VT6102
Byte        Bit  Name
0x08 (CR0)  5    TDMD
0x08 (CR0)  6    RDMD
0x09 (CR1)  5    TDMD1
0x09 (CR1)  6    RDMD1

The descriptions in both data sheets are somewhat unclear, so maybe you
could enlighten me about why you chose to set TDMD1 instead of TDMD (which
is what the LK driver does)?
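
For reference, the two tables translate into bit masks as sketched below.
This does not answer the TDMD versus TDMD1 question; it only encodes what the
tables say, and the helper tx_demand_bit (a hypothetical name) simply falls
back to TDMD in CR0 on a chip where the CR1 bit is listed as reserved.

```c
#include <stdbool.h>
#include <stdint.h>

enum chip { VT86C100A, VT6102 };

#define REG_CR0 0x08
#define REG_CR1 0x09

#define BITMASK(n) (1u << (n))
#define CR0_TDMD   BITMASK(5)
#define CR0_RDMD   BITMASK(6)
#define CR1_TDMD1  BITMASK(5)   /* reserved on the VT86C100A */
#define CR1_RDMD1  BITMASK(6)   /* reserved on the VT86C100A */

struct reg_bit {
    uint8_t offset;  /* register offset from the I/O base */
    uint8_t mask;    /* bit to set in that register */
};

/* Choose the Tx demand bit. TDMD1 is only returned when the caller
 * asks for it and the chip defines it; otherwise TDMD in CR0 is used. */
struct reg_bit tx_demand_bit(enum chip c, bool prefer_cr1)
{
    struct reg_bit rb = { REG_CR0, CR0_TDMD };

    if (prefer_cr1 && c == VT6102) {
        rb.offset = REG_CR1;
        rb.mask = CR1_TDMD1;
    }
    return rb;
}
```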

Roger


* Re: [PATCH] #2 VIA Rhine stalls: TxAbort handling
  2002-05-16 18:03 ` 'Roger Luethi'
@ 2002-05-16 18:25   ` Richard B. Johnson
  2002-05-16 20:31     ` 'Roger Luethi'
  0 siblings, 1 reply; 10+ messages in thread
From: Richard B. Johnson @ 2002-05-16 18:25 UTC (permalink / raw)
  To: 'Roger Luethi'
  Cc: Shing Chuang, Urban Widmark, Ivan G., Jeff Garzik, linux-kernel,
	AJ Jiang

On Thu, 16 May 2002, 'Roger Luethi' wrote:

> On Thu, 16 May 2002 18:03:25 +0800, Shing Chuang wrote:
> >      As following three error conditions occurred,   the VT6102 & VT86C100A
> > chip are designed to shutdown TX for driver to examine the error frame.
> > 
> >      1. Tx fifo underrun.                   
> > 
> >      2. Tx Abort (Too many collisions occurred).
> > 
> >      3. TxDescriptors status write back  error. (Only on VT6102 chip)
> 
> Hey, thanks! That's exactly the piece of information I've been looking for.
> 
> >      All the three conditions caused the TXON bit of CR1 went off. the
> > driver must wait  a little while until the bit go off, reset the pointer of
> > [...]
> >           do {} while (BYTE_REG_BITS_IS_ON(CR0_TXON,&pMacRegs->byCR0));
> 
> The driver "waits a little" in the interrupt handler? How long can that
> take, worst case?

Forever..........^;)

> I don't know of many places where the kernel stops to
> wait for an external device to change some value.
> 

Yep... should never wait inside an ISR, to say nothing about
the potential wait-forever.

Even if the chip never breaks, you end up with reports like..
"Strange, I make frisbees when burning CDs while M$ machines do
backups over the network..."

Or, I can't play ".wav" files anymore unless I unplug from the
network...

Stuff has to play together. Sometimes this means you can't get
the maximum-theoretical-possible throughput from your connected
devices.

The worst-case driver is where the programmer decided to turn
interrupts back on in the ISR.... to let higher-priority interrupts
occur...  FYI, there are always higher-priority interrupts that
will take the CPU away... you lose big-time.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


* Re: [PATCH] #2 VIA Rhine stalls: TxAbort handling
  2002-05-16 18:25   ` Richard B. Johnson
@ 2002-05-16 20:31     ` 'Roger Luethi'
  2002-05-16 16:39       ` Ivan G.
  2002-05-16 21:05       ` [PATCH] #2 VIA Rhine stalls: TxAbort handling Richard B. Johnson
  0 siblings, 2 replies; 10+ messages in thread
From: 'Roger Luethi' @ 2002-05-16 20:31 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Shing Chuang, Urban Widmark, Ivan G., Jeff Garzik, linux-kernel,
	AJ Jiang

> > The driver "waits a little" in the interrupt handler? How long can that
> > take, worst case?
> 
> Forever..........^;)

We should assume that this is indeed the case, but it often helps to know
what the expected values and their distribution are.

It's a weird situation anyway: both the buffer descriptor and the interrupt
status have been updated by the chip to reflect the abort, but by the time
we handle the error it may still be busy coming to a halt.

What tickles my curiosity is that my previous patch didn't fix the stalling
for Ivan G. on his VT86C100A. Maybe the chip just wasn't ready to be
restarted.

> Even if the chip never breaks, you end up with reports like..
> "Strange, I make frisbees when burning CDs while M$ machines do
> backups over the network..."

Not if the chip is guaranteed to have its thing done after one or two
iterations. We make some inb and outb calls in the ISR either way.

That was hypothetically speaking of course, I'm not suggesting we rely on
such a "guarantee".

Roger


* Re: [PATCH] #2 VIA Rhine stalls: TxAbort handling
  2002-05-16 20:31     ` 'Roger Luethi'
@ 2002-05-16 16:39       ` Ivan G.
  2002-05-17  2:54         ` [PATCH] #3 VIA Rhine stalls: Wait for the chip? 'Roger Luethi'
  2002-05-16 21:05       ` [PATCH] #2 VIA Rhine stalls: TxAbort handling Richard B. Johnson
  1 sibling, 1 reply; 10+ messages in thread
From: Ivan G. @ 2002-05-16 16:39 UTC (permalink / raw)
  To: LKML


> What tickles my curiosity is that my previous patch didn't fix the stalling
> for Ivan G. on his VT86C100A. Maybe the chip just wasn't ready to be
> restarted.
>

With your patch #2, the chip would actually "wait forever"
in some cases... it didn't time out and recover.



* [PATCH] #3 VIA Rhine stalls: Wait for the chip?
  2002-05-16 16:39       ` Ivan G.
@ 2002-05-17  2:54         ` 'Roger Luethi'
  0 siblings, 0 replies; 10+ messages in thread
From: 'Roger Luethi' @ 2002-05-17  2:54 UTC (permalink / raw)
  To: Ivan G.; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2221 bytes --]

On Thu, 16 May 2002 10:37:52 -0600, Ivan G. wrote:
> > What tickles my curiosity is that my previous patch didn't fix the stalling
> > for Ivan G. on his VT86C100A. Maybe the chip just wasn't ready to be
> > restarted.
> 
> With your patch #2, the chip would actually "wait forever"
> in some cases...it didn't timeout and recover 

What does the log file say? (with debug = 7) The most interesting
information (buffer descriptors and various registers) doesn't get logged,
but the log file may contain a clue that goes beyond "wait forever".

Hanging without timeout seems to indicate that all buffers were
successfully sent (if all frames have been sent, there is no transfer
left to time out). But I'm shooting in the dark here, somebody with
access to such a card needs to look into it.

I have a wonderful unified theory to explain everything but I'm afraid I'm
not looking forward to seeing it make contact with reality. Anyway, here goes:
On my system, it's almost impossible for the Tx engine to be still on by
the time we enter the error handler. In fact, even checking in
via_rhine_tx() gave me only one instance in hundreds of aborts. If the
VT86C100A (or something else about your setup) is slower, we might turn the
engine back on before the chip is ready. Patch #2 has a faster path to the
point where we set TXON which might explain why things got worse for you.

The problem with this idea is that the VIA driver does the error handling
as soon as it finds an abort, while the LK driver frees the skb first and
returns back to the interrupt handler before it enters the error handler to
finally do something about the error, which should give the chip ample time
to stop the Tx engine meanwhile. I attach a quick hack which will complain
if the driver had to wait for the chip. It will make up to four attempts
before it shrugs and proceeds as it used to. If you have time to give the
patch a spin, I'd be interested if you find any iteration counters in your
log file.

The patch is against the new version Jeff sent out earlier today. Please
note that with the current driver tx underruns are likely to cause a time
out. My patch doesn't even make an attempt at addressing errors other than
aborts.

Roger

[-- Attachment #2: via-rhine.c.3.patch --]
[-- Type: text/plain, Size: 4158 bytes --]

--- via-rhine.c.org	Thu May 16 21:51:46 2002
+++ via-rhine.c	Fri May 17 04:25:05 2002
@@ -234,7 +234,8 @@ II. Board-specific settings
 Boards with this chip are functional only in a bus-master PCI slot.
 
 Many operational settings are loaded from the EEPROM to the Config word at
-offset 0x78.  This driver assumes that they are correct.
+offset 0x78. For most of these settings, this driver assumes that they are
+correct.
 If this driver is compiled to use PCI memory space operations the EEPROM
 must be configured to enable memory ops.
 
@@ -388,7 +389,10 @@ enum register_offsets {
 
 /* Bits in ConfigD (select backoff algorithm (Ethernet capture effect)) */
 enum backoff_bits {
-	BackOpt=0x01, BackAMD=0x02, BackDEC=0x04, BackRandom=0x08
+	BackOptional=0x01,
+	BackModify=0x02,
+	BackCaptureEffect=0x04,
+	BackRandom=0x08
 };
 
 #ifdef USE_MEM
@@ -428,24 +432,27 @@ enum mii_status_bits {
 /* The Rx and Tx buffer descriptors. */
 struct rx_desc {
 	s32 rx_status;
-	u32 desc_length;
+	u32 desc_length; /* Chain flag, Buffer/frame length */
 	u32 addr;
 	u32 next_desc;
 };
 struct tx_desc {
 	s32 tx_status;
-	u32 desc_length;
+	u32 desc_length; /* Chain flag, Tx Config, Frame length */
 	u32 addr;
 	u32 next_desc;
 };
 
+/* Initial value for tx_desc.desc_length, Buffer size goes to bits 0-10 */
+#define TXDESC 0x00e08000
+
 enum rx_status_bits {
 	RxOK=0x8000, RxWholePkt=0x0300, RxErr=0x008F
 };
 
-/* Bits in *_desc.status */
+/* Bits in *_desc.*_status */
 enum desc_status_bits {
-	DescOwn=0x80000000, DescEndPacket=0x4000, DescIntr=0x1000,
+	DescOwn=0x80000000
 };
 
 /* Bits in ChipCmd. */
@@ -703,6 +710,9 @@ static int __devinit via_rhine_init_one 
 		writeb(readb(ioaddr + ConfigA) & 0xFE, ioaddr + ConfigA);
 	}
 
+	/* Select backoff algorithm */
+	writeb(readb(ioaddr + ConfigD) & (0xF0 | BackModify), ioaddr + ConfigD);
+
 	dev->irq = pdev->irq;
 
 	np = dev->priv;
@@ -937,7 +947,7 @@ static void alloc_tbufs(struct net_devic
 	for (i = 0; i < TX_RING_SIZE; i++) {
 		np->tx_skbuff[i] = 0;
 		np->tx_ring[i].tx_status = 0;
-		np->tx_ring[i].desc_length = cpu_to_le32(0x00e08000);
+		np->tx_ring[i].desc_length = cpu_to_le32(TXDESC);
 		next += sizeof(struct tx_desc);
 		np->tx_ring[i].next_desc = cpu_to_le32(next);
 		np->tx_buf[i] = &np->tx_bufs[i * PKT_BUF_SZ];
@@ -953,7 +963,7 @@ static void free_tbufs(struct net_device
 
 	for (i = 0; i < TX_RING_SIZE; i++) {
 		np->tx_ring[i].tx_status = 0;
-		np->tx_ring[i].desc_length = cpu_to_le32(0x00e08000);
+		np->tx_ring[i].desc_length = cpu_to_le32(TXDESC);
 		np->tx_ring[i].addr = cpu_to_le32(0xBADF00D0); /* An invalid address. */
 		if (np->tx_skbuff[i]) {
 			if (np->tx_skbuff_dma[i]) {
@@ -978,7 +988,7 @@ static void init_registers(struct net_de
 		writeb(dev->dev_addr[i], ioaddr + StationAddr + i);
 
 	/* Initialize other registers. */
-	writew(0x0006, ioaddr + PCIBusConfig);	/* Store & forward */
+	writew(0x0006, ioaddr + PCIBusConfig);	/* Tune configuration??? */
 	/* Configure initial FIFO thresholds. */
 	writeb(0x20, ioaddr + TxConfig);
 	np->tx_thresh = 0x20;
@@ -1237,7 +1247,7 @@ static int via_rhine_start_tx(struct sk_
 	}
 
 	np->tx_ring[entry].desc_length = 
-		cpu_to_le32(0x00E08000 | (skb->len >= ETH_ZLEN ? skb->len : ETH_ZLEN));
+		cpu_to_le32(TXDESC | (skb->len >= ETH_ZLEN ? skb->len : ETH_ZLEN));
 
 	/* lock eth irq */
 	spin_lock_irq (&np->lock);
@@ -1502,8 +1512,19 @@ static void via_rhine_error(struct net_d
 		clear_tally_counters(ioaddr);
 	}
 	if (intr_status & IntrTxAbort) {
-		/* Stats counted in Tx-done handler, just restart Tx. */
+		int i=0;
+		while ((i!=4) && (readw(dev->base_addr + ChipCmd) & CmdTxOn)) {
+			i++;
+		};
+		if (i) { printk(KERN_ERR "Abort: %d iterations.\n", i); }
+		/* No skipping frames we have no results for! Bad chip! */
+		writel(virt_to_bus(&np->tx_ring[np->dirty_tx % TX_RING_SIZE]),
+			   ioaddr + TxRingPtr);
+		/* Prevent hanging on a full queue */
 		writew(CmdTxDemand | np->chip_cmd, dev->base_addr + ChipCmd);
+		if (debug > 1)
+			printk(KERN_INFO "%s: Abort %4.4x, frame dropped.\n",
+				   dev->name, intr_status);
 	}
 	if (intr_status & IntrTxUnderrun) {
 		if (np->tx_thresh < 0xE0)
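
The desc_length handling in the patch can be tried out in isolation. The
sketch below reuses TXDESC and the ETH_ZLEN padding from the diff; the helper
names and TX_LEN_MASK are assumptions made for the example, and the
cpu_to_le32() conversion the driver applies is left out to keep it
host-independent.

```c
#include <stdint.h>

#define TXDESC      0x00e08000u  /* initial tx_desc.desc_length, from the patch */
#define ETH_ZLEN    60u          /* minimum frame length in octets, without FCS */
#define TX_LEN_MASK 0x000007ffu  /* bits 0-10 carry the buffer size */

/* Build desc_length for a frame, padding short frames up to ETH_ZLEN.
 * skb_len is expected to fit into bits 0-10. */
uint32_t tx_desc_length(uint32_t skb_len)
{
    uint32_t len = skb_len >= ETH_ZLEN ? skb_len : ETH_ZLEN;

    return TXDESC | len;
}

/* Extract the length field from a desc_length value again. */
uint32_t tx_desc_frame_len(uint32_t desc_length)
{
    return desc_length & TX_LEN_MASK;
}
```

A 42-byte frame is padded to 60 bytes and yields 0x00e0803c; a 1514-byte
frame yields 0x00e085ea.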


* Re: [PATCH] #2 VIA Rhine stalls: TxAbort handling
  2002-05-16 20:31     ` 'Roger Luethi'
  2002-05-16 16:39       ` Ivan G.
@ 2002-05-16 21:05       ` Richard B. Johnson
  2002-05-17  0:16         ` 'Roger Luethi'
  1 sibling, 1 reply; 10+ messages in thread
From: Richard B. Johnson @ 2002-05-16 21:05 UTC (permalink / raw)
  To: 'Roger Luethi'
  Cc: Shing Chuang, Urban Widmark, Ivan G., Jeff Garzik, linux-kernel,
	AJ Jiang

On Thu, 16 May 2002, 'Roger Luethi' wrote:

> > > The driver "waits a little" in the interrupt handler? How long can that
> > > take, worst case?
> > 
> > Forever..........^;)
> 
> We should assume that this is indeed the case, but it often helps to know
> what the expected values and their distribution are.
> 
> It's a weird situation anyway: both the buffer descriptor and the interrupt
> status have been updated by the chip to reflect the abort, but by the time
> we handle the error it may still be busy coming to a halt.
> 
> What tickles my curiosity is that my previous patch didn't fix the stalling
> for Ivan G. on his VT86C100A. Maybe the chip just wasn't ready to be
> restarted.
> 
> > Even if the chip never breaks, you end up with reports like..
> > "Strange, I make frisbees when burning CDs while M$ machines do
> > backups over the network..."
> 
> Not if the chip is guaranteed to have its thing done after one or two
> iterations. We make some inb and outb calls in the ISR either way.
> 
> That was hypothetically speaking of course, I'm not suggesting we rely on
> such a "guarantee".
> 
> Roger

I think one has to <somehow> find that the chip has halted besides
the current way (noticing that it can't transmit anymore). I don't
know how to do this, of course, but; if you could know that the
chip is hung, the first thing to do is to turn off its interrupt
request(s) (the chip, not the interrupt controller). Some older
(National) devices needed to have the chip then set to loopback
mode because they couldn't be programmed properly if data kept
coming in on the wire. The internal buffer pointers kept changing
in response to incoming data while the chip was being programmed.
By the time you got the chip programmed, it was hung by pointer-wrap.

The chip-halted work-around that everybody seems to use now is to
reprogram it from scratch, the last programming operation being to
remove loop-back. I don't even know if this chip can be set to
loop-back, though, so the whole idea may be moot.
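
The ordering described above can be written down as a small model. Everything
in the sketch is hypothetical: fake_nic and its fields stand in for real
registers, and the point where interrupts are turned back on is an assumption,
since only "remove loop-back last" is stated.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a NIC; no real register layout implied. */
struct fake_nic {
    bool irq_enabled;
    bool loopback;
    bool programmed;
    bool programmed_while_exposed; /* reprogrammed with IRQs on or wire attached */
};

/* Reprogram the chip and record whether that happened while it could
 * still raise interrupts or see traffic from the wire. */
void nic_reprogram(struct fake_nic *n)
{
    if (n->irq_enabled || !n->loopback)
        n->programmed_while_exposed = true;
    n->programmed = true;
}

void nic_full_reset(struct fake_nic *n)
{
    n->irq_enabled = false;  /* 1. turn off the chip's interrupt requests */
    n->loopback = true;      /* 2. loopback, so incoming data cannot interfere */
    n->programmed = false;
    nic_reprogram(n);        /* 3. reprogram from scratch */
    n->irq_enabled = true;   /* assumption: interrupts come back before step 4 */
    n->loopback = false;     /* 4. removing loopback is the last operation */
}
```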

Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.


* Re: [PATCH] #2 VIA Rhine stalls: TxAbort handling
  2002-05-16 21:05       ` [PATCH] #2 VIA Rhine stalls: TxAbort handling Richard B. Johnson
@ 2002-05-17  0:16         ` 'Roger Luethi'
  2002-05-17 12:51           ` Richard B. Johnson
  0 siblings, 1 reply; 10+ messages in thread
From: 'Roger Luethi' @ 2002-05-17  0:16 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Shing Chuang, Urban Widmark, Ivan G., Jeff Garzik, linux-kernel,
	AJ Jiang

> I think one has to <somehow> find that the chip has halted besides
> the current way (noticing that it can't transmit anymore). I don't

There seems to be a misunderstanding. We already get an interrupt and a
status to indicate what kind of problem occurred. Thanks to Shing's recent
posting we even have confirmed information about what events stop the Tx
engine. _Plus_ there is a bit flag TXON in a chip status register which
indicates whether the Tx engine is active.

So what's left as a (potential) problem? -- The code snippet that Shing
shared with us suggests that there is potential for a race between the chip
and an ISR which is already scavenging Tx buffers: the chip has updated the
buffer descriptors and set the interrupt status to reflect the error, but
is not yet done halting the Tx engine. If it had only failed to update the
TXON status bit, there would be no special handling required, since we are
writing that bit anyway in a next step. So the issue has to be that the
chip is in a transitional state and restarting the Tx engine at this point
would be premature. Of course this description assumes that the VIA coders
made that particular recent change in their driver for a reason.

> In the chip-halted work-around that everybody seems to use now,
> reprogram it from scratch. The last program operation being to remove
> loop-back. I don't even know if this chip can be set to loop-back,
> though, so the whole idea may be moot.

It can be set to loopback, but I'm not keen on having my chip reprogrammed
by every traffic burst (excessive collisions -> abort). Is that really the
fashion of the year now?

Roger


* Re: [PATCH] #2 VIA Rhine stalls: TxAbort handling
  2002-05-17  0:16         ` 'Roger Luethi'
@ 2002-05-17 12:51           ` Richard B. Johnson
  2002-05-17 16:25             ` 'Roger Luethi'
  0 siblings, 1 reply; 10+ messages in thread
From: Richard B. Johnson @ 2002-05-17 12:51 UTC (permalink / raw)
  To: 'Roger Luethi'
  Cc: Shing Chuang, Urban Widmark, Ivan G., Jeff Garzik, linux-kernel,
	AJ Jiang

On Fri, 17 May 2002, 'Roger Luethi' wrote:

> > I think one has to <somehow> find that the chip has halted besides
> > the current way (noticing that it can't transmit anymore). I don't
> 
> There seems to be a misunderstanding. We already get an interrupt and a
> status to indicate what kind of problem occurred. Thanks to Shing's recent
> posting we even have confirmed information about what events stop the Tx
> engine. _Plus_ there is a bit flag TXON in a chip status register which
> indicates whether the Tx engine is active.
>
[SNIPPED...]
> 
> > In the chip-halted work-around that everybody seems to use now,
> > reprogram it from scratch. The last program operation being to remove
> > loop-back. I don't even know if this chip can be set to loop-back,
> > though, so the whole idea may be moot.
> 
> It can be set to loopback, but I'm not keen on having my chip reprogrammed
> by every traffic burst (excessive collisions -> abort). Is that really the
> fashion of the year now?

Well, maybe the fashion of the day. Do `grep karound *.c` in
../linux/drivers/net and see all the 'workarounds' that exist for
chip problems. Some of the problems are induced by the coding and
most are real hardware problems.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).

                 Windows-2000/Professional isn't.



* Re: [PATCH] #2 VIA Rhine stalls: TxAbort handling
  2002-05-17 12:51           ` Richard B. Johnson
@ 2002-05-17 16:25             ` 'Roger Luethi'
  0 siblings, 0 replies; 10+ messages in thread
From: 'Roger Luethi' @ 2002-05-17 16:25 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: linux-kernel

> Well, maybe the fashion of the day. Do `grep karound *.c` in
> ../linux/drivers/net and see all the 'workarounds' that exist for
> chip problems. Some of the problems are induced by the coding and
> most are real hardware problems.

Nobody's debating the need for workarounds. I just prefer to look for a
more subtle method before taking out the sledge-hammer several times a
second(!) to reprogram the chip from scratch.

Roger

