* RE: Kernel oops while routing
@ 2001-12-05 18:01 Jean-Denis Boyer
2001-12-05 18:21 ` Peter Desnoyers
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Jean-Denis Boyer @ 2001-12-05 18:01 UTC (permalink / raw)
To: 'Ricardo Scop'; +Cc: linuxppc-embedded, Dan Malek, andy_lowe
> - in fcc_enet_start_xmit, after setting up another bd and
> incrementing bdp, the next bd's tx-ready bit is tested in order
> to stop the xmit queue if it is set, ok? But, sometimes, the CPM
> may already have cleared this bit _and_ the corresponding
> interrupt has not been serviced yet (because we're in a
> spin_lock_irq); so, netif_stop_queue is not called in this case,
> nor is tx_full set;
>
> - next, the interrupt is serviced, but then curr_tx equals
> dirty_tx _and_ tx_full is not set, so no sk_buffers are freed!
Yes! I totally agree with you, checking the ready bit in the buffer
descriptor is not guaranteed, even if the interrupts are masked, since the
CPM doesn't suspend its processing.
I have done many tests between two of our custom boards, that use an 8260
and a single FCC. I could effectively see a memory leak.
IMHO, I could suggest an easier patch, that would result in modifying only
one line of code, without changing the 'tx_full' logic. In function
fcc_enet_start_xmit, instead of checking the ready bit (which is bad), we
could only check if cur_tx has reached dirty_tx, and then call
netif_stop_queue. Does it make sense?
BTW, I worked hard last week in debugging the fcc_enet driver. It was not
handling correctly some transmission errors, resulting in the transmitter
completely stopping, without restarting. This is related to an errata
(CPM37) from Motorola about the 8260, concerning the way of restarting the
transmitter. If someone is interested, I can release a patch for that.
--------------------------------------------
Jean-Denis Boyer, B.Eng., Technical Leader
Mediatrix Telecom Inc.
4229 Garlock Street
Sherbrooke (Québec)
J1L 2C8 CANADA
(819)829-8749 x241
--------------------------------------------
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: Kernel oops while routing 2001-12-05 18:01 Kernel oops while routing Jean-Denis Boyer @ 2001-12-05 18:21 ` Peter Desnoyers 2001-12-05 18:33 ` Dan Malek 2001-12-05 22:41 ` Re[2]: " Ricardo Scop 2001-12-07 15:56 ` Arto Vuori 2 siblings, 1 reply; 6+ messages in thread From: Peter Desnoyers @ 2001-12-05 18:21 UTC (permalink / raw) To: Jean-Denis Boyer Cc: 'Ricardo Scop', linuxppc-embedded, Dan Malek, andy_lowe Could this be related to a problem we've been seeing on the 860T, resulting in the following messages? NETDEV WATCHDOG: eth0: transmit timed out eth0: transmit timed out. Ring data dump: cur_tx c02f7130, dirty_tx c02f7130 cur_rx: c02f70d8 tx: 16 buffers c02f7100: 1c00 004e 02a69442 c02f7108: 1c00 004e 02a69542 c02f7110: 1c00 004e 02a69642 c02f7118: 1c00 004e 02a69742 c02f7120: 1c00 004e 02a69842 c02f7128: 1c00 004e 02a69942 c02f7130: 1c00 0049 02a2e05e c02f7138: 1c00 0042 02a69c4e c02f7140: 1c04 0042 02aa2b3e c02f7148: 1c00 0036 02c8f6b2 c02f7150: 1c00 0042 02aaaf2e c02f7158: 1c00 0042 02aa213e c02f7160: 1c00 0042 02aa223e c02f7168: 1c00 0042 02aa233e c02f7170: 1c00 0042 02a6914e c02f7178: 3c00 004e 02a69342 rx: 32 buffers c02f7000: 8880 0040 002f6000 [...] This only seems to occur when receiving bulk data on a TCP connection - we never saw it ftp-ing onto a system when we used an NFS filesystem, but now that we're booting a ramdisk, an FTP get of more than 10-20k results in a guaranteed transmitter timeout. -- ..................................................................... Peter Desnoyers (781) 457-1165 pdesnoyers@chinook.com Chinook Communications (617) 661-1979 pjd@fred.cambridge.ma.us 100 Hayden Ave, Lexington MA 02421 ** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Kernel oops while routing 2001-12-05 18:21 ` Peter Desnoyers @ 2001-12-05 18:33 ` Dan Malek 0 siblings, 0 replies; 6+ messages in thread From: Dan Malek @ 2001-12-05 18:33 UTC (permalink / raw) To: Jean-Denis Boyer, 'Ricardo Scop', linuxppc-embedded Peter Desnoyers wrote: > Could this be related to a problem we've been seeing on the 860T, > resulting in the following messages? Not likely. In my experience this is the result of a hardware problem, either noise on the lines between the 860 and PHY, or an incorectly configured set of I/O pins. Thanks. -- Dan ** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re[2]: Kernel oops while routing 2001-12-05 18:01 Kernel oops while routing Jean-Denis Boyer 2001-12-05 18:21 ` Peter Desnoyers @ 2001-12-05 22:41 ` Ricardo Scop 2001-12-07 15:56 ` Arto Vuori 2 siblings, 0 replies; 6+ messages in thread From: Ricardo Scop @ 2001-12-05 22:41 UTC (permalink / raw) To: Jean-Denis Boyer; +Cc: linuxppc-embedded, Dan Malek, andy_lowe Jean-Denis, Wednesday, December 05, 2001, 3:01:28 PM, you wrote: JDB> IMHO, I could suggest an easier patch, that would result in modifying only JDB> one line of code, without changing the 'tx_full' logic. In function JDB> fcc_enet_start_xmit, instead of checking the ready bit (which is bad), we JDB> could only check if cur_tx has reached dirty_tx, and then call JDB> netif_stop_queue. Does it make sense? Make sense to me. I'll try it out. JDB> BTW, I worked hard last week in debugging the fcc_enet driver. It was not JDB> handling correctly some transmission errors, resulting in the transmitter JDB> completely stopping, without restarting. This is related to an errata JDB> (CPM37) from Motorola about the 8260, concerning the way of restarting the JDB> transmitter. If someone is interested, I can release a patch for that. I'm interested! Ricardo Scop mailto:scop@vanet.com.br R SCOP Consulting ------------------------------------------------------------------ "What's money? A man is a success if he gets up in the morning and goes to bed at night and in between does what he wants to do." ~Bob Dylan ** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Kernel oops while routing 2001-12-05 18:01 Kernel oops while routing Jean-Denis Boyer 2001-12-05 18:21 ` Peter Desnoyers 2001-12-05 22:41 ` Re[2]: " Ricardo Scop @ 2001-12-07 15:56 ` Arto Vuori 2 siblings, 0 replies; 6+ messages in thread From: Arto Vuori @ 2001-12-07 15:56 UTC (permalink / raw) To: Jean-Denis Boyer Cc: 'Ricardo Scop', linuxppc-embedded, Dan Malek, andy_lowe Jean-Denis Boyer wrote: > IMHO, I could suggest an easier patch, that would result in modifying only > one line of code, without changing the 'tx_full' logic. In function > fcc_enet_start_xmit, instead of checking the ready bit (which is bad), we > could only check if cur_tx has reached dirty_tx, and then call > netif_stop_queue. Does it make sense? I have fixed our version of the driver with very similar modifications. Since that it has not leaked any memory while it has been under testing for several months. > > BTW, I worked hard last week in debugging the fcc_enet driver. It was not > handling correctly some transmission errors, resulting in the transmitter > completely stopping, without restarting. This is related to an errata > (CPM37) from Motorola about the 8260, concerning the way of restarting the > transmitter. If someone is interested, I can release a patch for that. > I have noticed the same problem on old kernel versions. I think i mentioned about it on the mailing list and i understood that it had been already fixed on later versions, but i didn't check the status of the latest driver version. - Arto -- Arto Vuori email: avuori@ssh.com mobile: +358 40 754 5223 ** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Kernel oops while routing
@ 2001-11-29 22:42 Dan Malek
2001-12-05 3:24 ` Re[2]: " Ricardo Scop
0 siblings, 1 reply; 6+ messages in thread
From: Dan Malek @ 2001-11-29 22:42 UTC (permalink / raw)
To: scop; +Cc: linuxppc-embedded, andy_lowe
Ricardo Scop wrote:
> I'm kind of lost with this performance variations. As far as I could
> see, the patch did not insert much processing overhead, so...
Perhaps if someone would post the patch for the rest of us to see we
could be of some assistance.
-- Dan
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
^ permalink raw reply [flat|nested] 6+ messages in thread* Re[2]: Kernel oops while routing 2001-11-29 22:42 Dan Malek @ 2001-12-05 3:24 ` Ricardo Scop 0 siblings, 0 replies; 6+ messages in thread From: Ricardo Scop @ 2001-12-05 3:24 UTC (permalink / raw) To: Dan Malek; +Cc: linuxppc-embedded, andy_lowe [-- Attachment #1: Type: text/plain, Size: 2338 bytes --] Dan, I apologize for the delay. We were conducting some more tests so as to not make any more false alarms :-) about kernel crashes, memory leaks and/or performance problems in the linuxppc port to our 8255 hardware platform. So, after a _carefull_ test period, these are our findings: 1 - Andy's patch (which is attached) works well and does _not_ append any performance penalties in our tests (we were having PHY negatiation problems there, again :-/ ). 2 - We _did_ have a memory leak which was causing a kernel crash after a while, and it _was_ solved by Andy's patch (thanks, Andy!). I believe it's still on linuxppc_2_4, _2_4_devel and _2_5. It goes like this: - in fcc_enet_start_xmit, after setting up another bd and incrementing bdp, the next bd's tx-ready bit is tested in order to stop the xmit queue if it is set, ok? But, sometimes, the CPM may already have cleared this bit _and_ the corresponding interrupt has not been serviced yet (because we're in a spin_lock_irq); so, netif_stop_queue is not called in this case, nor is tx_full set; - next, the interrupt is serviced, but then curr_tx equals dirty_tx _and_ tx_full is not set, so no sk_buffers are freed! - next time fcc_enet_start_xmit is called, tx_ready bit is still cleared and the next bd is used, but the corresponding sk_buffer wasn't freed, and it's pointer is now lost; - cep->lock can't help with this problem, because the CPM is not bothered by that 8-). AFAIK, Andy's solution is a good one. So, we're offering this patch to the public list (with Andy's blessing :-). I can provide any other details about our tests, if required. Thenks, Ricardo Scop mailto:scop@vanet.com.br R SCOP Consulting ------------------------------------------------------------------ "What's money? A man is a success if he gets up in the morning and goes to bed at night and in between does what he wants to do." ~Bob Dylan Thursday, November 29, 2001, 7:42:24 PM, you wrote: DM> Ricardo Scop wrote: >> I'm kind of lost with this performance variations. As far as I could >> see, the patch did not insert much processing overhead, so... DM> Perhaps if someone would post the patch for the rest of us to see we DM> could be of some assistance. DM> -- Dan [-- Attachment #2: patch-2.41.16-pre1-fcc_enet --] [-- Type: application/octet-stream, Size: 2359 bytes --] Index: arch/ppc/8260_io/fcc_enet.c =================================================================== RCS file: /var/cvs/kernel/arch/ppc/8260_io/fcc_enet.c,v retrieving revision 1.1.1.1.4.2 diff -u -r1.1.1.1.4.2 fcc_enet.c --- arch/ppc/8260_io/fcc_enet.c 4 Sep 2001 16:37:18 -0000 1.1.1.1.4.2 +++ arch/ppc/8260_io/fcc_enet.c 27 Nov 2001 18:42:43 -0000 @@ -300,7 +300,7 @@ volatile fcc_t *fccp; volatile fcc_enet_t *ep; struct net_device_stats stats; - uint tx_full; + uint tx_free; spinlock_t lock; #ifdef CONFIG_USE_MDIO @@ -360,9 +360,9 @@ bdp = cep->cur_tx; #ifndef final_version - if (bdp->cbd_sc & BD_ENET_TX_READY) { + if (!cep->tx_free || (bdp->cbd_sc & BD_ENET_TX_READY)) { /* Ooops. All transmit buffers are full. Bail out. - * This should not happen, since cep->tx_full should be set. + * This should not happen, since the tx queue should be stopped. */ printk("%s: tx queue full!.\n", dev->name); return 1; @@ -407,10 +407,8 @@ else bdp++; - if (bdp->cbd_sc & BD_ENET_TX_READY) { + if (!--cep->tx_free) netif_stop_queue(dev); - cep->tx_full = 1; - } cep->cur_tx = (cbd_t *)bdp; @@ -431,8 +429,8 @@ { int i; cbd_t *bdp; - printk(" Ring data dump: cur_tx %p%s cur_rx %p.\n", - cep->cur_tx, cep->tx_full ? " (full)" : "", + printk(" Ring data dump: cur_tx %p tx_free %d cur_rx %p.\n", + cep->cur_tx, cep->tx_free, cep->cur_rx); bdp = cep->tx_bd_base; printk(" Tx @base %p :\n", bdp); @@ -450,7 +448,7 @@ bdp->cbd_bufaddr); } #endif - if (!cep->tx_full) + if (cep->tx_free) netif_wake_queue(dev); } @@ -492,7 +490,7 @@ spin_lock(&cep->lock); bdp = cep->dirty_tx; while ((bdp->cbd_sc&BD_ENET_TX_READY)==0) { - if ((bdp==cep->cur_tx) && (cep->tx_full == 0)) + if (cep->tx_free == TX_RING_SIZE) break; if (bdp->cbd_sc & BD_ENET_TX_HB) /* No heartbeat */ @@ -546,8 +544,7 @@ /* Since we have freed up a buffer, the ring is no longer * full. */ - if (cep->tx_full) { - cep->tx_full = 0; + if (!cep->tx_free++) { if (netif_queue_stopped(dev)) { netif_wake_queue(dev); } @@ -1529,6 +1526,7 @@ #endif cep->dirty_tx = cep->cur_tx = cep->tx_bd_base; + cep->tx_free = TX_RING_SIZE; cep->cur_rx = cep->rx_bd_base; ep->fen_genfcc.fcc_rstate = (CPMFCR_GBL | CPMFCR_EB) << 24; ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2001-12-07 15:56 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-12-05 18:01 Kernel oops while routing Jean-Denis Boyer 2001-12-05 18:21 ` Peter Desnoyers 2001-12-05 18:33 ` Dan Malek 2001-12-05 22:41 ` Re[2]: " Ricardo Scop 2001-12-07 15:56 ` Arto Vuori -- strict thread matches above, loose matches on Subject: below -- 2001-11-29 22:42 Dan Malek 2001-12-05 3:24 ` Re[2]: " Ricardo Scop
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.