* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: Lukasz Trabinski @ 2005-01-21 7:46 UTC
To: chas williams - CONTRACTOR
Cc: linux-atm-general, linux-kernel, Bartlomiej Solarz

On Tue, 18 Jan 2005, chas williams - CONTRACTOR wrote:
> the system keeps running right? the error is a 'warning' that the
> fore200e driver is sleeping when it should not (probably while holding
> interrupts). the schedule() around line 1782 is not a good idea since
> the fore200e_send() might not be running in a sleepable context. just
> try commenting that line for now.

Sorry, but I don't understand which line you mean; I am not a kernel
guru. :/

oceanic:/usr/src/linux-2.4.29$ grep fore200e_send * -r
drivers/atm/fore200e.c:fore200e_send(struct atm_vcc *vcc, struct sk_buff *skb)
drivers/atm/fore200e.c: send: fore200e_send,

It happened on 2.4.29, too. Is it an interrupt problem? Below is the
Oops from 2.4.29:

ksymoops 2.4.11 on i686 2.4.29.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.29/ (default)
     -m /lib/modules/2.4.29/System.map (specified)

kernel BUG at sched.c:564!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c0114f57>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 00000018   ebx: f76d2088   ecx: c02b2000   edx: f7651f7c
esi: 00000000   edi: 00000000   ebp: c02b3cdc   esp: c02b3cac
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c02b3000)
Stack: c026b646 376e8c01 f8888470 00000054 c02b2000 f7c95494 c02b2000 00000000
       00000054 f76d2088 00000246 f76d3084 f76d00e8 f8843d42 f76d0000 f8888950
       00000038 00000001 f67d7c10 00000038 00000000 00000038 00000000 0000001f
Call Trace:    [<f8843d42>] [<c02599d6>] [<c01fe4a9>] [<c01f36df>] [<c020fa03>]
  [<c01fda4f>] [<c020f920>] [<c020e3c2>] [<c020f920>] [<c020d060>] [<c01fda4f>]
  [<c020d010>] [<c020cf4a>] [<c020d010>] [<c020bd09>] [<c01fda4f>] [<c020bb00>]
  [<c020b920>] [<c020bb00>] [<c01f3cb4>] [<c01f3e0d>] [<c01f3f55>] [<c011d0a6>]
  [<c0109296>] [<c0105330>] [<c010b938>] [<c0105330>] [<c0105359>] [<c01053f2>]
  [<c0105000>]
Code: 0f 0b 34 02 3e b6 26 c0 e9 17 fb ff ff 0f 0b 2d 02 3e b6 26

>>EIP; c0114f57 <schedule+527/550>   <=====
>>ebx; f76d2088 <_end+3738b1bc/384fb194>
>>ecx; c02b2000 <init_task_union+0/2000>
>>edx; f7651f7c <_end+3730b0b0/384fb194>
>>ebp; c02b3cdc <init_task_union+1cdc/2000>
>>esp; c02b3cac <init_task_union+1cac/2000>

Trace; f8843d42 <[fore_200e]fore200e_send+172/6d0>
Trace; c02599d6 <clip_start_xmit+186/220>
Trace; c01fe4a9 <qdisc_restart+69/190>
Trace; c01f36df <dev_queue_xmit+23f/320>
Trace; c020fa03 <ip_finish_output2+e3/120>
Trace; c01fda4f <nf_hook_slow+11f/230>
Trace; c020f920 <ip_finish_output2+0/120>
Trace; c020e3c2 <ip_finish_output+42/50>
Trace; c020f920 <ip_finish_output2+0/120>
Trace; c020d060 <ip_forward_finish+50/60>
Trace; c01fda4f <nf_hook_slow+11f/230>
Trace; c020d010 <ip_forward_finish+0/60>
Trace; c020cf4a <ip_forward+13a/200>
Trace; c020d010 <ip_forward_finish+0/60>
Trace; c020bd09 <ip_rcv_finish+209/269>
Trace; c01fda4f <nf_hook_slow+11f/230>
Trace; c020bb00 <ip_rcv_finish+0/269>
Trace; c020b920 <ip_rcv+1a0/200>
Trace; c020bb00 <ip_rcv_finish+0/269>
Trace; c01f3cb4 <netif_receive_skb+e4/1b0>
Trace; c01f3e0d <process_backlog+8d/130>
Trace; c01f3f55 <net_rx_action+a5/140>
Trace; c011d0a6 <do_softirq+d6/e0>
Trace; c0109296 <do_IRQ+e6/f0>
Trace; c0105330 <default_idle+0/50>
Trace; c010b938 <call_do_IRQ+5/d>
Trace; c0105330 <default_idle+0/50>
Trace; c0105359 <default_idle+29/50>
Trace; c01053f2 <cpu_idle+52/70>
Trace; c0105000 <_stext+0/0>

Code;  c0114f57 <schedule+527/550>
00000000 <_EIP>:
Code;  c0114f57 <schedule+527/550>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0114f59 <schedule+529/550>
   2:   34 02                     xor    $0x2,%al
Code;  c0114f5b <schedule+52b/550>
   4:   3e                        ds
Code;  c0114f5c <schedule+52c/550>
   5:   b6 26                     mov    $0x26,%dh
Code;  c0114f5e <schedule+52e/550>
   7:   c0 e9 17                  shr    $0x17,%cl
Code;  c0114f61 <schedule+531/550>
   a:   fb                        sti
Code;  c0114f62 <schedule+532/550>
   b:   ff                        (bad)
Code;  c0114f63 <schedule+533/550>
   c:   ff 0f                     decl   (%edi)
Code;  c0114f65 <schedule+535/550>
   e:   0b 2d 02 3e b6 26         or     0x26b63e02,%ebp

<0>Kernel panic: Aiee, killing interrupt handler!

--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl
* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: chas williams - CONTRACTOR @ 2005-01-24 18:37 UTC
To: Lukasz Trabinski
Cc: linux-atm-general, linux-kernel, Bartlomiej Solarz

In message <Pine.LNX.4.61L.0501210835270.6993@lt.wsisiz.edu.pl>, Lukasz Trabinski writes:
>Sorry, but I don't understand which line you mean; I am not a kernel
>guru. :/

look for the following code:

	/* retry once again? */
	if(--retry > 0) {
		schedule();
		goto retry_here;
	}

change schedule() to udelay(50) and see if things are 'better'.

>It happened on 2.4.29, too. Is it an interrupt problem?

it's calling a routine that might sleep (schedule()) while in the
transmit routine. this is not allowed.
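To make the suggested change concrete, here is a minimal user-space sketch of the same bounded retry loop with the sleeping call replaced by a busy-wait. It is not the fore200e code: `struct tx_queue`, `try_enqueue()`, `busy_udelay()` and `send_with_retry()` are hypothetical stand-ins, and `busy_udelay()` only imitates what the kernel's udelay() does (spin without giving up the CPU), which is why that kind of delay is usable in a transmit path that may run in interrupt or softirq context, where schedule() is not.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for an adapter transmit queue: it reports "full"
 * until it has been polled more than refill_after times (-1 = never). */
struct tx_queue {
    int free_slots;
    int refill_after;
    int attempts;
};

/* Spin for roughly usec microseconds without yielding the CPU; this mimics
 * the behaviour of the kernel's udelay(), it is not the kernel function. */
static void busy_udelay(long usec)
{
    struct timespec start, now;
    long elapsed;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed = (now.tv_sec - start.tv_sec) * 1000000L +
                  (now.tv_nsec - start.tv_nsec) / 1000L;
    } while (elapsed < usec);
}

/* One attempt to hand a PDU to the (simulated) adapter. */
static bool try_enqueue(struct tx_queue *q)
{
    q->attempts++;
    if (q->free_slots == 0 && q->refill_after >= 0 &&
        q->attempts > q->refill_after)
        q->free_slots = 1;          /* the adapter finally freed a slot */
    if (q->free_slots > 0) {
        q->free_slots--;
        return true;
    }
    return false;
}

/* Same shape as the loop quoted in the message, with schedule() (which may
 * sleep) replaced by a busy-wait. Returns 0 if queued, -1 if the caller
 * has to drop the PDU. */
static int send_with_retry(struct tx_queue *q, int retries, long delay_usec)
{
    int retry = retries;

    assert(retries > 0);
retry_here:
    if (try_enqueue(q))
        return 0;
    /* retry once again? */
    if (--retry > 0) {
        busy_udelay(delay_usec);    /* was: schedule(); */
        goto retry_here;
    }
    return -1;
}
```

With `retries = 3` and `delay_usec = 50`, the loop makes at most three attempts with two roughly 50-microsecond spins between them, so the time spent in the transmit path stays bounded; if the queue never drains, the function reports failure instead of sleeping.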
* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: Mike Westall @ 2005-01-24 22:27 UTC
To: chas williams - CONTRACTOR
Cc: Lukasz Trabinski, linux-atm-general, linux-kernel, Bartlomiej Solarz

You could also just revert to kernel 2.4.25 or earlier. Someone who was
apparently oblivious to the fact that device driver send routines were
"routinely" called in irq context and/or that it was a <very bad thing>
to call schedule() under such circumstances slipped that one in sometime
between 2.4.25, which is OK, and 2.4.28, where it is broken.

In 2.4.25 and earlier it was a simple busy-wait loop in which
"goto retry_here;" immediately followed the "if" statement. This was
safe, albeit MP-unfriendly because of the spin_lock()/unlock() on each
iteration.

I'd say just delete the if and drop the damn packet.

At any rate, someone who has access to the golden code should fix this
one way or another ASAP, because it's definitely seriously broken the
way it is now.

Mike
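Mike's alternative, giving up immediately instead of retrying, can be sketched the same way. This is again a hypothetical user-space model rather than driver code: `struct tx_ring`, `struct tx_stats` and `send_or_drop()` are made up for illustration, and the two counters only loosely mirror a driver's per-interface transmit statistics. The function never waits, so it can neither sleep nor spin in interrupt context; the cost is that a momentarily full queue turns directly into a counted drop, and recovery is left to higher layers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interface transmit counters. */
struct tx_stats {
    unsigned long tx_okay;
    unsigned long tx_err;
};

/* Hypothetical fixed-size transmit ring. */
struct tx_ring {
    int used;
    int size;
};

static bool ring_has_room(const struct tx_ring *r)
{
    return r->used < r->size;
}

/* Queue the PDU if there is room; otherwise count it as a drop and return
 * at once. Nothing here waits, so it is safe wherever sleeping is forbidden.
 * A real driver would also have to free the dropped buffer (omitted here).
 * Returns 0 if queued, -1 if dropped. */
static int send_or_drop(struct tx_ring *r, struct tx_stats *st)
{
    assert(r->size > 0);
    if (!ring_has_room(r)) {
        st->tx_err++;
        return -1;
    }
    r->used++;
    st->tx_okay++;
    return 0;
}
```

Compared with chas's udelay() suggestion, this variant adds no latency to the transmit path at all; which trade-off is better depends on how often the queue is only briefly full.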
* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: chas williams - CONTRACTOR @ 2005-01-24 22:38 UTC
To: Mike Westall
Cc: Lukasz Trabinski, linux-atm-general, linux-kernel, Bartlomiej Solarz

the author sent me the latest version of the driver and i got it
applied. the driver does have some useful changes along with this
broken change. i suggest udelay() since it preserves the author's
original intent. i intend to submit a patch this week. i probably
won't fix the ambassador driver, since i can't test the change.
* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: Lukasz Trabinski @ 2005-01-24 23:18 UTC
To: chas3
Cc: Mike Westall, linux-atm-general, linux-kernel, Bartlomiej Solarz

On Mon, 24 Jan 2005, chas williams - CONTRACTOR wrote:
> the author sent me the latest version of the driver and i
> got it applied. the driver does have some useful changes
> along with this broken change. i suggest udelay() since
> it preserves the author's original intent.

OK, I have just put the udelay() call into the driver. If the router
does not crash within 5-6 days, that means the driver works fine. I
will let you know. Generally, the problems (frequent crashes) started
when we added more ATM interfaces/VCs to the router, and the router
began forwarding more traffic and handling two additional full BGP
tables.

--
ŁT
* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: Lukasz Trabinski @ 2005-01-30 19:24 UTC
To: chas3
Cc: Mike Westall, linux-atm-general, linux-kernel, Bartlomiej Solarz

On Tue, 25 Jan 2005, Lukasz Trabinski wrote:
> OK, I have just put the udelay() call into the driver. If the router
> does not crash within 5-6 days, that means the driver works fine. I
> will let you know. Generally, the problems (frequent crashes) started
> when we added more ATM interfaces/VCs to the router, and the router
> began forwarding more traffic and handling two additional full BGP
> tables.

OK, I think the driver works much better with udelay().

[root@cosmos root]# uptime
20:20:48 up 6 days, 23:25, 1 user, load average: 0.03, 0.03, 0.00

--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl
* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: chas williams - CONTRACTOR @ 2005-01-30 22:55 UTC
To: Lukasz Trabinski
Cc: Mike Westall, linux-atm-general, linux-kernel, Bartlomiej Solarz

In message <Pine.LNX.4.61L.0501302022470.5761@oceanic.wsisiz.edu.pl>, Lukasz Trabinski writes:
>OK, I think the driver works much better with udelay().

good to hear. what does atmdiag say about that interface? does it have
a large percentage of tx drops?
* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: Lukasz Trabinski @ 2005-01-31 8:48 UTC
To: chas3
Cc: Mike Westall, linux-atm-general, linux-kernel, Bartlomiej Solarz

On Sun, 30 Jan 2005, chas williams - CONTRACTOR wrote:
> In message <Pine.LNX.4.61L.0501302022470.5761@oceanic.wsisiz.edu.pl>, Lukasz Trabinski writes:
>> OK, I think the driver works much better with udelay().
>
> good to hear. what does atmdiag say about that interface? does it have
> a large percentage of tx drops?

After 12 hours:

[root@cosmos root]# atmdiag
Itf        TX_okay    TX_err   RX_okay    RX_err   RX_drop
  0 AAL0         0         0         0         0         0
    AAL5  31375820         0  31479406         0         0

--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl
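chas's question about the "percentage of tx drops" comes down to simple arithmetic on these counters. A minimal sketch, assuming TX_err is the right count of failed transmissions for this driver (that mapping is an assumption, not something stated in the thread); `tx_error_pct()` is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>

/* Share of transmit attempts that failed, in percent:
 * 100 * tx_err / (tx_okay + tx_err). Returns 0 when nothing was sent. */
static double tx_error_pct(unsigned long tx_okay, unsigned long tx_err)
{
    assert(tx_okay <= ULONG_MAX - tx_err);   /* the sum must not wrap */
    unsigned long total = tx_okay + tx_err;

    if (total == 0)
        return 0.0;
    return 100.0 * (double)tx_err / (double)total;
}
```

For the AAL5 row above, `tx_error_pct(31375820, 0)` is 0.0, i.e. no transmit errors were recorded in those 12 hours.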
* Re: [Linux-ATM-General] Kernel 2.6.10 and 2.4.29 Oops fore200e (fwd)
From: Lukasz Trabinski @ 2005-03-05 12:34 UTC
To: chas3
Cc: Mike Westall, linux-atm-general, linux-kernel, Bartlomiej Solarz

On Sun, 30 Jan 2005, chas williams - CONTRACTOR wrote:

Hello again,

> good to hear. what does atmdiag say about that interface? does it have
> a large percentage of tx drops?

After one month of running without an oops, we have experienced an oops
again. It happens when one or more VCs are down (for example, on the ATM
switch). We have two ATM interfaces (fore_200e, nicstar) in our router:

[root@cosmos root]# lspci |grep ATM
01:01.0 ATM network controller: FORE Systems Inc ForeRunner PCA-200EPC ATM
01:05.0 ATM network controller: Integrated Device Tech IDT77211 ATM Adapter (rev 03)

I have changed schedule() to udelay(50) in both fore_200e and nicstar. I
have also replaced the nicstar ATM card with a second one. In the log
file we can see many messages like this one:

nicstar0: AAL5 CRC error - PDU size mismatch.

ksymoops 2.4.11 on i686 2.4.29.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.29/ (default)
     -m /lib/modules/2.4.29/System.map (specified)

CPU:    0
EIP:    0010:[<c01b68f9>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000002
eax: c031ea00   ebx: 00000005   ecx: 00000001   edx: 000003fd
esi: c031eac0   edi: c0305ee3   ebp: 00000005   esp: c02b3e18
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c02b3000)
Stack: 000f4016 c01bbe61 c031eac0 00000005 00000044 0000000d 00000016 c02a4c60
       c0305ede 00011c3e 00011c54 c0118452 c02a4c60 c0305ede 00000016 00011c54
       00011c54 00000016 f793d480 c011855f 00011c3e 00011c54 00000004 c029a1bc
Call Trace:    [<c01bbe61>] [<c0118452>] [<c011855f>] [<c0118893>] [<c01187bf>]
  [<f8a1f165>] [<f8a1cc15>] [<f8a1f14f>] [<f8a1c96c>] [<f8a1b7ad>] [<c0109029>]
  [<c0109248>] [<c0105330>] [<c010b938>] [<c0105330>] [<c0105359>] [<c01053f2>]
  [<c0105000>]
Code: 5b 0f b6 c0 c3 89 f6 0f b7 48 74 8b 40 70 d3 e3 0f b6 04 03

>>EIP; c01b68f9 <serial_in+19/30>   <=====
>>eax; c031ea00 <serial_termios_locked+60/100>
>>esi; c031eac0 <async_sercons+0/c0>
>>edi; c0305ee3 <log_buf+1c43/8000>
>>esp; c02b3e18 <init_task_union+1e18/2000>

Trace; c01bbe61 <serial_console_write+81/220>
Trace; c0118452 <__call_console_drivers+62/70>
Trace; c011855f <call_console_drivers+7f/120>
Trace; c0118893 <release_console_sem+53/b0>
Trace; c01187bf <printk+14f/180>
Trace; f8a1f165 <[nicstar]__module_license+4f/130a>
Trace; f8a1cc15 <[nicstar]dequeue_rx+265/1040>
Trace; f8a1f14f <[nicstar]__module_license+39/130a>
Trace; f8a1c96c <[nicstar]process_rsq+2c/70>
Trace; f8a1b7ad <[nicstar]ns_irq_handler+3ad/470>
Trace; c0109029 <handle_IRQ_event+79/b0>
Trace; c0109248 <do_IRQ+98/f0>
Trace; c0105330 <default_idle+0/50>
Trace; c010b938 <call_do_IRQ+5/d>
Trace; c0105330 <default_idle+0/50>
Trace; c0105359 <default_idle+29/50>
Trace; c01053f2 <cpu_idle+52/70>
Trace; c0105000 <_stext+0/0>

Code;  c01b68f9 <serial_in+19/30>
00000000 <_EIP>:
Code;  c01b68f9 <serial_in+19/30>   <=====
   0:   5b                        pop    %ebx   <=====
Code;  c01b68fa <serial_in+1a/30>
   1:   0f b6 c0                  movzbl %al,%eax
Code;  c01b68fd <serial_in+1d/30>
   4:   c3                        ret
Code;  c01b68fe <serial_in+1e/30>
   5:   89 f6                     mov    %esi,%esi
Code;  c01b6900 <serial_in+20/30>
   7:   0f b7 48 74               movzwl 0x74(%eax),%ecx
Code;  c01b6904 <serial_in+24/30>
   b:   8b 40 70                  mov    0x70(%eax),%eax
Code;  c01b6907 <serial_in+27/30>
   e:   d3 e3                     shl    %cl,%ebx
Code;  c01b6909 <serial_in+29/30>
  10:   0f b6 04 03               movzbl (%ebx,%eax,1),%eax

Where is the problem? Is the patch cord bad, or does the problem exist
on the ATM switch?

--
*[ Łukasz Trąbiński ]*
SysAdmin @wsisiz.edu.pl
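For context on the "PDU size mismatch" log message: an AAL5 CPCS-PDU is carried in whole 48-byte cell payloads and ends in an 8-byte trailer (CPCS-UU, CPI, a 16-bit big-endian Length field and a CRC-32), with 0 to 47 bytes of padding between the payload and the trailer. A size mismatch means the Length field does not fit the number of bytes actually reassembled; lost cells are one common cause. The sketch below shows such a consistency check in user-space C. It illustrates the AAL5 framing rule only, it is not the nicstar driver's code, and `aal5_length_field()` and `aal5_size_ok()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATM_CELL_PAYLOAD 48  /* payload bytes per ATM cell */
#define AAL5_TRAILER_LEN 8   /* CPCS-UU (1), CPI (1), Length (2), CRC-32 (4) */

/* Read the 16-bit big-endian Length field, which sits in bytes 2..3 of
 * the 8-byte trailer at the end of the reassembled PDU. */
static uint16_t aal5_length_field(const uint8_t *pdu, size_t pdu_len)
{
    const uint8_t *trailer = pdu + pdu_len - AAL5_TRAILER_LEN;
    return (uint16_t)((trailer[2] << 8) | trailer[3]);
}

/* True if payload + pad (0..47 bytes) + trailer can add up to pdu_len,
 * and pdu_len is a whole number of cell payloads. */
static bool aal5_size_ok(const uint8_t *pdu, size_t pdu_len)
{
    assert(pdu != NULL);
    if (pdu_len < ATM_CELL_PAYLOAD || pdu_len % ATM_CELL_PAYLOAD != 0)
        return false;
    size_t len = aal5_length_field(pdu, pdu_len);
    if (len + AAL5_TRAILER_LEN > pdu_len)
        return false;               /* claims more data than was received */
    /* the remaining gap is padding and must be shorter than one cell */
    return pdu_len - (len + AAL5_TRAILER_LEN) < ATM_CELL_PAYLOAD;
}
```

A full receiver would also verify the CRC-32 over the whole PDU; that half of the check is omitted here.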