* Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Hendrik Visage @ 2005-09-30 3:36 UTC
To: linux-net, linux-kernel

Hi there,

Traced a panicking kernel to what appears to be the starfire changes for
2.6.13 up to 2.6.14_rc2.

During a relatively heavy NFS read (client a 32bit 2.6.13.1 P2-350) with
rsync (ripped CD archive) I get kernel panics (Aieee interrupt handler
lost or something... okay, I also need a way to capture those errors, as
it's a hard panic and needs a reset button :()

I've isolated the problem going from 2.6.12.5/2.6.12-gentoo-r10 (both
working) to 2.6.13/2.6.13-gentoo/2.6.14_rc2 while the NFS is served
through the Adaptec/starfire; furthermore, the onboard forcedeth (nvidia)
is serving the data without hassles (at least on 2.6.14_rc2).

Using gcc 3.4.4

--
Hendrik Visage
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Andrew Morton @ 2005-09-30 4:16 UTC
To: Hendrik Visage; +Cc: linux-net, linux-kernel, Ion Badulescu

Hendrik Visage <hvjunk@gmail.com> wrote:
>
> Traced a panicking kernel to what appears to be the starfire changes for
> 2.6.13 up to 2.6.14_rc2.
>
> During a relatively heavy NFS read (client a 32bit 2.6.13.1 P2-350) with
> rsync (ripped CD archive) I get kernel panics (Aieee interrupt handler
> lost or something... okay, I also need a way to capture those errors, as
> it's a hard panic and needs a reset button :()

A serial console is useful. Often people will take a digital photo of the
screen, which works OK. But we do need that info somehow, please.

> I've isolated the problem going from 2.6.12.5/2.6.12-gentoo-r10 (both
> working) to 2.6.13/2.6.13-gentoo/2.6.14_rc2 while the NFS is served
> through the Adaptec/starfire; furthermore, the onboard forcedeth (nvidia)
> is serving the data without hassles (at least on 2.6.14_rc2).

The starfire changes in 2.6.12->2.6.13 look fairly innocuous. Need that
trace, please.
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Hendrik Visage @ 2005-09-30 8:14 UTC
To: Andrew Morton; +Cc: linux-net, linux-kernel, Ion Badulescu

On 9/30/05, Andrew Morton <akpm@osdl.org> wrote:
> A serial console is useful. Often people will take a digital photo of the
> screen, which works OK. But we do need that info somehow, please.

Busy getting that (and/or lkcd|kdb) set up...

> The starfire changes in 2.6.12->2.6.13 look fairly innocuous. Need that
> trace, please.

Will do, but perhaps check for some 64-bit uncleanness in the scatter-gather
code that got enabled in 2.6.13 because of the GPL'd Adaptec firmware, as I
recall some skb-related changes there.

--
Hendrik Visage
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Ion Badulescu @ 2005-09-30 16:46 UTC
To: Hendrik Visage; +Cc: Andrew Morton, linux-net, linux-kernel

Hi Hendrik,

On Fri, 30 Sep 2005, Hendrik Visage wrote:

> Will do, but perhaps check for some 64-bit uncleanness in the scatter-gather
> code that got enabled in 2.6.13 because of the GPL'd Adaptec firmware, as I
> recall some skb-related changes there.

There is an easy way to disable the firmware and pretty much all the
changes that went into 2.6.13: load the starfire driver with
enable_hw_cksum=0. If you can easily reproduce this problem, try doing the
above and see if you can still hit it.

Maybe it's a newly introduced problem in the upper layer's SG code; your
other network driver simply isn't using SG, so it's not affected.

It's very suspicious that the bug would be in skb_checksum_help(), since
the starfire driver doesn't do anything with the skb before handing it
over to skb_checksum_help(). It would mean that the upper layer handed an
invalid skb to the driver, or that we have some random memory corruption
somewhere.

Thanks,
Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Hendrik Visage @ 2005-09-30 16:01 UTC
To: Andrew Morton; +Cc: linux-net, linux-kernel, Ion Badulescu

[-- Attachment #1: Type: text/plain, Size: 281 bytes --]

On 9/30/05, Andrew Morton <akpm@osdl.org> wrote:
> The starfire changes in 2.6.12->2.6.13 look fairly innocuous. Need that
> trace, please.

See attached :)

Will do a check without PREEMPT, as I've noticed that to be the first line
of "problem" :(

--
Hendrik Visage

[-- Attachment #2: crash2.minicom --]
[-- Type: application/octet-stream, Size: 4192 bytes --]

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at net/core/dev.c:1099
invalid operand: 0000 [1] PREEMPT
CPU 0
Modules linked in: nvidia nfsd exportfs lockd sunrpc rfcomm l2cap hci_usb bluetooth starfire mii snd_ac97_bus soundcore snd_page_alloc forcedeth i2c_nforce2 dm_mirror dm_mod sbp2 ohci1394 ieee1394 ohci_hcd uhci_hcd usb_storage usbhid ehci_hcd usbcore
Pid: 11252, comm: nfsd Tainted: P 2.6.14-rc2 #3
RIP: 0010:[<ffffffff802cc7ed>] <ffffffff802cc7ed>{skb_checksum_help+157}
RSP: 0000:ffff81003a0bd998 EFLAGS: 00010246
RAX: ffff81003ff01624 RBX: ffff81003ca7f180 RCX: 00000000b7e42194
RDX: 00000000b7e42194 RSI: ffff81003ff01624 RDI: ffff81003b026080
RBP: ffff81003a0bd9b8 R08: 0000000000000000 R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff81003ca7f180 R15: ffff81003d462218
FS: 00002aaaaade6ae0(0000) GS:ffffffff804fe800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaaac2000 CR3: 000000003d5a2000 CR4: 00000000000006e0
Process nfsd (pid: 11252, threadinfo ffff81003a0bc000, task ffff81003e0ed0c0)
Stack: ffffffff804cd720 ffff81003d462000 ffff81003d4623e0 ffff81003ca7f180
       ffff81003a0bda08 ffffffff88104944 ffff81003d462218 000000013a2a8600
       ffff81003d462000 ffff81003d462000
Call Trace:<ffffffff88104944>{:starfire:start_tx+164} <ffffffff802db0fc>{qdisc_restart+268}
       <ffffffff802ccad0>{dev_queue_xmit+288} <ffffffff802d29b0>{neigh_resolve_output+672}
       <ffffffff802ebb27>{ip_finish_output+455} <ffffffff802ec5ff>{ip_fragment+863}
       <ffffffff802eb960>{ip_finish_output+0} <ffffffff802eca6c>{ip_output+108}
       <ffffffff8035a708>{_spin_unlock_bh+24} <ffffffff802ee1e7>{ip_push_pending_frames+919}
       <ffffffff80307d7e>{udp_push_pending_frames+574} <ffffffff80308658>{udp_sendpage+280}
       <ffffffff8031001f>{inet_sendpage+111} <ffffffff881411ea>{:sunrpc:svc_sendto+554}
       <ffffffff8818b8f9>{:nfsd:encode_post_op_attr+553} <ffffffff88141893>{:sunrpc:svc_udp_sendto+35}
       <ffffffff88142327>{:sunrpc:svc_send+247} <ffffffff88140854>{:sunrpc:svc_process+1108}
       <ffffffff8817e43e>{:nfsd:nfsd+462} <ffffffff8012e529>{schedule_tail+73}
       <ffffffff8010f61e>{child_rip+8} <ffffffff8817e270>{:nfsd:nfsd+0}
       <ffffffff8010f616>{child_rip+0}

Code: 0f 0b 68 23 d9 39 80 c2 4b 04 8b 93 8c 00 00 00 8d 42 02 44
RIP <ffffffff802cc7ed>{skb_checksum_help+157} RSP <ffff81003a0bd998>
<3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0

Call Trace:<ffffffff8012db7f>{__might_sleep+191} <ffffffff80133c3c>{profile_task_exit+44}
       <ffffffff801350f5>{do_exit+37} <ffffffff8035a5c3>{_spin_unlock_irqrestore+19}
       <ffffffff8035a5cd>{_spin_unlock_irqrestore+29} <ffffffff80110294>{die+84}
       <ffffffff8035ac1e>{do_trap+334} <ffffffff8011058c>{do_invalid_op+172}
       <ffffffff802cc7ed>{skb_checksum_help+157} <ffffffff8010f469>{error_exit+0}
       <ffffffff802cc7ed>{skb_checksum_help+157} <ffffffff802cc7d5>{skb_checksum_help+133}
       <ffffffff88104944>{:starfire:start_tx+164} <ffffffff802db0fc>{qdisc_restart+268}
       <ffffffff802ccad0>{dev_queue_xmit+288} <ffffffff802d29b0>{neigh_resolve_output+672}
       <ffffffff802ebb27>{ip_finish_output+455} <ffffffff802ec5ff>{ip_fragment+863}
       <ffffffff802eb960>{ip_finish_output+0} <ffffffff802eca6c>{ip_output+108}
       <ffffffff8035a708>{_spin_unlock_bh+24} <ffffffff802ee1e7>{ip_push_pending_frames+919}
       <ffffffff80307d7e>{udp_push_pending_frames+574} <ffffffff80308658>{udp_sendpage+280}
       <ffffffff8031001f>{inet_sendpage+111} <ffffffff881411ea>{:sunrpc:svc_sendto+554}
       <ffffffff8818b8f9>{:nfsd:encode_post_op_attr+553} <ffffffff88141893>{:sunrpc:svc_udp_sendto+35}
       <ffffffff88142327>{:sunrpc:svc_send+247} <ffffffff88140854>{:sunrpc:svc_process+1108}
       <ffffffff8817e43e>{:nfsd:nfsd+462} <ffffffff8012e529>{schedule_tail+73}
       <ffffffff8010f61e>{child_rip+8} <ffffffff8817e270>{:nfsd:nfsd+0}
       <ffffffff8010f616>{child_rip+0}
Kernel panic - not syncing: Aiee, killing interrupt handler!
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Andrew Morton @ 2005-09-30 17:40 UTC
To: Hendrik Visage; +Cc: linux-net, linux-kernel, ionut, Jeff Garzik

Hendrik Visage <hvjunk@gmail.com> wrote:
>
> On 9/30/05, Andrew Morton <akpm@osdl.org> wrote:
>
> > The starfire changes in 2.6.12->2.6.13 look fairly innocuous. Need that
> > trace, please.
>
> See attached :)
>

It helps, thanks.

> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at net/core/dev.c:1099
> invalid operand: 0000 [1] PREEMPT
> CPU 0
> Modules linked in: nvidia nfsd exportfs lockd sunrpc rfcomm l2cap hci_usb bluetooth starfire mii snd_ac97_bus soundcore snd_page_alloc forcedeth i2c_nforce2 dm_mirror dm_mod sbp2 ohci1394 ieee1394 ohci_hcd uhci_hcd usb_storage usbhid ehci_hcd usbcore
> Pid: 11252, comm: nfsd Tainted: P 2.6.14-rc2 #3
> RIP: 0010:[<ffffffff802cc7ed>] <ffffffff802cc7ed>{skb_checksum_help+157}
> RSP: 0000:ffff81003a0bd998 EFLAGS: 00010246
> RAX: ffff81003ff01624 RBX: ffff81003ca7f180 RCX: 00000000b7e42194
> RDX: 00000000b7e42194 RSI: ffff81003ff01624 RDI: ffff81003b026080
> RBP: ffff81003a0bd9b8 R08: 0000000000000000 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: ffff81003ca7f180 R15: ffff81003d462218
> FS: 00002aaaaade6ae0(0000) GS:ffffffff804fe800(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00002aaaaaac2000 CR3: 000000003d5a2000 CR4: 00000000000006e0
> Process nfsd (pid: 11252, threadinfo ffff81003a0bc000, task ffff81003e0ed0c0)
> Stack: ffffffff804cd720 ffff81003d462000 ffff81003d4623e0 ffff81003ca7f180
>        ffff81003a0bda08 ffffffff88104944 ffff81003d462218 000000013a2a8600
>        ffff81003d462000 ffff81003d462000
> Call Trace:<ffffffff88104944>{:starfire:start_tx+164} <ffffffff802db0fc>{qdisc_restart+268}
>        <ffffffff802ccad0>{dev_queue_xmit+288} <ffffffff802d29b0>{neigh_resolve_output+672}
>        <ffffffff802ebb27>{ip_finish_output+455} <ffffffff802ec5ff>{ip_fragment+863}
>        <ffffffff802eb960>{ip_finish_output+0} <ffffffff802eca6c>{ip_output+108}

yep, there's something wrong with the skb which starfire fed into
skb_checksum_help().

	offset = skb->tail - skb->h.raw;
	if (offset <= 0)
		BUG();

And that's a post-2.6.12 driver change. You can probably work around
it by deleting the #define ZEROCOPY line.
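The sanity check Andrew quotes can be modelled outside the kernel to see exactly which pointer layouts trip it. The sketch below is a hypothetical user-space mock, not kernel code: `struct mock_skb`, its `h_raw` field and `would_hit_bug()` are invented names, and only the two pointers the check reads are modelled.

```c
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for the 2.6-era struct sk_buff.
 * Only the two pointers read by the quoted check are modelled:
 *   tail  - end of the packet's linear data (skb->tail)
 *   h_raw - start of the transport header (skb->h.raw)
 * Both must point into the same buffer for the subtraction to be valid C.
 */
struct mock_skb {
	unsigned char *tail;
	unsigned char *h_raw;
};

/*
 * Returns nonzero when the quoted check
 *
 *	offset = skb->tail - skb->h.raw;
 *	if (offset <= 0)
 *		BUG();
 *
 * would fire, i.e. when the transport header does not start strictly
 * before the end of the linear data.
 */
static int would_hit_bug(const struct mock_skb *skb)
{
	ptrdiff_t offset = skb->tail - skb->h_raw;

	return offset <= 0;
}
```

With the transport header 28 bytes before the tail the check passes; with the header at or past the tail it fires. In the real kernel the outcome is a `BUG()` (the "invalid operand" oops above), not a return value.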
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Hendrik Visage @ 2005-09-30 20:10 UTC
To: Andrew Morton; +Cc: linux-net, linux-kernel, ionut, Jeff Garzik

[-- Attachment #1: Type: text/plain, Size: 727 bytes --]

On 9/30/05, Andrew Morton <akpm@osdl.org> wrote:
> > ----------- [cut here ] --------- [please bite here ] ---------
> > Kernel BUG at net/core/dev.c:1099
> > invalid operand: 0000 [1] PREEMPT
>
> yep, there's something wrong with the skb which starfire fed into
> skb_checksum_help().
> <snip>
>
> And that's a post-2.6.12 driver change. You can probably work around
> it by deleting the #define ZEROCOPY line.

:)

In any case, here is a non-PREEMPT traceback. What makes this one
interesting is that in the PREEMPT case I had to push the NFS output to
get the panic, but the attached non-PREEMPT case sorta just happened,
i.e. when the clients just checked on the server's status :(

--
Hendrik Visage

[-- Attachment #2: non-prempt --]
[-- Type: application/octet-stream, Size: 4110 bytes --]

Kernel BUG at net/core/dev.c:1099
invalid operand: 0000 [1]
CPU 0
Modules linked in: nfs nfsd exportfs lockd sunrpc rfcomm l2cap hci_usb bluetooth starfire mii snd_ac97_bus soundcore snd_page_alloc forcedeth i2c_nforce2 dm_mirror dm_mod sbp2 ohci1394 ieee1394 ohci_hcd uhci_hcd usb_storage usbhid ehci_hcd usbcore
Pid: 11169, comm: nfsd Not tainted 2.6.14-rc2 #4
RIP: 0010:[<ffffffff802c803d>] <ffffffff802c803d>{skb_checksum_help+157}
RSP: 0018:ffff81003d3bda08 EFLAGS: 00010246
RAX: ffff81003ac51c24 RBX: ffff81003ac4cd80 RCX: 000000005459cd0b
RDX: 000000005459cd0b RSI: ffff81003ac51c24 RDI: ffff81003d272080
RBP: ffff81003d3bda28 R08: 0000000000000000 R09: 0000000000000006
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff81003ac4cd80 R15: ffff81003a31c218
FS: 00002aaaaade6ae0(0000) GS:ffffffff804f7800(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaaabc1190 CR3: 000000003b22b000 CR4: 00000000000006e0
Process nfsd (pid: 11169, threadinfo ffff81003d3bc000, task ffff81003f7c5100)
Stack: ffff81003d3bda48 ffff81003a31c000 ffff81003a31c3e0 ffff81003ac4cd80
       ffff81003d3bda78 ffffffff88104944 ffff8100b00000d0 0000000100000000
       ffff81003a31c000 ffff81003a31c000
Call Trace:<ffffffff88104944>{:starfire:start_tx+164} <ffffffff802d6583>{qdisc_restart+243}
       <ffffffff802c8325>{dev_queue_xmit+293} <ffffffff802e64c7>{ip_finish_output+455}
       <ffffffff802e6f9f>{ip_fragment+863} <ffffffff802e6300>{ip_finish_output+0}
       <ffffffff802e740c>{ip_output+108} <ffffffff8035404e>{_spin_unlock_bh+14}
       <ffffffff802e8b87>{ip_push_pending_frames+919} <ffffffff803024de>{udp_push_pending_frames+574}
       <ffffffff80302db8>{udp_sendpage+280} <ffffffff8030a39f>{inet_sendpage+111}
       <ffffffff881411ca>{:sunrpc:svc_sendto+554} <ffffffff8818b879>{:nfsd:encode_post_op_attr+553}
       <ffffffff88141873>{:sunrpc:svc_udp_sendto+35} <ffffffff88142307>{:sunrpc:svc_send+247}
       <ffffffff88140834>{:sunrpc:svc_process+1108} <ffffffff8817e3c0>{:nfsd:nfsd+448}
       <ffffffff8012dfa9>{schedule_tail+73} <ffffffff8010f50e>{child_rip+8}
       <ffffffff8817e200>{:nfsd:nfsd+0} <ffffffff8010f506>{child_rip+0}

Code: 0f 0b 68 0b 6f 39 80 c2 4b 04 8b 93 8c 00 00 00 8d 42 02 44
RIP <ffffffff802c803d>{skb_checksum_help+157} RSP <ffff81003d3bda08>
<3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():0

Call Trace:<ffffffff8012d6af>{__might_sleep+191} <ffffffff801333dc>{profile_task_exit+44}
       <ffffffff80134895>{do_exit+37} <ffffffff80353ff3>{_spin_unlock_irqrestore+19}
       <ffffffff80110184>{die+84} <ffffffff8035431e>{do_trap+334}
       <ffffffff8011047c>{do_invalid_op+172} <ffffffff802c803d>{skb_checksum_help+157}
       <ffffffff802c27e5>{__alloc_skb+133} <ffffffff802c06dd>{sock_alloc_send_skb+109}
       <ffffffff802e179d>{__ip_route_output_key+1517} <ffffffff8010f359>{error_exit+0}
       <ffffffff802c803d>{skb_checksum_help+157} <ffffffff802c8025>{skb_checksum_help+133}
       <ffffffff88104944>{:starfire:start_tx+164} <ffffffff802d6583>{qdisc_restart+243}
       <ffffffff802c8325>{dev_queue_xmit+293} <ffffffff802e64c7>{ip_finish_output+455}
       <ffffffff802e6f9f>{ip_fragment+863} <ffffffff802e6300>{ip_finish_output+0}
       <ffffffff802e740c>{ip_output+108} <ffffffff8035404e>{_spin_unlock_bh+14}
       <ffffffff802e8b87>{ip_push_pending_frames+919} <ffffffff803024de>{udp_push_pending_frames+574}
       <ffffffff80302db8>{udp_sendpage+280} <ffffffff8030a39f>{inet_sendpage+111}
       <ffffffff881411ca>{:sunrpc:svc_sendto+554} <ffffffff8818b879>{:nfsd:encode_post_op_attr+553}
       <ffffffff88141873>{:sunrpc:svc_udp_sendto+35} <ffffffff88142307>{:sunrpc:svc_send+247}
       <ffffffff88140834>{:sunrpc:svc_process+1108} <ffffffff8817e3c0>{:nfsd:nfsd+448}
       <ffffffff8012dfa9>{schedule_tail+73} <ffffffff8010f50e>{child_rip+8}
       <ffffffff8817e200>{:nfsd:nfsd+0} <ffffffff8010f506>{child_rip+0}
Kernel panic - not syncing: Aiee, killing interrupt handler!
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Ion Badulescu @ 2005-09-30 20:55 UTC
To: Hendrik Visage; +Cc: Andrew Morton, linux-net, linux-kernel, Jeff Garzik

On Fri, 30 Sep 2005, Hendrik Visage wrote:

> In any case, here is a non-PREEMPT traceback.

Same trace, pretty much as I expected. Still, starfire must be getting a
bad skb from the upper layers, because it gets passed __unmodified__ to
skb_checksum_help(). Either that, or skb_checksum_help() itself got broken
at some point, at least on 64-bit platforms.

I'll try to reproduce it over the weekend (assuming I can get an x86_64
box set up with a starfire inside) and see where the problem is.

> What makes this one interesting is that in the PREEMPT case I had to
> push the NFS output to get the panic, but the attached non-PREEMPT case
> sorta just happened, i.e. when the clients just checked on the server's
> status :(

I'm actually surprised you got your panic from nfsd. skb_checksum_help()
is called only when one of the fragments has length == 1, so the easiest
way to hit it is to slowly type something into a telnet session.

Thanks,
Ion

--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
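Ion's statement that the software fallback runs only when some fragment is exactly one byte long can be written as a small predicate. This is a hypothetical user-space sketch, not the driver source: `struct mock_sg_packet` and `has_bad_length()` are invented for illustration. The test on the first fragment mirrors the `skb_first_frag_len(skb) == 1` line in Herbert's patch later in this thread; applying the same test to every remaining page fragment is an assumption, since the patch context elides that loop.

```c
#include <stddef.h>

/*
 * Hypothetical model of a scatter-gather packet: the length of the first
 * (linear) fragment plus the sizes of any additional page fragments.
 * None of these names exist in the kernel.
 */
struct mock_sg_packet {
	unsigned int first_frag_len;
	const unsigned int *frag_sizes;
	size_t nr_frags;
};

/*
 * Returns nonzero when any fragment is exactly one byte long, the
 * condition under which (per Ion) the driver falls back to the software
 * checksum via skb_checksum_help().
 */
static int has_bad_length(const struct mock_sg_packet *pkt)
{
	size_t i;

	if (pkt->first_frag_len == 1)
		return 1;
	for (i = 0; i < pkt->nr_frags; i++)
		if (pkt->frag_sizes[i] == 1)
			return 1;
	return 0;
}
```

A packet made only of large fragments never takes the fallback, while a single one-byte fragment anywhere does, which is why a trickle of tiny writes is the easy way to exercise this path.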
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Herbert Xu @ 2005-09-30 22:39 UTC
To: Hendrik Visage
Cc: Andrew Morton, linux-net, linux-kernel, ionut, Jeff Garzik, netdev

[-- Attachment #1: Type: text/plain, Size: 843 bytes --]

On Fri, Sep 30, 2005 at 08:10:59PM +0000, Hendrik Visage wrote:
>
> In any case, here is a non-PREEMPT traceback. What makes this one
> interesting is that in the PREEMPT case I had to push the NFS output to
> get the panic, but the attached non-PREEMPT case sorta just happened,
> i.e. when the clients just checked on the server's status :(

You must never call skb_checksum_help unless the packet is meant to
be checksummed by the hardware. So starfire is the guilty party here.

This patch makes it do the check and also check for errors from
skb_checksum_help.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 654 bytes --]

diff --git a/drivers/net/starfire.c b/drivers/net/starfire.c
--- a/drivers/net/starfire.c
+++ b/drivers/net/starfire.c
@@ -1333,7 +1333,7 @@ static int start_tx(struct sk_buff *skb,
 	}
 
 #if defined(ZEROCOPY) && defined(HAS_BROKEN_FIRMWARE)
-	{
+	if (skb->ip_summed == CHECKSUM_HW) {
 		int has_bad_length = 0;
 
 		if (skb_first_frag_len(skb) == 1)
@@ -1346,8 +1346,10 @@ static int start_tx(struct sk_buff *skb,
 				}
 		}
 
-		if (has_bad_length)
-			skb_checksum_help(skb, 0);
+		if (has_bad_length && unlikely(skb_checksum_help(skb, 0))) {
+			dev_kfree_skb(skb);
+			return NETDEV_TX_OK;
+		}
 	}
 #endif /* ZEROCOPY && HAS_BROKEN_FIRMWARE */
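The control flow of Herbert's patch can be summarised as a three-way decision. The sketch below is a hypothetical model, not the driver: `enum mock_ip_summed`, `enum mock_tx_action` and `tx_decision()` are invented names standing in for `skb->ip_summed`, the transmit outcome, and the patched block in `start_tx()`. It encodes the two changes in the diff: packets not marked `CHECKSUM_HW` skip the workaround entirely, and a failing `skb_checksum_help()` leads to the packet being freed and dropped.

```c
/* Hypothetical stand-ins for the kernel's ip_summed values and outcomes. */
enum mock_ip_summed { MOCK_CHECKSUM_NONE, MOCK_CHECKSUM_HW };
enum mock_tx_action { SEND_AS_IS, SEND_AFTER_SW_CSUM, DROP_PACKET };

/*
 * Models the decision made by the patched block in start_tx():
 *   ip_summed      - what the stack asked for; only CHECKSUM_HW packets
 *                    are candidates for the firmware workaround
 *   bad_length     - result of the one-byte-fragment test
 *   sw_csum_failed - simulates skb_checksum_help() returning nonzero
 */
static enum mock_tx_action tx_decision(enum mock_ip_summed ip_summed,
				       int bad_length, int sw_csum_failed)
{
	/* New guard: never touch packets that are not meant for hw csum. */
	if (ip_summed != MOCK_CHECKSUM_HW)
		return SEND_AS_IS;
	/* No one-byte fragment: the hardware checksums it as usual. */
	if (!bad_length)
		return SEND_AS_IS;
	/* Fallback failed: the patch frees the skb and returns NETDEV_TX_OK. */
	if (sw_csum_failed)
		return DROP_PACKET;
	/* Checksum filled in by software; transmission continues. */
	return SEND_AFTER_SW_CSUM;
}
```

Returning `NETDEV_TX_OK` after `dev_kfree_skb()` tells the stack the packet was consumed, so it is not requeued; returning a busy status instead would hand back an skb that has already been freed. The first branch covers the case Herbert identifies as the bug: a packet not meant for hardware checksumming must never reach `skb_checksum_help()`, even if it has a one-byte fragment.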
* Re: Starfire (Adaptec) kernel 2.6.13+ panics on AMD64 NFS server
From: Hendrik Visage @ 2005-10-01 19:21 UTC
To: Herbert Xu
Cc: Andrew Morton, linux-net, linux-kernel, ionut, Jeff Garzik, netdev

On 10/1/05, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> You must never call skb_checksum_help unless the packet is meant to
> be checksummed by the hardware. So starfire is the guilty party here.
>
> This patch makes it do the check and also check for errors from
> skb_checksum_help.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks Herbert, at least on 2.6.14_rc2 the patch appears to work for my
stress test :)

--
Hendrik Visage