From mboxrd@z Thu Jan 1 00:00:00 1970
From: Patrick McHardy
Subject: Re: sky2 hw csum failure [was Re: sky2 large MTU problems]
Date: Thu, 25 May 2006 12:35:45 +0200
Message-ID: <44758881.1020106@trash.net>
References: <6278d2220605240228v576dd66atdad4855b308e64bf@mail.gmail.com> <20060524103820.2e213a0b@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Cc: Daniel J Blueman , netdev@vger.kernel.org, Netfilter Developer
Return-path:
Received: from stinky.trash.net ([213.144.137.162]:51841 "EHLO stinky.trash.net") by vger.kernel.org with ESMTP id S965112AbWEYKfz (ORCPT ); Thu, 25 May 2006 06:35:55 -0400
To: Stephen Hemminger
In-Reply-To: <20060524103820.2e213a0b@localhost.localdomain>
Sender: netdev-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

Stephen Hemminger wrote:
> On Wed, 24 May 2006 10:28:52 +0100
> "Daniel J Blueman" wrote:
>
>
>>Having done some more stress testing with sky2 1.4 (in 2.6.17-rc4) and
>>the latest patch, I have found problems when streaming lots of data
>>out of the sky2 interface (eg via samba serving a large file to GigE
>>client). Ultimately, the interface will stop sending.
>>
>>Before this happens, I see lots of:
>>
>>kernel: lan0: hw csum failure.
>>kernel: [__skb_checksum_complete+86/96] __skb_checksum_complete+0x56/0x60
>>kernel: [tcp_error+300/512] tcp_error+0x12c/0x200
>>kernel: [poison_obj+41/96] poison_obj+0x29/0x60
>>kernel: [tcp_error+0/512] tcp_error+0x0/0x200
>>kernel: [ip_conntrack_in+157/1072] ip_conntrack_in+0x9d/0x430
>>kernel: [kfree_skbmem+8/128] kfree_skbmem+0x8/0x80
>>kernel: [arp_process+102/1408] arp_process+0x66/0x580
>>kernel: [check_poison_obj+36/416] check_poison_obj+0x24/0x1a0
>>kernel: [arp_process+102/1408] arp_process+0x66/0x580
>>kernel: [nf_iterate+99/144] nf_iterate+0x63/0x90
>>kernel: [ip_rcv_finish+0/608] ip_rcv_finish+0x0/0x260
>>kernel: [nf_hook_slow+89/240] nf_hook_slow+0x59/0xf0
>>kernel: [ip_rcv_finish+0/608] ip_rcv_finish+0x0/0x260
>>kernel: [ip_rcv+386/1104] ip_rcv+0x182/0x450
>>kernel: [ip_rcv_finish+0/608] ip_rcv_finish+0x0/0x260
>>kernel: [packet_rcv_spkt+216/320] packet_rcv_spkt+0xd8/0x140
>>kernel: [netif_receive_skb+476/784] netif_receive_skb+0x1dc/0x310
>>kernel: [sky2_poll+879/2096] sky2_poll+0x36f/0x830
>>kernel: [_spin_lock_irqsave+9/16] _spin_lock_irqsave+0x9/0x10
>>kernel: [run_timer_softirq+290/416] run_timer_softirq+0x122/0x1a0
>>kernel: [net_rx_action+108/256] net_rx_action+0x6c/0x100
>>kernel: [__do_softirq+66/160] __do_softirq+0x42/0xa0
>>kernel: [do_softirq+78/96] do_softirq+0x4e/0x60
>>kernel: =======================
>>kernel: [do_IRQ+90/160] do_IRQ+0x5a/0xa0
>>kernel: [remove_vma+69/80] remove_vma+0x45/0x50
>>kernel: [common_interrupt+26/32] common_interrupt+0x1a/0x20
>>kernel: [get_offset_pmtmr+151/3584] get_offset_pmtmr+0x97/0xe00
>>kernel: [do_gettimeofday+26/208] do_gettimeofday+0x1a/0xd0
>>kernel: [sys_gettimeofday+26/144] sys_gettimeofday+0x1a/0x90
>>kernel: [syscall_call+7/11] syscall_call+0x7/0xb
>
>
>
> What ever the netfilter chain is, it is trimming or altering the packet
> without clearing or altering the hardware checksum. It is not a driver
> problem, we saw these in VLAN's and ebtables already.

The call chain looks pretty messed up, but the point where an invalid
HW checksum is detected is in TCP connection tracking, which is
basically the first thing netfilter does, unless you use the raw table.
There are no packet modifications done by conntrack, so I doubt that
netfilter is the culprit here.

Of course we had some big checksumming cleanups, so there is a
possibility of bugs there, but I did test them with sky2 and
HW checksumming, so I don't think that's the case.

Daniel, is there an easy way to reproduce the checksum failure?
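
For readers following the thread, the failure mode Stephen describes (a packet being trimmed without the hardware checksum being updated) can be sketched in userspace. This is only a simplified model, not kernel code: it assumes the NIC reports a 16-bit ones-complement sum (RFC 1071) over all received bytes, it ignores the TCP pseudo-header, and the helper names `csum_fold`, `csum_partial` and `csum_sub` as well as the sample bytes are made up for illustration.

```python
def csum_fold(acc: int) -> int:
    """Fold carries back in until the sum fits in 16 bits."""
    while acc >> 16:
        acc = (acc & 0xFFFF) + (acc >> 16)
    return acc


def csum_partial(data: bytes) -> int:
    """16-bit ones-complement sum over data (RFC 1071), big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    acc = 0
    for i in range(0, len(data), 2):
        acc += (data[i] << 8) | data[i + 1]
    return csum_fold(acc)


def csum_sub(total: int, part: int) -> int:
    """Remove 'part' from 'total' in ones-complement arithmetic."""
    return csum_fold(total + (~part & 0xFFFF))


# Hypothetical segment: body followed by its own checksum field, so a
# valid segment sums to 0xFFFF (pseudo-header left out of this model).
body = b"hello world!"
check = ~csum_partial(body) & 0xFFFF
segment = body + check.to_bytes(2, "big")

# Assumed scenario: the frame carries extra trailing bytes, and the
# hardware sum covers them as well.
trailer = b"\xde\xad\xbe\xef"
hw_sum = csum_partial(segment + trailer)

# Trimming the trailer but keeping the stale hardware sum fails the check.
stale_ok = hw_sum == 0xFFFF

# Trimming and subtracting the trailer's contribution passes it.
fixed_ok = csum_sub(hw_sum, csum_partial(trailer)) == 0xFFFF

print("stale sum verifies:", stale_ok)
print("adjusted sum verifies:", fixed_ok)
```

In this model the stale sum no longer verifies once the trailer is cut off, while the adjusted sum does. Whether anything in the receive path here actually trims the packet that way is exactly the open question in this thread.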