From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262634AbUBYGWS (ORCPT ); Wed, 25 Feb 2004 01:22:18 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262633AbUBYGWS (ORCPT ); Wed, 25 Feb 2004 01:22:18 -0500 Received: from dbl.q-ag.de ([213.172.117.3]:30891 "EHLO dbl.q-ag.de") by vger.kernel.org with ESMTP id S262634AbUBYGWH (ORCPT ); Wed, 25 Feb 2004 01:22:07 -0500 Message-ID: <403C3F04.20601@colorfullife.com> Date: Wed, 25 Feb 2004 07:21:56 +0100 From: Manfred Spraul User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr-FR; rv:1.4.1) Gecko/20031114 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Darren Williams CC: LKML , akpm@osdl.org Subject: Re: [BUG] 2.6.3 Slab corruption: errors are triggered when memory exceeds 2.5GB (correction) References: <403AF155.1080305@colorfullife.com> <20040223225659.4c58c880.akpm@osdl.org> <403B8C78.2020606@colorfullife.com> <20040225005804.GE18070@cse.unsw.EDU.AU> In-Reply-To: <20040225005804.GE18070@cse.unsw.EDU.AU> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Darren Williams wrote: >Hi Manfred > >I have updated to the latest bk and new output can be found at: >http://quasar.cse.unsw.edu.au/~dsw/public-files/lemon-debug/ >kern-log-bk > >Also I am quite confident that it is not a hardware problem. > >I took a look at alloc_skb(..) and there is a reference to >an atomic_t token with this being the most suspect > >150> atomic_set(&(skb_shinfo(skb)->dataref), 1); > > I don't think so: The allocation that generates the error is skb->head: The cache name is "size-2048", thus the allocation is a kmalloc(1000-2000, probably 1536 for one eth frame). The skb itself is allocated from the skbuff_head_cache. I don't see a pattern in the virtual addresses: start=e000000101ee09a0, len=2048 start=e000000101ee09a0, len=2048 start=e000000101ee11b8, len=2048 start=e000000101ee19d0, len=2048 start=e000000101ee3218, len=2048 start=e00000017eed1b90, len=2048 start=e00000017eed23a8, len=2048 start=e00000017eed2bc0, len=2048 start=e00000017eed4308, len=2048 start=e00000017eed5338, len=2048 start=e00000017eed5338, len=2048 start=e00000017eed5b50, len=2048 start=e00000017eed5b50, len=2048 start=e00000017eed6b80, len=2048 start=e00000017eed82c8, len=2048 start=e00000017eedc288, len=2048 start=e00000017eedcaa0, len=2048 start=e00000017eeddad0, len=2048 start=e00000017eede2e8, len=2048 start=e00000017eedeb00, len=2048 start=e00000017ef60a60, len=2048 start=e00000017ef61a90, len=2048 start=e00000017ef622a8, len=2048 start=e00000017ef62ac0, len=2048 start=e00000017ef632d8, len=2048 start=e00000017ef65a50, len=2048 start=e00000017ef65a50, len=2048 start=e00000017ef65a50, len=2048 start=e00000017ef66a80, len=2048 That virtually rules out a bad memory chip. But the corrupted byte is always at offset 0x620 into the allocation: Slab corruption: start=e00000017ef65a50, len=2048 620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b -- Slab corruption: start=e000000101ee19d0, len=2048 620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b -- Slab corruption: start=e000000101ee3218, len=2048 620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b -- Slab corruption: start=e00000017ef66a80, len=2048 620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b -- Slab corruption: start=e000000101ee11b8, len=2048 620: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 0x620 (1568) is behind the end of the actual eth frame. Who could modify that? Darren, which nic do you use? Could you try what happens if you reduce the MTU? -- Manfred