From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S267323AbUHSTta (ORCPT ); Thu, 19 Aug 2004 15:49:30 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S267341AbUHSTsf (ORCPT ); Thu, 19 Aug 2004 15:48:35 -0400
Received: from parcelfarce.linux.theplanet.co.uk ([195.92.249.252]:30930
	"EHLO www.linux.org.uk") by vger.kernel.org with ESMTP
	id S267323AbUHSTr3 (ORCPT ); Thu, 19 Aug 2004 15:47:29 -0400
Date: Thu, 19 Aug 2004 15:36:43 -0300
From: Marcelo Tosatti
To: Gene Heskett
Cc: linux-kernel@vger.kernel.org, Nick Piggin,
	viro@parcelfarce.linux.theplanet.co.uk, Linus Torvalds,
	Andrew Morton
Subject: Re: Possible dcache BUG
Message-ID: <20040819183643.GA5278@logos.cnet>
References: <200408170126.40816.gene.heskett@verizon.net>
	<4121F2AC.7000907@yahoo.com.au>
	<200408190541.14131.gene.heskett@verizon.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200408190541.14131.gene.heskett@verizon.net>
User-Agent: Mutt/1.5.5.1i
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Gene,

That is:

/*
 * The buffer's backing address_space's private_lock must be held
 */
static inline void __remove_assoc_queue(struct buffer_head *bh)
{
	BUG_ON(bh->b_assoc_buffers.next == NULL);    <----------
	BUG_ON(bh->b_assoc_buffers.prev == NULL);
	list_del_init(&bh->b_assoc_buffers);
}

Viro, Linus, Andrew, don't you have any idea what could cause such
mapping->b_assoc_mapping corruption?

I can't see how that could be caused by flaky hardware.

Maybe we should include those BUGs into the official kernel, or -mm's tree?
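For readers following the thread, here is a minimal userspace sketch of why the two BUG_ON() checks above point at corruption. A list node that has been initialized, or unlinked with list_del_init(), points back at itself, so an initialized node is not expected to have a NULL link. Everything below is an illustrative stand-in rather than the kernel's own code: the names init_list_head, list_add_entry, list_del_init_entry and links_look_sane are hypothetical, and assert() plays the role of BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty or unlinked node points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head in the circular doubly linked list. */
static void list_add_entry(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Hypothetical helper mirroring the two BUG_ON() conditions in the mail. */
static int links_look_sane(const struct list_head *e)
{
	return e->next != NULL && e->prev != NULL;
}

/* Unlink entry and reinitialize it so it is self-linked again. */
static void list_del_init_entry(struct list_head *entry)
{
	/* Stand-in for BUG_ON(): a NULL link would be dereferenced below. */
	assert(links_look_sane(entry));
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}
```

Initializing a node, adding it to a head with list_add_entry(), and removing it with list_del_init_entry() leaves both the node and the head self-linked. A zero-filled node fails links_look_sane(), which is the condition the first BUG_ON() in the mail reports.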
On Thu, Aug 19, 2004 at 05:41:13AM -0400, Gene Heskett wrote:
> On Tuesday 17 August 2004 07:57, Nick Piggin wrote:
> >Gene Heskett wrote:
> >> On Tuesday 17 August 2004 00:58, Nick Piggin wrote:
> >>>Gene Heskett wrote:
> >>>>Reboot time I guess :(((
> >>>
> >>>All your low memory has been used by dentry and inode caches. This
> >>>isn't very
> >>>interesting because this would be no doubt caused by something
> >>>oopsing while holding the shrinker semaphore as Andrew pointed
> >>> out.
> >>>
> >>>What is interesting is that first Oops message (I wonder if you
> >>>don't have bad hardware though, I don't think anyone else is
> >>> seeing it).
> >>
> >> What 'first Oops message'? One I posted before?
> >
> >Well, the first Oops that your running kernel raises. Usually you
> >don't bother about subsequent oopses and misbehaviour because the
> >first one can cause the system to go into a funny state - this is
> >a prime example.
> >
> >> That comment caused me to go back in the log to well above where I
> >> had been channel surfing with tvtime, and I did find an Oops:
> >>
> >> Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
> >> Aug 16 21:15:46 coyote kernel: printing eip:
> >> Aug 16 21:15:46 coyote kernel: c015c8db
> >> Aug 16 21:15:46 coyote kernel: *pde = 00000000
> >> Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1]
> >> Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
> >> Aug 16 21:15:46 coyote kernel: CPU: 0
> >> Aug 16 21:15:46 coyote kernel: EIP: 0060:[] Not tainted
> >> Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206 (2.6.8-rc4)
> >> Aug 16 21:15:46 coyote kernel: EIP is at prune_icache+0x6b/0x1b0
> >> Aug 16 21:15:46 coyote kernel: eax: 00000000 ebx: dffe0fd0 ecx: d3eb8b80 edx: c0341660
> >> Aug 16 21:15:46 coyote kernel: esi: dffe0fc8 edi: 0000005a ebp: d3eb8b94 esp: d3eb8b74
> >> Aug 16 21:15:46 coyote kernel: ds: 007b es: 007b ss: 0068
> >> Aug 16 21:15:46 coyote kernel: Process yum (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0)
> >> Aug 16 21:15:46 coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0 00000080 00000000 d3eb8000
> >> Aug 16 21:15:46 coyote kernel: d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2 0108bf00
> >> Aug 16 21:15:46 coyote kernel: 00000000 00021087 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000
> >> Aug 16 21:15:46 coyote kernel: Call Trace:
> >> Aug 16 21:15:46 coyote kernel: [] show_stack+0x7f/0xa0
> >> Aug 16 21:15:46 coyote kernel: [] show_registers+0x158/0x1b0
> >> Aug 16 21:15:46 coyote kernel: [] die+0x66/0xd0
> >> Aug 16 21:15:46 coyote kernel: [] do_page_fault+0x28e/0x548
> >> Aug 16 21:15:46 coyote kernel: [] error_code+0x2d/0x38
> >> Aug 16 21:15:46 coyote kernel: [] shrink_icache_memory+0x3f/0x50
> >> Aug 16 21:15:46 coyote kernel: [] shrink_slab+0x134/0x170
> >> Aug 16 21:15:46 coyote kernel: [] try_to_free_pages+0xa4/0x160
> >> Aug 16 21:15:46 coyote kernel: [] __alloc_pages+0x1b3/0x320
> >> Aug 16 21:15:46 coyote kernel: [] do_anonymous_page+0x5f/0x180
> >> Aug 16 21:15:46 coyote kernel: [] do_no_page+0x61/0x310
> >> Aug 16 21:15:46 coyote kernel: [] handle_mm_fault+0xd7/0x160
> >> Aug 16 21:15:46 coyote kernel: [] do_page_fault+0x150/0x548
> >> Aug 16 21:15:46 coyote kernel: [] error_code+0x2d/0x38
> >> Aug 16 21:15:46 coyote kernel: [] do_generic_mapping_read+0x129/0x430
> >> Aug 16 21:15:46 coyote kernel: [] __generic_file_aio_read+0x1b6/0x1f0
> >> Aug 16 21:15:46 coyote kernel: [] generic_file_aio_read+0x52/0x70
> >> Aug 16 21:15:46 coyote kernel: [] do_sync_read+0x78/0xa0
> >> Aug 16 21:15:46 coyote kernel: [] vfs_read+0xca/0x140
> >> Aug 16 21:15:46 coyote kernel: [] sys_read+0x4b/0x80
> >> Aug 16 21:15:46 coyote kernel: [] sysenter_past_esp+0x52/0x71
> >> Aug 16 21:15:46 coyote kernel: Code: 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89
> >>
> >> yum did a segfault about that time. yum is nice code, when
> >> it fscking works, which is maybe half the time on 2 different
> >> FC2 machines here now.
> >
> >Although an Oops is always the kernel's (or bad hardware's) fault.
> >So in this case you can let yum off the hook :)
> >
> >> So we're back to the dentry_cache thing... Duh, NO!, this is in
> >> prune_icache, not prune_dcache, presumably slightly different.
> >
> >Yeah, both are going to cause cache shrinking to stop working.
> >
> >> As far as bad hardware is concerned, warranty time is running out.
> >> I need something plausible to take back to tcwo as a good reason
> >> for requesting a 'blanket rma' on the whole thing, would they
> >> please send me another.
> >
> >Not too sure really. At this stage keep trying patches that you get
> >sent :P
>
> I just had another but this one's a bit different:
>
> Aug 19 04:22:11 coyote kernel: ------------[ cut here ]------------
> Aug 19 04:22:11 coyote kernel: kernel BUG at fs/buffer.c:805!
> Aug 19 04:22:11 coyote kernel: invalid operand: 0000 [#1]
> Aug 19 04:22:11 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
> Aug 19 04:22:11 coyote kernel: CPU: 0
> Aug 19 04:22:11 coyote kernel: EIP: 0060:[] Not tainted
> Aug 19 04:22:11 coyote kernel: EFLAGS: 00010246 (2.6.8-rc4)
> Aug 19 04:22:11 coyote kernel: EIP is at remove_inode_buffers+0x77/0x90
> Aug 19 04:22:11 coyote kernel: eax: 00000000 ebx: d7de519c ecx: d7deb99c edx: d7deb974
> Aug 19 04:22:11 coyote kernel: esi: d7de50c8 edi: 00000001 ebp: c198bedc esp: c198becc
> Aug 19 04:22:11 coyote kernel: ds: 007b es: 007b ss: 0068
> Aug 19 04:22:11 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
> Aug 19 04:22:11 coyote kernel: Stack: d7de50c8 d7de50d0 d7de50c8 00000057 c198bf04 c015c985 d7de50c8 00000000
> Aug 19 04:22:11 coyote kernel: 00000057 d7de5290 e50ac0d0 00000080 00000000 c198b000 c198bf10 c015ca5f
> Aug 19 04:22:11 coyote kernel: 00000080 c198bf44 c0135b14 00000080 000000d0 01779600 00000000 0002d1f3
> Aug 19 04:22:11 coyote kernel: Call Trace:
> Aug 19 04:22:11 coyote kernel: [] show_stack+0x7f/0xa0
> Aug 19 04:22:11 coyote kernel: [] show_registers+0x158/0x1b0
> Aug 19 04:22:11 coyote kernel: [] die+0x66/0xd0
> Aug 19 04:22:12 coyote kernel: [] do_invalid_op+0xb3/0xc0
> Aug 19 04:22:12 coyote kernel: [] error_code+0x2d/0x38
> Aug 19 04:22:12 coyote kernel: [] prune_icache+0x115/0x1b0
> Aug 19 04:22:12 coyote kernel: [] shrink_icache_memory+0x3f/0x50
> Aug 19 04:22:12 coyote kernel: [] shrink_slab+0x134/0x170
> Aug 19 04:22:12 coyote kernel: [] balance_pgdat+0x1a9/0x1f0
> Aug 19 04:22:12 coyote kernel: [] kswapd+0xbf/0xd0
> Aug 19 04:22:12 coyote kernel: [] kernel_thread_helper+0x5/0x14
> Aug 19 04:22:12 coyote kernel: Code: 0f 0b 25 03 e5 0b 30 c0 eb c4 31 ff eb de 0f 0b 36 04 e5 0b
>
> The system is still up but it's 100 megs into swap so I'm going to
> reboot without changing anything. Is this one traceable?