From mboxrd@z Thu Jan 1 00:00:00 1970
From: steve
Subject: Re: 2.6.12-rc4-mm1
Date: Fri, 13 May 2005 20:07:00 -0500
Message-ID: <42854F34.2070806@friservices.com>
References: <20050512033100.017958f6.akpm@osdl.org> <4284BF66.1050704@friservices.com> <1116008021.23972.28.camel@hades.cambridge.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Andrew Morton , linux-fsdevel@vger.kernel.org, dedekind@infradead.org
Return-path:
Received: from smtp-server.carlislefsp.com ([12.28.84.26]:31953 "EHLO imail.carlislefsp.com") by vger.kernel.org with ESMTP id S262654AbVENBHG (ORCPT ); Fri, 13 May 2005 21:07:06 -0400
To: David Woodhouse
In-Reply-To: <1116008021.23972.28.camel@hades.cambridge.redhat.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

David Woodhouse wrote:

>On Fri, 2005-05-13 at 09:53 -0500, steve would have written, if his mail
>client hadn't been broken:
>
>
>>a bug that appeared after running for about 2 hours:
>>
>>May 13 09:32:34 localhost kernel: BUG: atomic counter underflow at:
>>May 13 09:32:34 localhost kernel: [reiserfs_clear_inode+129/176] reiserfs_clear_inode+0x81/0xb0
>>May 13 09:32:34 localhost kernel: [clear_inode+228/304] clear_inode+0xe4/0x130
>>May 13 09:32:34 localhost kernel: [dispose_list+112/304] dispose_list+0x70/0x130
>>May 13 09:32:34 localhost kernel: [prune_icache+191/432] prune_icache+0xbf/0x1b0
>>May 13 09:32:34 localhost kernel: [shrink_icache_memory+20/64] shrink_icache_memory+0x14/0x40
>>May 13 09:32:34 localhost kernel: [shrink_slab+345/416] shrink_slab+0x159/0x1a0
>>May 13 09:32:34 localhost kernel: [balance_pgdat+695/944] balance_pgdat+0x2b7/0x3b0
>>May 13 09:32:34 localhost kernel: [kswapd+210/240] kswapd+0xd2/0xf0
>>May 13 09:32:34 localhost kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
>>May 13 09:32:34 localhost kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14
>>May 13 09:32:34 localhost kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
>>May 13 09:32:34 localhost kernel: [kswapd+0/240] kswapd+0x0/0xf0
>>May 13 09:32:34 localhost kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18
>>
>>
>
>Hmmm. We're hitting that bug when posix_acl_release() decrements the
>refcount on one of the inode's ACLs and it goes negative.
>
>First glance at this had me suspecting that we were somehow calling
>clear_inode() twice... but since we clear the pointer to the ACL after
>calling posix_acl_release(), that seems unlikely -- unless you managed
>to get two CPUs in reiserfs_clear_inode() simultaneously for the same
>inode. Is this SMP? Is preempt enabled?
>
>Can you reproduce it? If so, does it go away if you revert one or both
>of these:
>
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm1/broken-out/vfs-bugfix-two-read_inode-calles-without.patch
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm1/broken-out/__wait_on_freeing_inode-fix.patch
>
>
>
okay, reproduced after a couple hours of running (nothing intensive),
then doing a grep blah -r /*

here's the new output:

May 13 19:54:16 localhost kernel: BUG: atomic counter underflow at:
May 13 19:54:16 localhost kernel: [reiserfs_clear_inode+129/176] reiserfs_clear_inode+0x81/0xb0
May 13 19:54:16 localhost kernel: [clear_inode+228/304] clear_inode+0xe4/0x130
May 13 19:54:16 localhost kernel: [dispose_list+112/304] dispose_list+0x70/0x130
May 13 19:54:16 localhost kernel: [prune_icache+191/432] prune_icache+0xbf/0x1b0
May 13 19:54:16 localhost kernel: [shrink_icache_memory+20/64] shrink_icache_memory+0x14/0x40
May 13 19:54:16 localhost kernel: [shrink_slab+345/416] shrink_slab+0x159/0x1a0
May 13 19:54:16 localhost kernel: [try_to_free_pages+226/416] try_to_free_pages+0xe2/0x1a0
May 13 19:54:16 localhost kernel: [__alloc_pages+383/960] __alloc_pages+0x17f/0x3c0
May 13 19:54:16 localhost kernel: [__do_page_cache_readahead+285/352] __do_page_cache_readahead+0x11d/0x160
May 13 19:54:16 localhost kernel: [blockable_page_cache_readahead+81/208] blockable_page_cache_readahead+0x51/0xd0
May 13 19:54:16 localhost kernel: [make_ahead_window+112/176] make_ahead_window+0x70/0xb0
May 13 19:54:16 localhost kernel: [page_cache_readahead+169/384] page_cache_readahead+0xa9/0x180
May 13 19:54:16 localhost kernel: [file_read_actor+198/224] file_read_actor+0xc6/0xe0
May 13 19:54:16 localhost kernel: [do_generic_mapping_read+1446/1472] do_generic_mapping_read+0x5a6/0x5c0
May 13 19:54:16 localhost kernel: [pg0+542324240/1068651520] ieee80211_recv_mgmt+0xed0/0x1d90 [wlan]
May 13 19:54:16 localhost kernel: [file_read_actor+0/224] file_read_actor+0x0/0xe0
May 13 19:54:16 localhost kernel: [__generic_file_aio_read+484/544] __generic_file_aio_read+0x1e4/0x220
May 13 19:54:16 localhost kernel: [file_read_actor+0/224] file_read_actor+0x0/0xe0
May 13 19:54:16 localhost kernel: [generic_file_read+149/176] generic_file_read+0x95/0xb0
May 13 19:54:16 localhost kernel: [try_to_wake_up+166/192] try_to_wake_up+0xa6/0xc0
May 13 19:54:16 localhost kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
May 13 19:54:16 localhost kernel: [schedule+791/1616] schedule+0x317/0x650
May 13 19:54:16 localhost kernel: [vfs_read+156/336] vfs_read+0x9c/0x150
May 13 19:54:16 localhost kernel: [sys_read+71/128] sys_read+0x47/0x80
May 13 19:54:16 localhost kernel: [syscall_call+7/11] syscall_call+0x7/0xb

i'll recompile without those two patches, and try it again.

Steve
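
For anyone comparing the two traces by hand: a rough Python sketch that parses the frame lines and reports the frames both traces share from the top. The regex, the helper names (`parse_frames`, `common_prefix`) and the shortened trace excerpts are my own assumptions for illustration, not anything from the kernel or a log tool; the excerpts are just the first seven frames of each trace above.

```python
import re

# One frame as printed in the log: "[name+dec/dec] name+0xhex/0xhex".
# This pattern is an assumption based on the lines quoted in this thread.
FRAME = re.compile(
    r"\[(\w+)\+(\d+)/(\d+)\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)

def parse_frames(log):
    """Return (symbol, offset, size) for frames whose decimal and hex forms agree."""
    frames = []
    for m in FRAME.finditer(log):
        name, off, size = m.group(1), int(m.group(2)), int(m.group(3))
        same_symbol = m.group(4) == name
        same_numbers = int(m.group(5), 16) == off and int(m.group(6), 16) == size
        if same_symbol and same_numbers:
            frames.append((name, off, size))
    return frames

def common_prefix(a, b):
    """Return the leading frames that are identical in both lists."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared

# First seven frames of the 09:32:34 trace (reached from kswapd).
kswapd_trace = """
[reiserfs_clear_inode+129/176] reiserfs_clear_inode+0x81/0xb0
[clear_inode+228/304] clear_inode+0xe4/0x130
[dispose_list+112/304] dispose_list+0x70/0x130
[prune_icache+191/432] prune_icache+0xbf/0x1b0
[shrink_icache_memory+20/64] shrink_icache_memory+0x14/0x40
[shrink_slab+345/416] shrink_slab+0x159/0x1a0
[balance_pgdat+695/944] balance_pgdat+0x2b7/0x3b0
"""

# First seven frames of the 19:54:16 trace (reached from sys_read).
read_trace = """
[reiserfs_clear_inode+129/176] reiserfs_clear_inode+0x81/0xb0
[clear_inode+228/304] clear_inode+0xe4/0x130
[dispose_list+112/304] dispose_list+0x70/0x130
[prune_icache+191/432] prune_icache+0xbf/0x1b0
[shrink_icache_memory+20/64] shrink_icache_memory+0x14/0x40
[shrink_slab+345/416] shrink_slab+0x159/0x1a0
[try_to_free_pages+226/416] try_to_free_pages+0xe2/0x1a0
"""

first = parse_frames(kswapd_trace)
second = parse_frames(read_trace)
shared = common_prefix(first, second)

for name, off, size in shared:
    print(f"{name}+{off}/{size}")
```

Run as-is, it lists the frames common to both traces, from reiserfs_clear_inode down to shrink_slab; below that point the first trace continues into balance_pgdat and the second into try_to_free_pages. Frames whose bracketed symbol and hex symbol disagree (such as the pg0 / ieee80211_recv_mgmt line) would be skipped by the consistency check.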