* Re: 2.6.12-rc4-mm1
  From: David Woodhouse
  Date: 2005-05-13 18:13 UTC
  To: steve
  Cc: Andrew Morton, linux-fsdevel, dedekind

On Fri, 2005-05-13 at 09:53 -0500, steve would have written, if his mail
client hadn't been broken:

> a bug that appeared after running for about 2 hours:
>
> May 13 09:32:34 localhost kernel: BUG: atomic counter underflow at:
> May 13 09:32:34 localhost kernel: [reiserfs_clear_inode+129/176] reiserfs_clear_inode+0x81/0xb0
> May 13 09:32:34 localhost kernel: [clear_inode+228/304] clear_inode+0xe4/0x130
> May 13 09:32:34 localhost kernel: [dispose_list+112/304] dispose_list+0x70/0x130
> May 13 09:32:34 localhost kernel: [prune_icache+191/432] prune_icache+0xbf/0x1b0
> May 13 09:32:34 localhost kernel: [shrink_icache_memory+20/64] shrink_icache_memory+0x14/0x40
> May 13 09:32:34 localhost kernel: [shrink_slab+345/416] shrink_slab+0x159/0x1a0
> May 13 09:32:34 localhost kernel: [balance_pgdat+695/944] balance_pgdat+0x2b7/0x3b0
> May 13 09:32:34 localhost kernel: [kswapd+210/240] kswapd+0xd2/0xf0
> May 13 09:32:34 localhost kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
> May 13 09:32:34 localhost kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14
> May 13 09:32:34 localhost kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
> May 13 09:32:34 localhost kernel: [kswapd+0/240] kswapd+0x0/0xf0
> May 13 09:32:34 localhost kernel: [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18

Hmmm. We're hitting that bug when posix_acl_release() decrements the
refcount on one of the inode's ACLs and it goes negative.

First glance at this had me suspecting that we were somehow calling
clear_inode() twice, but since we clear the pointer to the ACL after
calling posix_acl_release(), that seems unlikely -- unless you managed
to get two CPUs into reiserfs_clear_inode() simultaneously for the same
inode. Is this SMP? Is preempt enabled?

Can you reproduce it? If so, does it go away if you revert one or both
of these:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm1/broken-out/vfs-bugfix-two-read_inode-calles-without.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm1/broken-out/__wait_on_freeing_inode-fix.patch

-- 
dwmw2
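[Editor's note: the double-release hypothesis above can be sketched as a small
userspace C simulation. This is not the kernel code -- the struct and function
names are simplified stand-ins -- but it shows why two concurrent passes
through the same inode's ACL-release path would drive the refcount negative,
which is exactly what the "atomic counter underflow" report means.]

```c
/* Userspace sketch of the pattern described above: reiserfs_clear_inode()
 * releases each cached ACL, then NULLs the pointer.  A single pass is
 * safe; a second pass on a stale pointer underflows the refcount.
 */
#include <assert.h>
#include <stddef.h>

struct posix_acl { int refcount; };    /* simplified stand-in */

static int underflows;                 /* counts detected underflows */

/* stand-in for posix_acl_release()'s atomic decrement */
static void acl_release(struct posix_acl *acl)
{
    if (acl == NULL)
        return;
    if (--acl->refcount < 0)
        underflows++;                  /* kernel would print the BUG here */
}

struct inode_private {
    struct posix_acl *i_acl_access;
    struct posix_acl *i_acl_default;
};

/* the clear_inode()-time release: drop each ACL, then clear the pointer */
static void clear_inode_acls(struct inode_private *ip)
{
    acl_release(ip->i_acl_access);
    ip->i_acl_access = NULL;
    acl_release(ip->i_acl_default);
    ip->i_acl_default = NULL;
}
```

A single call is harmless; the underflow only appears if the release runs
twice against the same ACL, e.g. via a stale copy of the pointer taken before
it was cleared.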
* Re: 2.6.12-rc4-mm1
  From: steve
  Date: 2005-05-14 1:07 UTC
  To: David Woodhouse
  Cc: Andrew Morton, linux-fsdevel, dedekind

David Woodhouse wrote:

> Hmmm. We're hitting that bug when posix_acl_release() decrements the
> refcount on one of the inode's ACLs and it goes negative.
>
> Can you reproduce it? If so, does it go away if you revert one or both
> of these:
>
> [quoted trace and patch URLs snipped]

okay, reproduced after a couple of hours of running (nothing intensive),
then doing a grep blah -r /*

here's the new output:

May 13 19:54:16 localhost kernel: BUG: atomic counter underflow at:
May 13 19:54:16 localhost kernel: [reiserfs_clear_inode+129/176] reiserfs_clear_inode+0x81/0xb0
May 13 19:54:16 localhost kernel: [clear_inode+228/304] clear_inode+0xe4/0x130
May 13 19:54:16 localhost kernel: [dispose_list+112/304] dispose_list+0x70/0x130
May 13 19:54:16 localhost kernel: [prune_icache+191/432] prune_icache+0xbf/0x1b0
May 13 19:54:16 localhost kernel: [shrink_icache_memory+20/64] shrink_icache_memory+0x14/0x40
May 13 19:54:16 localhost kernel: [shrink_slab+345/416] shrink_slab+0x159/0x1a0
May 13 19:54:16 localhost kernel: [try_to_free_pages+226/416] try_to_free_pages+0xe2/0x1a0
May 13 19:54:16 localhost kernel: [__alloc_pages+383/960] __alloc_pages+0x17f/0x3c0
May 13 19:54:16 localhost kernel: [__do_page_cache_readahead+285/352] __do_page_cache_readahead+0x11d/0x160
May 13 19:54:16 localhost kernel: [blockable_page_cache_readahead+81/208] blockable_page_cache_readahead+0x51/0xd0
May 13 19:54:16 localhost kernel: [make_ahead_window+112/176] make_ahead_window+0x70/0xb0
May 13 19:54:16 localhost kernel: [page_cache_readahead+169/384] page_cache_readahead+0xa9/0x180
May 13 19:54:16 localhost kernel: [file_read_actor+198/224] file_read_actor+0xc6/0xe0
May 13 19:54:16 localhost kernel: [do_generic_mapping_read+1446/1472] do_generic_mapping_read+0x5a6/0x5c0
May 13 19:54:16 localhost kernel: [pg0+542324240/1068651520] ieee80211_recv_mgmt+0xed0/0x1d90 [wlan]
May 13 19:54:16 localhost kernel: [file_read_actor+0/224] file_read_actor+0x0/0xe0
May 13 19:54:16 localhost kernel: [__generic_file_aio_read+484/544] __generic_file_aio_read+0x1e4/0x220
May 13 19:54:16 localhost kernel: [file_read_actor+0/224] file_read_actor+0x0/0xe0
May 13 19:54:16 localhost kernel: [generic_file_read+149/176] generic_file_read+0x95/0xb0
May 13 19:54:16 localhost kernel: [try_to_wake_up+166/192] try_to_wake_up+0xa6/0xc0
May 13 19:54:16 localhost kernel: [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
May 13 19:54:16 localhost kernel: [schedule+791/1616] schedule+0x317/0x650
May 13 19:54:16 localhost kernel: [vfs_read+156/336] vfs_read+0x9c/0x150
May 13 19:54:16 localhost kernel: [sys_read+71/128] sys_read+0x47/0x80
May 13 19:54:16 localhost kernel: [syscall_call+7/11] syscall_call+0x7/0xb

i'll recompile without those two patches, and try it again.

Steve
* Re: 2.6.12-rc4-mm1
  From: Artem B. Bityuckiy
  Date: 2005-05-18 10:06 UTC
  To: steve
  Cc: David Woodhouse, Andrew Morton, linux-fsdevel

Steve,

> okay, reproduced after a couple of hours of running (nothing intensive),
> then doing a grep blah -r /*

I can't reproduce your problem using 2.6.12-rc4 + the 2 patches.
I created a 50GB Reiserfs partition, copied a lot of data there (built
the linux sources in many copies) and issued 'grep -r blah *'. I also
tried it with parallel writing to the partition -- that didn't trigger
it either.

Could you please try 2.6.12-rc4 + the 2 patches and see whether the
warning is still there?

The patches referred to above are (and they are also attached):

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm1/broken-out/vfs-bugfix-two-read_inode-calles-without.patch
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.12-rc4/2.6.12-rc4-mm1/broken-out/__wait_on_freeing_inode-fix.patch

P.S. I wanted to try 2.6.12-rc4-mm2, but it oopses during kernel boot
on my system, so I switched to 2.6.12-rc4 in order not to fight with
-mm2's problems.

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.
[-- Attachment #2: vfs-bugfix-two-read_inode-calles-without.patch --]
[-- Type: text/x-patch, Size: 1184 bytes --]

diff -puN fs/inode.c~vfs-bugfix-two-read_inode-calles-without fs/inode.c
--- 25/fs/inode.c~vfs-bugfix-two-read_inode-calles-without	Fri May  6 15:12:47 2005
+++ 25-akpm/fs/inode.c	Fri May  6 15:12:47 2005
@@ -282,6 +282,13 @@ static void dispose_list(struct list_hea
 		if (inode->i_data.nrpages)
 			truncate_inode_pages(&inode->i_data, 0);
 		clear_inode(inode);
+
+		spin_lock(&inode_lock);
+		hlist_del_init(&inode->i_hash);
+		list_del_init(&inode->i_sb_list);
+		spin_unlock(&inode_lock);
+
+		wake_up_inode(inode);
 		destroy_inode(inode);
 		nr_disposed++;
 	}
@@ -317,8 +324,6 @@ static int invalidate_list(struct list_h
 		inode = list_entry(tmp, struct inode, i_sb_list);
 		invalidate_inode_buffers(inode);
 		if (!atomic_read(&inode->i_count)) {
-			hlist_del_init(&inode->i_hash);
-			list_del(&inode->i_sb_list);
 			list_move(&inode->i_list, dispose);
 			inode->i_state |= I_FREEING;
 			count++;
@@ -439,8 +444,6 @@ static void prune_icache(int nr_to_scan)
 			if (!can_unuse(inode))
 				continue;
 		}
-		hlist_del_init(&inode->i_hash);
-		list_del_init(&inode->i_sb_list);
 		list_move(&inode->i_list, &freeable);
 		inode->i_state |= I_FREEING;
 		nr_pruned++;

[-- Attachment #3: __wait_on_freeing_inode-fix.patch --]
[-- Type: text/x-patch, Size: 1655 bytes --]

diff -puN fs/inode.c~__wait_on_freeing_inode-fix fs/inode.c
--- 25/fs/inode.c~__wait_on_freeing_inode-fix	2005-05-09 20:09:33.000000000 -0700
+++ 25-akpm/fs/inode.c	2005-05-09 20:09:33.000000000 -0700
@@ -1241,29 +1241,21 @@ int inode_wait(void *word)
 }
 
 /*
- * If we try to find an inode in the inode hash while it is being deleted, we
- * have to wait until the filesystem completes its deletion before reporting
- * that it isn't found. This is because iget will immediately call
- * ->read_inode, and we want to be sure that evidence of the deletion is found
- * by ->read_inode.
+ * If we try to find an inode in the inode hash while it is being
+ * deleted, we have to wait until the filesystem completes its
+ * deletion before reporting that it isn't found. This function waits
+ * until the deletion _might_ have completed. Callers are responsible
+ * to recheck inode state.
+ *
+ * It doesn't matter if I_LOCK is not set initially, a call to
+ * wake_up_inode() after removing from the hash list will DTRT.
+ *
  * This is called with inode_lock held.
  */
 static void __wait_on_freeing_inode(struct inode *inode)
 {
 	wait_queue_head_t *wq;
 	DEFINE_WAIT_BIT(wait, &inode->i_state, __I_LOCK);
-
-	/*
-	 * I_FREEING and I_CLEAR are cleared in process context under
-	 * inode_lock, so we have to give the tasks who would clear them
-	 * a chance to run and acquire inode_lock.
-	 */
-	if (!(inode->i_state & I_LOCK)) {
-		spin_unlock(&inode_lock);
-		yield();
-		spin_lock(&inode_lock);
-		return;
-	}
 	wq = bit_waitqueue(&inode->i_state, __I_LOCK);
 	prepare_to_wait(wq, &wait.wait, TASK_UNINTERRUPTIBLE);
 	spin_unlock(&inode_lock);
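[Editor's note: the essential change in the first attached patch is an
ordering one, which can be summarized in a userspace C sketch. This is not
the kernel code -- the locking and list manipulation are reduced to logged
steps -- but it records the order the patch establishes: an inode now stays
on the hash until clear_inode() has finished, and waiters are woken before
the inode is destroyed.]

```c
/* Simplified sketch of the patched dispose_list() ordering: previously
 * the inode was unhashed in prune_icache()/invalidate_list(), *before*
 * clear_inode() ran; with the patch it is unhashed afterwards, and
 * __wait_on_freeing_inode() waiters are woken so they can recheck.
 */
#include <assert.h>
#include <string.h>

enum { MAXLOG = 8 };
static const char *steps[MAXLOG];
static int nsteps;

static void step(const char *what)
{
    if (nsteps < MAXLOG)
        steps[nsteps++] = what;
}

/* the per-inode sequence dispose_list() performs after the patch */
static void dispose_one_patched(void)
{
    step("clear_inode");     /* filesystem-specific teardown completes first */
    step("unhash");          /* hlist_del_init + list_del_init, under inode_lock */
    step("wake_up_inode");   /* lets hash-lookup waiters recheck inode state */
    step("destroy_inode");   /* only now is the memory freed */
}
```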
* Re: 2.6.12-rc4-mm1
  From: Steve Roemen
  Date: 2005-05-18 16:16 UTC
  To: dedekind
  Cc: David Woodhouse, Andrew Morton, linux-fsdevel

on 05/18/05 05:06 Artem B. Bityuckiy wrote the following:

> I can't reproduce your problem using 2.6.12-rc4 + the 2 patches.
>
> Could you please try 2.6.12-rc4 + the 2 patches and see whether the
> warning is still there?
>
> [rest of message and the two attached patches, quoted verbatim, snipped]

okay, recompiled with those two patches back in, and after an hour of
running, while doing a tar -xjvf linux-2.6.11.tar.bz2, it kicks out
that error (attached).

Steve

[-- Attachment #2: dmesg_output --]
[-- Type: text/plain, Size: 553 bytes --]

BUG: atomic counter underflow at:
 [<c01aadf1>] reiserfs_clear_inode+0x81/0xb0
 [<c0173484>] clear_inode+0xe4/0x130
 [<c0173540>] dispose_list+0x70/0x130
 [<c017385f>] prune_icache+0xbf/0x1b0
 [<c0173964>] shrink_icache_memory+0x14/0x40
 [<c01471b9>] shrink_slab+0x159/0x1a0
 [<c01486e7>] balance_pgdat+0x2b7/0x3b0
 [<c01488b2>] kswapd+0xd2/0xf0
 [<c0130430>] autoremove_wake_function+0x0/0x50
 [<c0102fd2>] ret_from_fork+0x6/0x14
 [<c0130430>] autoremove_wake_function+0x0/0x50
 [<c01487e0>] kswapd+0x0/0xf0
 [<c010136d>] kernel_thread_helper+0x5/0x18
* Re: 2.6.12-rc4-mm1
  From: David Woodhouse
  Date: 2005-05-19 16:45 UTC
  To: steve
  Cc: dedekind, Andrew Morton, linux-fsdevel, mason

On Wed, 2005-05-18 at 11:16 -0500, Steve Roemen wrote:

> okay, recompiled with those two patches back in, and after an hour of
> running, while doing a tar -xjvf linux-2.6.11.tar.bz2, it kicks out
> that error (attached).

Thanks. Are you using ACLs? If not, I think there's a more fundamental
problem than a race with clear_inode() -- it's not that we're
decrementing the use count on an ACL twice; it's that you think you have
an ACL when there wasn't one. This could be a symptom of memory
corruption... which has already been reported in reiserfs in 2.6.12-rc4.

Do you have CONFIG_REISERFS_CHECK enabled? Do you have preempt enabled?

Could we trouble you to try again on 2.6.12-rc3 with those two patches,
please?

-- 
dwmw2
* Re: 2.6.12-rc4-mm1
  From: Steve Roemen
  Date: 2005-05-19 17:55 UTC
  To: David Woodhouse
  Cc: dedekind, Andrew Morton, linux-fsdevel, mason

on 05/19/05 11:45 David Woodhouse wrote the following:

> Do you have CONFIG_REISERFS_CHECK enabled? Do you have preempt enabled?
>
> Could we trouble you to try again on 2.6.12-rc3 with those two patches,
> please?

Compiling 2.6.12-rc3 + the 2 patches right now. I'll let you know in a
couple of hours if it still does it.

Artem forwarded you my .config file.

Steve
* Re: 2.6.12-rc4-mm1
  From: David Woodhouse
  Date: 2005-05-19 18:04 UTC
  To: steve
  Cc: dedekind, Andrew Morton, linux-fsdevel, mason

On Thu, 2005-05-19 at 12:55 -0500, Steve Roemen wrote:

> Compiling 2.6.12-rc3 + the 2 patches right now. I'll let you know in a
> couple of hours if it still does it.
> Artem forwarded you my .config file.

He did. That confirms you have ACLs enabled -- but are you actually
_using_ them? If not, the ACL fields whose refcount is causing this
problem should never have been set in the first place.

Artem is putting together a patch which will put a magic value into the
struct posix_acl and hence double-check whether we're really freeing one
of them twice, or whether it's just that we're seeing memory corruption
and what's in REISERFS_I(inode)->i_acl_{access,default} is pure noise.

It might also be useful to attempt to reproduce the problem with slab
debugging turned on, but let's not change that variable just yet.

-- 
dwmw2
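[Editor's note: the magic-value debugging idea mentioned above can be
sketched in userspace C. This is illustrative only -- Artem's actual patch
is not shown in the thread, and the names and magic constants here are
invented -- but it shows how stamping each allocation lets a release path
distinguish a live ACL, a double-free, and a pointer that is pure noise.]

```c
/* Sketch of magic-value poisoning for struct posix_acl debugging.
 * Alloc stamps a magic; the final put poisons it.  A later release can
 * then classify what it was handed.
 */
#include <assert.h>
#include <stdlib.h>

#define ACL_MAGIC 0x41434C21u   /* "ACL!" -- illustrative value */
#define ACL_DEAD  0xDEADBEEFu   /* poison written on last put */

struct dbg_posix_acl {
    unsigned magic;
    int refcount;
};

enum acl_verdict { ACL_OK, ACL_DOUBLE_FREE, ACL_GARBAGE };

static struct dbg_posix_acl *dbg_acl_alloc(void)
{
    struct dbg_posix_acl *acl = malloc(sizeof(*acl));
    assert(acl != NULL);        /* sketch: no real error handling */
    acl->magic = ACL_MAGIC;
    acl->refcount = 1;
    return acl;
}

/* Classify the pointer instead of blindly decrementing.  The object is
 * deliberately not free()d here, so a double release stays detectable. */
static enum acl_verdict dbg_acl_release(struct dbg_posix_acl *acl)
{
    if (acl->magic == ACL_DEAD)
        return ACL_DOUBLE_FREE;     /* we released this one already */
    if (acl->magic != ACL_MAGIC)
        return ACL_GARBAGE;         /* never a valid ACL: corruption/noise */
    if (--acl->refcount == 0)
        acl->magic = ACL_DEAD;      /* poison on the last put */
    return ACL_OK;
}
```

With instrumentation like this, the underflow reports in the thread would
resolve into one of two distinct messages: "freed twice" (a clear_inode()
race) or "garbage pointer" (memory corruption), which is exactly the
distinction David wants to make.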
* Re: 2.6.12-rc4-mm1
  From: Steve Roemen
  Date: 2005-05-19 20:12 UTC
  To: David Woodhouse
  Cc: dedekind, Andrew Morton, linux-fsdevel, mason

on 05/19/05 13:04 David Woodhouse wrote the following:

> He did. That confirms you have ACLs enabled -- but are you actually
> _using_ them?
>
> [rest snipped]

No, I am not using ACLs. I am running 2.6.12-rc3 with those two
patches, and I can't get it to error out.

Steve
* Re: 2.6.12-rc4-mm1
  From: David Woodhouse
  Date: 2005-05-19 20:21 UTC
  To: steve
  Cc: dedekind, Andrew Morton, linux-fsdevel, mason

On Thu, 2005-05-19 at 15:12 -0500, Steve Roemen wrote:

> No, I am not using ACLs. I am running 2.6.12-rc3 with those two
> patches, and I can't get it to error out.

OK, then it sounds like what you've seen is a manifestation of the
already-known reiserfs breakage in 2.6.12-rc4. Artem's patch didn't
cause it; it just made it show itself. Thanks.

-- 
dwmw2
* Re: 2.6.12-rc4-mm1
  From: Artem B. Bityuckiy
  Date: 2005-05-14 10:46 UTC
  To: steve
  Cc: David Woodhouse, Andrew Morton, linux-fsdevel

> i'll try to reproduce it. this is on an IBM X31 with a Pentium M, no
> SMP. using reiserfs (not v4) but did compile it with reiser4

So was preemption enabled or disabled?

-- 
Best Regards,
Artem B. Bityuckiy,
St.-Petersburg, Russia.