Linux NILFS development
* Kernel Bug while deleting files (with rsync)
@ 2008-03-25 10:25 Alexander Schier
       [not found] ` <20080325102553.GA10853-W0ZHf6uU1cg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Schier @ 2008-03-25 10:25 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg

kernel BUG at /home/allo/nilfs-2.0.0/fs/page.c:436!
invalid opcode: 0000 [#1] PREEMPT 
Modules linked in: nilfs2 sha256_generic aes_i586 aes_generic cbc blkcipher nfsd exportfs ipv6 nfs lockd sunrpc dm_crypt dm_mod analog gameport parport_pc parport snd_intel8x0 snd_ac97_codec ehci_hcd ohci_hcd uhci_hcd ac97_bus snd_pcm snd_timer snd snd_page_alloc usbcore

Pid: 2456, comm: rsync Not tainted (2.6.24.2 #2)
EIP: 0060:[<e0e58cff>] EFLAGS: 00010246 CPU: 0
EIP is at nilfs_free_buffer_page+0x29/0x32 [nilfs2]
EAX: 00000000 EBX: c1366360 ECX: d6b161a4 EDX: cfc71e48
ESI: c1366360 EDI: 00000001 EBP: d6b14334 ESP: cfc71e58
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process rsync (pid: 2456, ti=cfc70000 task=ddcabaa0 task.ti=cfc70000)
Stack: 00000001 e0e5a207 c1366360 00000001 00000001 e0e5aa0a 00000010 c1366360 
       00000004 00000000 00000000 00000005 00000000 00081088 00000000 00081089 
       00000000 0008108a 00000000 0008108b 00000000 00000000 00000000 00081090 
Call Trace:
 [<e0e5a207>] nilfs_btnode_delete_page+0x27/0x2d [nilfs2]
 [<e0e5aa0a>] nilfs_btnode_cache_clear+0x7e/0x91 [nilfs2]
 [<e0e58380>] nilfs_clear_inode+0x87/0x9a [nilfs2]
 [<c0176ca3>] clear_inode+0x6c/0xba
 [<e0e556e5>] nilfs_free_inode+0x17/0x28 [nilfs2]
 [<e0e55959>] nilfs_delete_inode+0xd0/0x10f [nilfs2]
 [<c0175d23>] d_delete+0x5d/0xe3
 [<e0e55889>] nilfs_delete_inode+0x0/0x10f [nilfs2]
 [<c0177122>] generic_delete_inode+0x71/0xee
 [<c0176636>] iput+0x60/0x62
 [<c016e80e>] do_unlinkat+0xbe/0xfe
 [<c01656b1>] vfs_write+0x114/0x124
 [<c0165763>] sys_write+0x41/0x67
 [<c0103e4e>] syscall_call+0x7/0xb
 =======================
Code: 2f df 53 89 c3 8b 00 a8 01 75 04 0f 0b eb fe 83 7b 10 00 74 04 0f 0b eb fe 8b 03 f6 c4 08 74 0f 89 d8 e8 33 9a 32 df 85 c0 75 04 <0f> 0b eb fe 89 d8 5b eb bc 55 57 56 53 83 ec 20 85 c0 c7 44 24 
EIP: [<e0e58cff>] nilfs_free_buffer_page+0x29/0x32 [nilfs2] SS:ESP 0068:cfc71e58
---[ end trace bdf0e0f308ae752d ]---

Yes, it's 2.0.0 because of the 2.0.1 incompatibility. If it's fixed in 2.0.1, disregard this ;).

Alex


* Re: Kernel Bug while deleting files (with rsync)
       [not found] ` <20080325102553.GA10853-W0ZHf6uU1cg@public.gmane.org>
@ 2008-03-25 11:45   ` Ryusuke Konishi
       [not found]     ` <1206445521.3131.60.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2008-03-25 11:45 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi Alex,

On Tue, 2008-03-25 at 11:25 +0100, Alexander Schier wrote:
> kernel BUG at /home/allo/nilfs-2.0.0/fs/page.c:436!
> invalid opcode: 0000 [#1] PREEMPT 
> Modules linked in: nilfs2 sha256_generic aes_i586 aes_generic cbc blkcipher nfsd exportfs ipv6 nfs lockd sunrpc dm_crypt dm_mod analog gameport parport_pc parport snd_intel8x0 snd_ac97_codec ehci_hcd ohci_hcd uhci_hcd ac97_bus snd_pcm snd_timer snd snd_page_alloc usbcore
> 
> Pid: 2456, comm: rsync Not tainted (2.6.24.2 #2)
> EIP: 0060:[<e0e58cff>] EFLAGS: 00010246 CPU: 0
> EIP is at nilfs_free_buffer_page+0x29/0x32 [nilfs2]
> EAX: 00000000 EBX: c1366360 ECX: d6b161a4 EDX: cfc71e48
> ESI: c1366360 EDI: 00000001 EBP: d6b14334 ESP: cfc71e58
>  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process rsync (pid: 2456, ti=cfc70000 task=ddcabaa0 task.ti=cfc70000)
> Stack: 00000001 e0e5a207 c1366360 00000001 00000001 e0e5aa0a 00000010 c1366360 
>        00000004 00000000 00000000 00000005 00000000 00081088 00000000 00081089 
>        00000000 0008108a 00000000 0008108b 00000000 00000000 00000000 00081090 
> Call Trace:
>  [<e0e5a207>] nilfs_btnode_delete_page+0x27/0x2d [nilfs2]
>  [<e0e5aa0a>] nilfs_btnode_cache_clear+0x7e/0x91 [nilfs2]
>  [<e0e58380>] nilfs_clear_inode+0x87/0x9a [nilfs2]
>  [<c0176ca3>] clear_inode+0x6c/0xba
>  [<e0e556e5>] nilfs_free_inode+0x17/0x28 [nilfs2]
>  [<e0e55959>] nilfs_delete_inode+0xd0/0x10f [nilfs2]
>  [<c0175d23>] d_delete+0x5d/0xe3
>  [<e0e55889>] nilfs_delete_inode+0x0/0x10f [nilfs2]
>  [<c0177122>] generic_delete_inode+0x71/0xee
>  [<c0176636>] iput+0x60/0x62
>  [<c016e80e>] do_unlinkat+0xbe/0xfe
>  [<c01656b1>] vfs_write+0x114/0x124
>  [<c0165763>] sys_write+0x41/0x67
>  [<c0103e4e>] syscall_call+0x7/0xb
>  =======================
> Code: 2f df 53 89 c3 8b 00 a8 01 75 04 0f 0b eb fe 83 7b 10 00 74 04 0f 0b eb fe 8b 03 f6 c4 08 74 0f 89 d8 e8 33 9a 32 df 85 c0 75 04 <0f> 0b eb fe 89 d8 5b eb bc 55 57 56 53 83 ec 20 85 c0 c7 44 24 
> EIP: [<e0e58cff>] nilfs_free_buffer_page+0x29/0x32 [nilfs2] SS:ESP 0068:cfc71e58
> ---[ end trace bdf0e0f308ae752d ]---
> 
> Yes, it's 2.0.0 because of the 2.0.1 incompatibility. If it's fixed in 2.0.1, disregard this ;).
> 
> Alex

Whoa, it seems to be a new bug!
This one seems rather tough. :(

OK, I'll review it before releasing version 2.0.2.
Anyway, thanks for sending the log.

Regards,
-- 
Ryusuke Konishi
NILFS team NTT
http://www.nilfs.org/


* Re: Kernel Bug while deleting files (with rsync)
       [not found]     ` <1206445521.3131.60.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2008-03-26 13:05       ` Ryusuke Konishi
       [not found]         ` <20080326.220532.62697306.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2008-03-26 13:05 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

From: Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org>
Subject: Re: [NILFS users] Kernel Bug while deleting files (with rsync)
Date: Tue, 25 Mar 2008 20:45:20 +0900
> On Tue, 2008-03-25 at 11:25 +0100, Alexander Schier wrote:
> > kernel BUG at /home/allo/nilfs-2.0.0/fs/page.c:436!
> <snip>
> > Yes, it's 2.0.0 because of the 2.0.1 incompatibility. If it's fixed in 2.0.1, disregard this ;).
> 
> Whoa, it seems to be a new bug!
> This one seems rather tough. :(
> 
> OK, I'll review it before releasing version 2.0.2.
> Anyway, thanks for sending the log.

I couldn't find the cause, because the function that called
BUG() is a cleanup routine, not the one that caused the failure.

Instead of guessing, I added some code that reports more
information about a page found in an inconsistent state, without
sacrificing performance.

Please try v2.0.2, which was released a short while ago.
If the bug reproduces, we should get some sort of clue.
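
The idea of collecting state only when the consistency check fails can be sketched as follows. This is a minimal Python model, not the NILFS code: the `Page` fields, the invariant checked in `free_page`, and the message format are all assumptions made for illustration.

```python
from dataclasses import dataclass


class PageStateError(Exception):
    """Stands in for the kernel's BUG(); carries the diagnostic text."""


@dataclass
class Page:
    refcount: int      # hypothetical model of a page reference count
    flags: int         # hypothetical flag bits
    has_mapping: bool  # whether the page is still attached to a mapping


def describe(page: Page) -> str:
    # Built only on the failure path, so the normal path pays nothing extra.
    return (f"PAGE_BUG: cnt={page.refcount} "
            f"flags={page.flags:#x} has_mapping={page.has_mapping}")


def free_page(page: Page) -> None:
    # Assumed invariant for this sketch: exactly one reference is left
    # and the page is already detached from its mapping.
    if page.refcount != 1 or page.has_mapping:
        raise PageStateError(describe(page))
    page.refcount = 0
```

The cleanup routine still refuses to continue, but the report now shows the state it found instead of only the location of the check.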

Thanks,
Ryusuke Konishi


* Re: Kernel Bug while deleting files (with rsync)
       [not found]         ` <20080326.220532.62697306.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-03-26 16:47           ` Alexander Schier
       [not found]             ` <20080326164723.GA31518-W0ZHf6uU1cg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Schier @ 2008-03-26 16:47 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi!


segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
NILFS warning (device dm-3): nilfs_sync_super: barrier-based sync failed. disabling barriers

PAGE_BUG(c1328080): cnt=2 index#=0 flags=0x40000805 mapping=00000000 ino=0
 BH[0] d66a1cd0: cnt=1 block#=0 state=0x20000
------------[ cut here ]------------
kernel BUG at /home/allo/nilfs-2.0.2/fs/page.c:651!
invalid opcode: 0000 [#1] PREEMPT 
Modules linked in: nilfs2 sha256_generic aes_i586 aes_generic cbc blkcipher nfsd exportfs ipv6 nfs lockd sunrpc dm_crypt dm_mod analog uhci_hcd snd_intel8x0 snd_ac97_codec ohci_hcd ehci_hcd gameport parport_pc parport ac97_bus snd_pcm snd_timer usbcore snd snd_page_alloc

Pid: 2903, comm: rsync Not tainted (2.6.24.2 #2)
EIP: 0060:[<e0e7308f>] EFLAGS: 00010246 CPU: 0
EIP is at nilfs_page_bug+0xc5/0xc9 [nilfs2]
EAX: 00000031 EBX: 00000001 ECX: d8fa3e24 EDX: e0e883eb
ESI: d66a1cd0 EDI: 00000000 EBP: 00000001 ESP: d8fa3e20
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process rsync (pid: 2903, ti=d8fa2000 task=deb72550 task.ti=d8fa2000)
Stack: e0e883eb 00000000 d66a1cd0 00000001 00000000 00000000 00020000 00000000 
       d66a1cd0 c1328080 c1328080 00000001 d670dc60 e0e730d8 00000001 e0e745c6 
       c1328080 00000001 00000001 e0e74dda 00000010 c1328080 00000004 00000000 
Call Trace:
 [<e0e730d8>] nilfs_free_buffer_page+0x33/0x38 [nilfs2]
 [<e0e745c6>] nilfs_btnode_delete_page+0x27/0x2d [nilfs2]
 [<e0e74dda>] nilfs_btnode_cache_clear+0x81/0x94 [nilfs2]
 [<e0e725bb>] nilfs_clear_inode+0x87/0x9a [nilfs2]
 [<c0176ca3>] clear_inode+0x6c/0xba
 [<e0e6f978>] nilfs_free_inode+0x17/0x28 [nilfs2]
 [<e0e6fbcb>] nilfs_delete_inode+0xd0/0x10f [nilfs2]
 [<c0175d23>] d_delete+0x5d/0xe3
 [<e0e6fafb>] nilfs_delete_inode+0x0/0x10f [nilfs2]
 [<c0177122>] generic_delete_inode+0x71/0xee
 [<c0176636>] iput+0x60/0x62
 [<c016e80e>] do_unlinkat+0xbe/0xfe
 [<c01656b1>] vfs_write+0x114/0x124
 [<c0165763>] sys_write+0x41/0x67
 [<c0103e4e>] syscall_call+0x7/0xb
 =======================
Code: 24 18 89 54 24 10 89 4c 24 14 89 5c 24 0c 89 7c 24 04 c7 04 24 eb 83 e8 e0 e8 ef c8 47 df 8b 76 04 3b 74 24 20 74 04 89 ef eb c2 <0f> 0b eb fe 53 89 c3 e8 7c 47 2d df 89 d8 31 d2 5b e9 44 8f 2d 
EIP: [<e0e7308f>] nilfs_page_bug+0xc5/0xc9 [nilfs2] SS:ESP 0068:d8fa3e20
---[ end trace e90c885f690f08c3 ]---

with rsync again.

Alex


* Re: Kernel Bug while deleting files (with rsync)
       [not found]             ` <20080326164723.GA31518-W0ZHf6uU1cg@public.gmane.org>
@ 2008-03-27  3:03               ` Ryusuke Konishi
       [not found]                 ` <1206587021.3100.47.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2008-03-27  3:03 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi Alex,

On Wed, 2008-03-26 at 17:47 +0100, Alexander Schier wrote:
> Hi!
> 
> segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> NILFS warning (device dm-3): nilfs_sync_super: barrier-based sync failed. disabling barriers
> 
> PAGE_BUG(c1328080): cnt=2 index#=0 flags=0x40000805 mapping=00000000 ino=0
>  BH[0] d66a1cd0: cnt=1 block#=0 state=0x20000
> <snip>
> with rsync again.
> 
> Alex

Thanks!
According to your log, the problem is a reference count leak of a
b-tree node buffer.
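
For readers unfamiliar with the term, a reference count leak can be illustrated with a toy model. The sketch below is hypothetical Python, not NILFS code; `NodeBuffer`, `lookup_leaky`, and `lookup_balanced` are invented names, and where the real leak sits is not known from this log.

```python
class NodeBuffer:
    """Toy stand-in for a reference-counted b-tree node buffer."""

    def __init__(self) -> None:
        self.refcount = 1  # reference held by the cache itself

    def get(self) -> None:
        self.refcount += 1

    def put(self) -> None:
        assert self.refcount > 0, "put() without a matching get()"
        self.refcount -= 1


def lookup_leaky(buf: NodeBuffer, key: int, keys: list[int]) -> bool:
    buf.get()
    if key not in keys:
        return False  # early return skips put(): the reference leaks
    buf.put()
    return True


def lookup_balanced(buf: NodeBuffer, key: int, keys: list[int]) -> bool:
    buf.get()
    try:
        return key in keys
    finally:
        buf.put()  # runs on every path


def can_free(buf: NodeBuffer) -> bool:
    # Only the cache's own reference may remain when the buffer is torn down.
    return buf.refcount == 1
```

A single pass through the leaky error path leaves the count one too high, and nothing goes wrong until a cleanup routine much later finds that the buffer cannot be freed. That would match a BUG that fires in cleanup code rather than at the faulty call site.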

Could you tell me the exact steps to reproduce the bug with rsync?

Regards,
-- 
Ryusuke Konishi
NILFS team NTT
http://www.nilfs.org/


* Re: Kernel Bug while deleting files (with rsync)
       [not found]                 ` <1206587021.3100.47.camel-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
@ 2008-03-27 13:23                   ` Alexander Schier
       [not found]                     ` <20080327132327.GA3562-W0ZHf6uU1cg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Schier @ 2008-03-27 13:23 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi!
On Thu, Mar 27, 2008 at 12:03:41PM +0900, Ryusuke Konishi wrote:
> Thanks!
> According to your log, the problem is a reference count leak of a
> b-tree node buffer.
> 
> Could you tell me the exact steps to reproduce the bug with rsync?
I have ~5 GB of data on the nilfs2 filesystem, with one snapshot. It is intended
for backup, using rsync from another host:
rsync -avPz --delete /home/ hostwithnilfs2:/media/backup/home/

It reports about 5 deleted files, and then the BUG is triggered.
After that I cannot umount the filesystem, rmmod does not work, and shutdown
hangs, too.
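
A scripted variant of these steps might look like the sketch below. It is only an assumption about how the sequence could be automated: the helper names and the test-data layout are invented, and there is no evidence that it triggers the BUG.

```python
import subprocess
from pathlib import Path


def rsync_cmd(src: str, dest: str) -> list[str]:
    # Same options as the command above: archive, verbose,
    # progress/partial, compress, and delete extraneous files on dest.
    return ["rsync", "-avPz", "--delete", src, dest]


def populate(src: Path, count: int) -> list[Path]:
    # Create `count` small files so there is something to delete later.
    src.mkdir(parents=True, exist_ok=True)
    files = [src / f"file{i}.txt" for i in range(count)]
    for f in files:
        f.write_text("x" * 1024)
    return files


def reproduce(src: Path, dest: str, count: int = 100) -> None:
    files = populate(src, count)
    # First pass copies everything to the nilfs2 side.
    subprocess.run(rsync_cmd(f"{src}/", dest), check=True)
    for f in files[: count // 2]:
        f.unlink()
    # Second pass makes rsync delete the missing files on the destination.
    subprocess.run(rsync_cmd(f"{src}/", dest), check=True)
```

Calling `reproduce(Path("/tmp/nilfs-test"), "hostwithnilfs2:/media/backup/test/")` would run both passes; the source directory and the destination subdirectory here are placeholders.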

Alex


* Re: Kernel Bug while deleting files (with rsync)
       [not found]                     ` <20080327132327.GA3562-W0ZHf6uU1cg@public.gmane.org>
@ 2008-03-28 18:12                       ` Alexander Schier
       [not found]                         ` <20080328181211.GA19581-W0ZHf6uU1cg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Schier @ 2008-03-28 18:12 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi!
It stays reproducible, now even right after starting rsync:
> rsync -avPz --delete /home/allo/ /media/backup2/allo/
sending incremental file list

(bug on remotehost)


* Resolved? Was: Kernel Bug while deleting files (with rsync)
       [not found]                         ` <20080328181211.GA19581-W0ZHf6uU1cg@public.gmane.org>
@ 2008-03-28 19:40                           ` Alexander Schier
       [not found]                             ` <20080328194039.GA28685-W0ZHf6uU1cg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Schier @ 2008-03-28 19:40 UTC (permalink / raw)
  To: NILFS Users mailing list

OK, it seems to work for me after:
- using a Debian 2.6.24 kernel instead of a self-compiled 2.6.22
- maybe the 2.6.22 was built with a different compiler than the nilfs2 module

At the moment rsync works perfectly.

I'll mail again if the BUG comes back. Until then, it seems resolved.
;)

Alex


* Re: Resolved? Was: Kernel Bug while deleting files (with rsync)
       [not found]                             ` <20080328194039.GA28685-W0ZHf6uU1cg@public.gmane.org>
@ 2008-03-30  5:04                               ` Ryusuke Konishi
       [not found]                                 ` <20080330.140417.27795318.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2008-03-30  5:04 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, alex-W0ZHf6uU1cg

Hi Alex,

> OK, it seems to work for me after:
> - using a Debian 2.6.24 kernel instead of a self-compiled 2.6.22
> - maybe the 2.6.22 was built with a different compiler than the nilfs2 module
> 
> At the moment rsync works perfectly.
> 
> I'll mail again if the BUG comes back. Until then, it seems resolved.
> ;)

I see. I'll wait for your next report breathlessly. :)

So far, we haven't been able to reproduce the problem with
similar tests.

This might be caused by a defect in exclusion control (locking).
If so, it's hard to track down from the symptoms alone.
So I'll consider reviewing the source code of the related parts
while preparing the kernel patches for NILFS.

Cheers,
Ryusuke Konishi


* Re: Resolved? Was: Kernel Bug while deleting files (with rsync)
       [not found]                                 ` <20080330.140417.27795318.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2008-03-30  9:29                                   ` Alexander Schier
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Schier @ 2008-03-30  9:29 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi!
On Sun, Mar 30, 2008 at 02:04:17PM +0900, Ryusuke Konishi wrote:
> Hi Alex,
> 
> > OK, it seems to work for me after:
> > - using a Debian 2.6.24 kernel instead of a self-compiled 2.6.22
> > - maybe the 2.6.22 was built with a different compiler than the nilfs2 module
No BUG since then.
> > At the moment rsync works perfectly.
> > 
> > I'll mail again if the BUG comes back. Until then, it seems resolved.
> > ;)
> I see. I'll wait for your next report breathlessly. :)
Sorry, it seems it was my mistake.

> So far, we haven't been able to reproduce the problem with
> similar tests.
You would know better whether this can be caused by a mismatched compiler
alone, or whether a bug may still be hidden in the source and was only
triggered by this. Another possibility is that my self-compiled kernel was
missing some option that nilfs uses but does not declare as a dependency in
its Makefile.
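
One way to rule the compiler question in or out is to compare the compiler recorded in the running kernel's `/proc/version` with the one used to build the module. The sketch below is a rough Python helper under stated assumptions: it expects the "gcc version X.Y.Z" wording that kernels of this era print (newer kernels format the line differently), and the function names are invented for illustration.

```python
import re
from typing import Optional


def gcc_version(text: str) -> Optional[str]:
    # Extract "X.Y" or "X.Y.Z" following the literal words "gcc version".
    m = re.search(r"gcc version (\d+(?:\.\d+)+)", text)
    return m.group(1) if m else None


def same_compiler(proc_version: str, module_build_banner: str) -> bool:
    # proc_version: contents of /proc/version on the running kernel.
    # module_build_banner: last line of `gcc -v` on the machine that
    # built the module (an assumption about where to get this string).
    kernel = gcc_version(proc_version)
    module = gcc_version(module_build_banner)
    return kernel is not None and kernel == module
```

A mismatch does not prove anything by itself, but it would make the "different compiler" explanation more plausible than a missing kernel option.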

You're the coders; I can only guess and be happy that it works fine with the
current configuration.

> This might be caused by a defect in exclusion control (locking).
> If so, it's hard to track down from the symptoms alone.
> So I'll consider reviewing the source code of the related parts
> while preparing the kernel patches for NILFS.
Keep improving ;)

Alex

