* error on kernel 2.6.29 while running cleaner on a 1tb volume
@ 2009-03-25  5:22 David Arendt
       [not found] ` <49C9BF81.6090203-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: David Arendt @ 2009-03-25  5:22 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi,

First of all, please don't get me wrong for posting all these bug
reports. I don't mean them as complaints. I am very satisfied with
nilfs2. As I am a software developer myself, I always like receiving
bug reports. What I hate most is people complaining that nothing works
without giving any further information.

So here is an error on kernel 2.6.29 that occurred while running the
cleaner on a 1 TB volume. I am not sure whether it is nilfs related,
but I am posting it for your information.

BUG: unable to handle kernel paging request at 9c5c67f0
IP: [<c0239049>] radix_tree_delete+0x19/0x220
*pdpt = 0000000030490001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:04:03.0/resource
Modules linked in: nvidia(P) vmnet vmblock vmci vmmon fcpci(P) capi 
capifs kernelcapi nilfs2 scsi_wait_scan

Pid: 333, comm: kswapd0 Tainted: P           (2.6.29server #1) P5QL-E
EIP: 0060:[<c0239049>] EFLAGS: 00010092 CPU: 3
EIP is at radix_tree_delete+0x19/0x220
EAX: 0537456a EBX: 00000000 ECX: f73f10d4 EDX: f701c598
ESI: f73f10d4 EDI: f73f10e4 EBP: f73f10d8 ESP: f76e9d08
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kswapd0 (pid: 333, ti=f76e8000 task=f76885c0 task.ti=f76e8000)
Stack:
 00000000 f6fc0040 0537456a 00000000 00000000 00000000 f76e9d2c c0169827
 f5f0d3d0 f56e482c 00000080 0000c400 00000000 00000000 0000c5ab 0000c5ab
 00000000 f56e482c c108dee0 f73f10d4 f73f10e4 f76e9ebc c014fc15 0000c5ab
Call Trace:
 [<c0169827>] page_referenced_file+0x77/0x90
 [<c014fc15>] __remove_from_page_cache+0x15/0x90
 [<c0158864>] __remove_mapping+0x84/0xc0
 [<c01590f3>] shrink_page_list+0x393/0x6d0
 [<c0158cf2>] shrink_active_list+0x332/0x3a0
 [<c01580f8>] isolate_pages_global+0x88/0x210
 [<c0156e39>] ____pagevec_lru_add+0x119/0x130
 [<c0159654>] shrink_list+0x224/0x560
 [<c0159c07>] shrink_zone+0x277/0x300
 [<c015a6f8>] kswapd+0x518/0x530
 [<c0158070>] isolate_pages_global+0x0/0x210
 [<c0138940>] autoremove_wake_function+0x0/0x50
 [<c011d33d>] complete+0x3d/0x60
 [<c015a1e0>] kswapd+0x0/0x530
 [<c0138622>] kthread+0x42/0x70
 [<c01385e0>] kthread+0x0/0x70
 [<c010391b>] kernel_thread_helper+0x7/0x1c
Code: 89 f8 5b 5e 5f 5d c3 0f 0b eb fe 0f 0b eb fe 8d 76 00 55 89 c5 57 
56 53 31 db 83 ec 48 89 54 24 08 8b 10 8b 44 24 08 89 5c 24 0c <39> 04 
95 90 51 55 c0 0f 82 1c 01 00 00 8b 4d 08 85 d2 89 4c 24
EIP: [<c0239049>] radix_tree_delete+0x19/0x220 SS:ESP 0068:f76e9d08
---[ end trace 86f39789c1fa8998 ]---
note: kswapd0[333] exited with preempt_count 1
BUG: unable to handle kernel NULL pointer dereference at 00000104
IP: [<c01504f9>] find_get_pages+0x79/0xf0
*pdpt = 0000000030e61001 *pde = 0000000000000000
Oops: 0000 [#2] PREEMPT SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:04:03.0/resource
Modules linked in: nvidia(P) vmnet vmblock vmci vmmon fcpci(P) capi 
capifs kernelcapi nilfs2 scsi_wait_scan
Pid: 8494, comm: nilfs_cleanerd Tainted: P      D    (2.6.29server #1) 
P5QL-E
EIP: 0060:[<c01504f9>] EFLAGS: 00210213 CPU: 2
EIP is at find_get_pages+0x79/0xf0
EAX: 00000100 EBX: 00000104 ECX: 00000100 EDX: db6a1cc4
ESI: 0000000a EDI: db6a1c9c EBP: 0000000a ESP: db6a1c2c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process nilfs_cleanerd (pid: 8494, ti=db6a0000 task=f75e7800 
task.ti=db6a0000)
Stack:
 0000000e db6a1cc4 00000104 00000002 f70d78a0 0000000e 000f76cf 0000000d
 000f76cf db6a1c94 000f76ce 0000000e c0157502 db6a1c9c 000f76cf c1cf16e0
 f8330336 0000000e db6a1c98 0000000e f70d7aec f70d78ac f70d7ae0 f70d789c
Call Trace:
 [<c0157502>] pagevec_lookup+0x22/0x30
 [<f8330336>] nilfs_copy_back_pages+0x56/0x220 [nilfs2]
 [<f834328b>] nilfs_commit_gcdat_inode+0x8b/0xc0 [nilfs2]
 [<f833b1bd>] nilfs_segctor_complete_write+0x2fd/0x310 [nilfs2]
 [<f833b914>] nilfs_segctor_do_construct+0x424/0x18c0 [nilfs2]
 [<f83319bf>] nilfs_bmap_test_and_clear_dirty+0x2f/0x40 [nilfs2]
 [<f833d009>] nilfs_segctor_construct+0x99/0xb0 [nilfs2]
 [<f833df1f>] nilfs_clean_segments+0xef/0x200 [nilfs2]
 [<f83427e0>] nilfs_ioctl+0x3d0/0x480 [nilfs2]
 [<c030f774>] ehci_work+0x124/0x9a0
 [<c011da6b>] update_curr+0x7b/0xe0
 [<c012eeb7>] lock_timer_base+0x27/0x60
 [<c013f55e>] getnstimeofday+0x4e/0x120
 [<c0140808>] clocksource_get_next+0x38/0x40
 [<f8342410>] nilfs_ioctl+0x0/0x480 [nilfs2]
 [<c0180c6b>] vfs_ioctl+0x2b/0x90
 [<c0180fdb>] do_vfs_ioctl+0x1eb/0x530
 [<c012ebdb>] run_timer_softirq+0x15b/0x190
 [<c012a484>] __do_softirq+0x94/0x160
 [<c018135d>] sys_ioctl+0x3d/0x70
 [<c0103131>] sysenter_do_call+0x12/0x25
 [<c0400000>] pci_bus_size_bridges+0x1f0/0x410
Code: 00 00 8b 44 24 34 8d 04 b0 89 44 24 04 8b 54 24 04 8b 02 8b 00 a8 
01 75 ba 85 c0 89 c1 74 3e 83 f8 ff 74 af 8d 58 04 89 5c 24 08 <8b> 50 
04 85 d2 74 db 8d 7a 01 89 d0 8b 5c 24 08 f0 0f b1 3b 39
EIP: [<c01504f9>] find_get_pages+0x79/0xf0 SS:ESP 0068:db6a1c2c
---[ end trace 86f39789c1fa8999 ]---
note: nilfs_cleanerd[8494] exited with preempt_count 1
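
As a consistency check on the two oopses above (a sketch, not a diagnosis), the faulting instruction bytes marked `<..>` in each "Code:" line can be decoded by hand and the effective address recomputed from the register dump. The decoding is an assumption made here, not something stated in the report: `39 04 95 90 51 55 c0` is read as `cmp %eax, 0xc0555190(,%edx,4)` and `8b 50 04` as `mov 0x4(%eax), %edx`. The helper name `effective_address` is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # addresses on this 32-bit x86 kernel wrap at 2**32


def effective_address(base=0, index=0, scale=1, disp=0):
    """x86 effective address: base + index*scale + disp, truncated to 32 bits."""
    return (base + index * scale + disp) & MASK32


# Oops #1 (kswapd0, radix_tree_delete+0x19): EDX comes from the register
# dump, the displacement from the little-endian bytes "90 51 55 c0".
oops1 = effective_address(index=0xF701C598, scale=4, disp=0xC0555190)

# Oops #2 (nilfs_cleanerd, find_get_pages+0x79): EAX comes from the
# register dump, the displacement from the byte "04".
oops2 = effective_address(base=0x00000100, disp=4)

print(f"{oops1:08x}")  # 9c5c67f0
print(f"{oops2:08x}")  # 00000104
```

Both results equal the fault addresses the kernel reported (9c5c67f0 and 00000104), so the assumed decoding matches the register dumps. In the first oops a table index holds what looks like a kernel pointer (f701c598), and in the second the dereferenced value is 0x100. Both would be consistent with corrupted page cache radix tree data, although the traces alone do not show what corrupted it.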

Bye,
David Arendt


* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found] ` <49C9BF81.6090203-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-03-25 11:18   ` admin-/LHdS3kC8BfYtjvyW6yDsg
  2009-03-25 17:19   ` Ryusuke Konishi
  1 sibling, 0 replies; 15+ messages in thread
From: admin-/LHdS3kC8BfYtjvyW6yDsg @ 2009-03-25 11:18 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi,

After trying to run the cleaner a second time, I got the following errors:

Mar 25 06:09:50 server nilfs_cleanerd[6772]: start
Mar 25 07:14:24 server nilfs_cpfile_delete_checkpoints: invalid range of checkpoint numbers: [4294969344, 32720)
Mar 25 07:14:24 server NILFS: GC failed during preparation: cannot delete checkpoints: err=-22
Mar 25 07:14:24 server nilfs_cleanerd[6772]: Invalid argument
Mar 25 07:14:24 server nilfs_cleanerd[6772]: cannot clean segments: Invalid argument
Mar 25 07:14:24 server nilfs_cleanerd[6772]: shutdown
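
The rejected checkpoint range deserves a second look. A half-open range [start, end) only makes sense when start < end, and here the start is far larger than the end. The sketch below only does arithmetic on the two logged numbers. That the start lies slightly above 2**32 is a property of the number, not a confirmed explanation of the failure.

```python
import errno

start, end = 4294969344, 32720   # values from the log line above

print(start < end)               # False: the range is inverted
print(divmod(start, 2**32))      # (1, 2048): start == 2**32 + 2048
print(hex(start), hex(end))      # 0x100000800 0x7fd0
print(errno.errorcode[22])       # EINVAL, matching "err=-22" / "Invalid argument"
```

So the "Invalid argument" messages from nilfs_cleanerd are the same error (-EINVAL) that the kernel logged as err=-22 when it refused the range.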

Bye,
David Arendt


* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found] ` <49C9BF81.6090203-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  2009-03-25 11:18   ` admin-/LHdS3kC8BfYtjvyW6yDsg
@ 2009-03-25 17:19   ` Ryusuke Konishi
       [not found]     ` <20090326.021932.61004088.ryusuke-sG5X7nlA6pw@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2009-03-25 17:19 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg

Hi,
On Wed, 25 Mar 2009 06:22:09 +0100, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
> Hi,
> 
> First of all, please don't get me wrong for posting all these bug
> reports. I don't mean them as complaints. I am very satisfied with
> nilfs2. As I am a software developer myself, I always like receiving
> bug reports. What I hate most is people complaining that nothing works
> without giving any further information.

David, I really appreciate your feedback, so feel free to report
bugs ;)

Though we have not caught up with all your reports, we're keeping them
on record.  I believe we will be able to resolve the problems sooner
or later.

> So here is an error on kernel 2.6.29 that occurred while running the
> cleaner on a 1 TB volume. I am not sure whether it is nilfs related,
> but I am posting it for your information.

Thanks for the information below.
I'll run some tests on 2.6.29.
It is likely to be affected by a page cache change.

Regards,
Ryusuke Konishi


* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]     ` <20090326.021932.61004088.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-03-27  5:18       ` David Arendt
       [not found]         ` <49CC6193.9040900-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: David Arendt @ 2009-03-27  5:18 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi,

There seems to be some bug in the kernel. On another partition,
reformatted one week ago, I again got the following errors:

NILFS error (device sda3): nilfs_check_page: bad entry in directory 
#28261: unaligned directory entry - offset=4096, inode=1647255843, 
rec_len=29537, name_len=104
NILFS error (device sda3): nilfs_check_page: bad entry in directory 
#28261: unaligned directory entry - offset=4096, inode=1647255843, 
rec_len=29537, name_len=104
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42880
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42881
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42882
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42883
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42884
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42885
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42886
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42887
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42888
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42889
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42890
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42892
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42893
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42894
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42895
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42896
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42897
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42898
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42899
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42900
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42901
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42902
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42903
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42904
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42905
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42906
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42907
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42908
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42909
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42910
NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
inode: 42911
init_special_inode: bogus i_mode (35070)
init_special_inode: bogus i_mode (30055)
init_special_inode: bogus i_mode (30070)
init_special_inode: bogus i_mode (31070)
init_special_inode: bogus i_mode (31066)
init_special_inode: bogus i_mode (31461)
init_special_inode: bogus i_mode (32146)
init_special_inode: bogus i_mode (32545)
init_special_inode: bogus i_mode (72162)
init_special_inode: bogus i_mode (57556)
init_special_inode: bogus i_mode (72542)
init_special_inode: bogus i_mode (5042)
init_special_inode: bogus i_mode (36504)
NILFS error (device sda3): nilfs_check_page: bad entry in directory 
#469193: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, 
name_len=0
NILFS error (device sda3): nilfs_readdir: bad page in #469193
NILFS error (device sda3): nilfs_check_page: bad entry in directory 
#469195: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, 
name_len=0
NILFS error (device sda3): nilfs_readdir: bad page in #469195
NILFS error (device sda3): nilfs_check_page: bad entry in directory 
#468107: directory entry across blocks - offset=0, inode=1095777639, 
rec_len=26480, name_len=61
NILFS error (device sda3): nilfs_readdir: bad page in #468107
NILFS error (device sda3): nilfs_readdir: bad page in #28261
------------[ cut here ]------------
WARNING: at /home/admin/x/nilfs-2.0.11/fs/dat.c:182 
nilfs_dat_prepare_end+0xb0/0xc0 [nilfs2]()
Hardware name: P5QL-E
Modules linked in: nvidia(P) vmnet vmblock vmci vmmon fcpci(P) capi 
capifs kernelcapi nilfs2 scsi_wait_scan
Pid: 333, comm: kswapd0 Tainted: P           2.6.29server #1
Call Trace:
 [<c0125b99>] warn_slowpath+0x99/0xc0
 [<c014e390>] find_get_page+0x30/0xc0
 [<f83410b0>] nilfs_palloc_bitmap_blkoff+0x40/0x60 [nilfs2]
 [<f834118b>] nilfs_palloc_get_entry_block+0x5b/0x70 [nilfs2]
 [<c01506aa>] find_or_create_page+0x2a/0xa0
 [<f83366d8>] nilfs_dat_prepare_entry+0x18/0x20 [nilfs2]
 [<f8336be0>] nilfs_dat_prepare_end+0xb0/0xc0 [nilfs2]
 [<f83364c2>] nilfs_direct_delete+0x62/0xa0 [nilfs2]
 [<f8331e46>] nilfs_bmap_do_delete+0xb6/0xc0 [nilfs2]
 [<c0157502>] pagevec_lookup+0x22/0x30
 [<c0157c29>] truncate_inode_pages_range+0x179/0x310
 [<f8331ecb>] nilfs_bmap_truncate+0x7b/0xa0 [nilfs2]
 [<f832b53a>] nilfs_truncate_bmap+0x6a/0x100 [nilfs2]
 [<f832c028>] nilfs_delete_inode+0x38/0xc0 [nilfs2]
 [<c019c4b7>] inotify_inode_is_dead+0x17/0x80
 [<f832bff0>] nilfs_delete_inode+0x0/0xc0 [nilfs2]
 [<c018626e>] generic_delete_inode+0x6e/0x100
 [<c02353cb>] _atomic_dec_and_lock+0x3b/0x70
 [<c0185bf4>] iput+0x44/0x50
 [<c0183695>] d_kill+0x35/0x60
 [<c0183860>] __shrink_dcache_sb+0x1a0/0x280
 [<c0183add>] shrink_dcache_memory+0x18d/0x1b0
 [<c0159dbb>] shrink_slab+0x12b/0x190
 [<c015a53c>] kswapd+0x35c/0x530
 [<c0158070>] isolate_pages_global+0x0/0x210
 [<c0138940>] autoremove_wake_function+0x0/0x50
 [<c011d33d>] complete+0x3d/0x60
 [<c015a1e0>] kswapd+0x0/0x530
 [<c0138622>] kthread+0x42/0x70
 [<c01385e0>] kthread+0x0/0x70
 [<c010391b>] kernel_thread_helper+0x7/0x1c
---[ end trace fcc3f79f56f6e698 ]---
NILFS warning (device sda3): nilfs_truncate_bmap: failed to truncate 
bmap (ino=468107, err=-2)
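
One way to examine the corrupted directory entries above is to re-serialize the logged fields as raw bytes. The sketch assumes that the on-disk NILFS2 directory entry starts with a 64-bit little-endian inode number, followed by a 16-bit little-endian rec_len and an 8-bit name_len, and that this 32-bit kernel printed only the low 32 bits of the inode number. The helper name `entry_bytes` is hypothetical.

```python
import struct


def entry_bytes(inode_low32, rec_len, name_len):
    """Return the raw bytes the logged fields would occupy on disk.

    Bytes 4..7 (the upper half of the assumed 64-bit inode field) were
    not logged, so they are left out rather than guessed.
    """
    return (struct.pack("<I", inode_low32),   # bytes 0..3
            struct.pack("<H", rec_len),       # bytes 8..9
            struct.pack("<B", name_len))      # byte 10


# Directory #28261, offset=4096
first = entry_bytes(1647255843, 29537, 104)
# Directory #468107, offset=0
second = entry_bytes(1095777639, 26480, 61)

print(first)   # (b'#!/b', b'as', b'h')
print(second)  # (b'g=PA', b'pg', b'=')
```

Every byte is printable ASCII, and the first entry fits the pattern `#!/b....ash`, which is how a shell script would begin. This suggests, without proving it, that these directory pages hold regular file data rather than directory entries.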

nilfs seems to run absolutely stable as long as the cleaner is not
running, but the cleaner seems to cause corruption. This time, I took
care to always run the cleaner while more than 5 gigabytes (20%) of
free space remained on the volume.

Bye,
David Arendt


* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]         ` <49CC6193.9040900-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-03-27  5:55           ` David Arendt
       [not found]             ` <49CC6A6C.9060006-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  2009-03-27  5:58           ` Ryusuke Konishi
  1 sibling, 1 reply; 15+ messages in thread
From: David Arendt @ 2009-03-27  5:55 UTC (permalink / raw)
  To: NILFS Users mailing list

Hi,

One thing I forgot to mention: in /etc/nilfs_cleanerd.conf I changed
nsegments_per_clean to 20 in order to clean faster when running the
cleaner manually. Could this have any influence?
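
For reference, here is a minimal sketch of reading that setting back from a cleaner configuration. It assumes the option referred to is spelled `nsegments_per_clean` and that the file uses one `keyword value` pair per line with `#` comments. The sample text is illustrative, not the actual file from this system, and `parse_cleanerd_conf` is a hypothetical helper.

```python
def parse_cleanerd_conf(text):
    """Parse 'keyword value' lines, ignoring blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)            # keyword, then the rest
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


# Illustrative content only (assumed, not taken from the thread).
sample = """
# number of segments reclaimed per cleaning step
nsegments_per_clean 20
"""

conf = parse_cleanerd_conf(sample)
print(int(conf["nsegments_per_clean"]))  # 20
```

A check like this makes it easy to record the exact cleaner settings alongside a bug report.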

Bye,
David Arendt

David Arendt wrote:
> Hi,
>
> There seems to be some bug in the kernel. On another partition,
> reformatted one week ago, I again got the following errors:
>
> NILFS error (device sda3): nilfs_check_page: bad entry in directory 
> #28261: unaligned directory entry - offset=4096, inode=1647255843, 
> rec_len=29537, name_len=104
> NILFS error (device sda3): nilfs_check_page: bad entry in directory 
> #28261: unaligned directory entry - offset=4096, inode=1647255843, 
> rec_len=29537, name_len=104
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42880
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42881
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42882
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42883
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42884
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42885
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42886
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42887
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42888
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42889
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42890
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42892
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42893
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42894
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42895
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42896
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42897
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42898
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42899
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42900
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42901
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42902
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42903
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42904
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42905
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42906
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42907
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42908
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42909
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42910
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42911
> init_special_inode: bogus i_mode (35070)
> init_special_inode: bogus i_mode (30055)
> init_special_inode: bogus i_mode (30070)
> init_special_inode: bogus i_mode (31070)
> init_special_inode: bogus i_mode (31066)
> init_special_inode: bogus i_mode (31461)
> init_special_inode: bogus i_mode (32146)
> init_special_inode: bogus i_mode (32545)
> init_special_inode: bogus i_mode (72162)
> init_special_inode: bogus i_mode (57556)
> init_special_inode: bogus i_mode (72542)
> init_special_inode: bogus i_mode (5042)
> init_special_inode: bogus i_mode (36504)
> NILFS error (device sda3): nilfs_check_page: bad entry in directory 
> #469193: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, 
> name_len=0
> NILFS error (device sda3): nilfs_readdir: bad page in #469193
> NILFS error (device sda3): nilfs_check_page: bad entry in directory 
> #469195: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, 
> name_len=0
> NILFS error (device sda3): nilfs_readdir: bad page in #469195
> NILFS error (device sda3): nilfs_check_page: bad entry in directory 
> #468107: directory entry across blocks - offset=0, inode=1095777639, 
> rec_len=26480, name_len=61
> NILFS error (device sda3): nilfs_readdir: bad page in #468107
> NILFS error (device sda3): nilfs_readdir: bad page in #28261
> ------------[ cut here ]------------
> WARNING: at /home/admin/x/nilfs-2.0.11/fs/dat.c:182 
> nilfs_dat_prepare_end+0xb0/0xc0 [nilfs2]()
> Hardware name: P5QL-E
> Modules linked in: nvidia(P) vmnet vmblock vmci vmmon fcpci(P) capi 
> capifs kernelcapi nilfs2 scsi_wait_scan
> Pid: 333, comm: kswapd0 Tainted: P           2.6.29server #1
> Call Trace:
>  [<c0125b99>] warn_slowpath+0x99/0xc0
>  [<c014e390>] find_get_page+0x30/0xc0
>  [<f83410b0>] nilfs_palloc_bitmap_blkoff+0x40/0x60 [nilfs2]
>  [<f834118b>] nilfs_palloc_get_entry_block+0x5b/0x70 [nilfs2]
>  [<c01506aa>] find_or_create_page+0x2a/0xa0
>  [<f83366d8>] nilfs_dat_prepare_entry+0x18/0x20 [nilfs2]
>  [<f8336be0>] nilfs_dat_prepare_end+0xb0/0xc0 [nilfs2]
>  [<f83364c2>] nilfs_direct_delete+0x62/0xa0 [nilfs2]
>  [<f8331e46>] nilfs_bmap_do_delete+0xb6/0xc0 [nilfs2]
>  [<c0157502>] pagevec_lookup+0x22/0x30
>  [<c0157c29>] truncate_inode_pages_range+0x179/0x310
>  [<f8331ecb>] nilfs_bmap_truncate+0x7b/0xa0 [nilfs2]
>  [<f832b53a>] nilfs_truncate_bmap+0x6a/0x100 [nilfs2]
>  [<f832c028>] nilfs_delete_inode+0x38/0xc0 [nilfs2]
>  [<c019c4b7>] inotify_inode_is_dead+0x17/0x80
>  [<f832bff0>] nilfs_delete_inode+0x0/0xc0 [nilfs2]
>  [<c018626e>] generic_delete_inode+0x6e/0x100
>  [<c02353cb>] _atomic_dec_and_lock+0x3b/0x70
>  [<c0185bf4>] iput+0x44/0x50
>  [<c0183695>] d_kill+0x35/0x60
>  [<c0183860>] __shrink_dcache_sb+0x1a0/0x280
>  [<c0183add>] shrink_dcache_memory+0x18d/0x1b0
>  [<c0159dbb>] shrink_slab+0x12b/0x190
>  [<c015a53c>] kswapd+0x35c/0x530
>  [<c0158070>] isolate_pages_global+0x0/0x210
>  [<c0138940>] autoremove_wake_function+0x0/0x50
>  [<c011d33d>] complete+0x3d/0x60
>  [<c015a1e0>] kswapd+0x0/0x530
>  [<c0138622>] kthread+0x42/0x70
>  [<c01385e0>] kthread+0x0/0x70
>  [<c010391b>] kernel_thread_helper+0x7/0x1c
> ---[ end trace fcc3f79f56f6e698 ]---
> NILFS warning (device sda3): nilfs_truncate_bmap: failed to truncate 
> bmap (ino=468107, err=-2)
>
> nilfs seems to run absolutely stably as long as the cleaner is running,
> but the cleaner seems to cause corruption. This time, I took care to
> always run the cleaner while there was still more than 5 gigabytes (20%)
> of free space on the volume.
>
> Bye,
> David Arendt
>
> Ryusuke Konishi wrote:
>   
>> Hi,
>> On Wed, 25 Mar 2009 06:22:09 +0100, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
>>   
>>     
>>> Hi,
>>>
>>> First of all, please don't get me wrong for posting all this bug 
>>> reports. It is not in the sense of complaining me. I am very satisfied 
>>> with nilfs2. As I am a software developer myself, I always like 
>>> receiving bug reports. What I hate most is people complaining in the 
>>> sense nothing is working without any more information.
>>>     
>>>       
>> David, I really appreciated your feedback, so feel free to report
>> bugs ;)
>>
>> Though we have not caught up with all your reports, we're keeping them
>> on record.  I believe we will be able to cut down the problems sooner
>> or later.
>>
>>   
>>     
>>> So here an error on kernel 2.6.29 while running cleaner on a 1 tb 
>>> volume. I am not sure if it is nilfs related, but I post it for your 
>>> information.
>>>     
>>>       
>> Thanks for the below information.
>> I'll try some tests on 2.6.29.
>> It's more likely to be affected by a page cache change.
>>
>> Regards,
>> Ryusuke Konishi
>>
>>   
>>     
>>> BUG: unable to handle kernel paging request at 9c5c67f0
>>> IP: [<c0239049>] radix_tree_delete+0x19/0x220
>>> *pdpt = 0000000030490001 *pde = 0000000000000000
>>> Oops: 0000 [#1] PREEMPT SMP
>>> last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:04:03.0/resource
>>> Modules linked in: nvidia(P) vmnet vmblock vmci vmmon fcpci(P) capi 
>>> capifs kernelcapi nilfs2 scsi_wait_scan
>>>
>>> Pid: 333, comm: kswapd0 Tainted: P           (2.6.29server #1) P5QL-E
>>> EIP: 0060:[<c0239049>] EFLAGS: 00010092 CPU: 3
>>> EIP is at radix_tree_delete+0x19/0x220
>>> EAX: 0537456a EBX: 00000000 ECX: f73f10d4 EDX: f701c598
>>> ESI: f73f10d4 EDI: f73f10e4 EBP: f73f10d8 ESP: f76e9d08
>>>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>>> Process kswapd0 (pid: 333, ti=f76e8000 task=f76885c0 task.ti=f76e8000)
>>> Stack:
>>>  00000000 f6fc0040 0537456a 00000000 00000000 00000000 f76e9d2c c0169827
>>>  f5f0d3d0 f56e482c 00000080 0000c400 00000000 00000000 0000c5ab 0000c5ab
>>>  00000000 f56e482c c108dee0 f73f10d4 f73f10e4 f76e9ebc c014fc15 0000c5ab
>>> Call Trace:
>>>  [<c0169827>] page_referenced_file+0x77/0x90
>>>  [<c014fc15>] __remove_from_page_cache+0x15/0x90
>>>  [<c0158864>] __remove_mapping+0x84/0xc0
>>>  [<c01590f3>] shrink_page_list+0x393/0x6d0
>>>  [<c0158cf2>] shrink_active_list+0x332/0x3a0
>>>  [<c01580f8>] isolate_pages_global+0x88/0x210
>>>  [<c0156e39>] ____pagevec_lru_add+0x119/0x130
>>>  [<c0159654>] shrink_list+0x224/0x560
>>>  [<c0159c07>] shrink_zone+0x277/0x300
>>>  [<c015a6f8>] kswapd+0x518/0x530
>>>  [<c0158070>] isolate_pages_global+0x0/0x210
>>>  [<c0138940>] autoremove_wake_function+0x0/0x50
>>>  [<c011d33d>] complete+0x3d/0x60
>>>  [<c015a1e0>] kswapd+0x0/0x530
>>>  [<c0138622>] kthread+0x42/0x70
>>>  [<c01385e0>] kthread+0x0/0x70
>>>  [<c010391b>] kernel_thread_helper+0x7/0x1c
>>> Code: 89 f8 5b 5e 5f 5d c3 0f 0b eb fe 0f 0b eb fe 8d 76 00 55 89 c5 57 
>>> 56 53 31 db 83 ec 48 89 54 24 08 8b 10 8b 44 24 08 89 5c 24 0c <39> 04 
>>> 95 90 51 55 c0 0f 82 1c 01 00 00 8b 4d 08 85 d2 89 4c 24
>>> EIP: [<c0239049>] radix_tree_delete+0x19/0x220 SS:ESP 0068:f76e9d08
>>> ---[ end trace 86f39789c1fa8998 ]---
>>> note: kswapd0[333] exited with preempt_count 1
>>> BUG: unable to handle kernel NULL pointer dereference at 00000104
>>> IP: [<c01504f9>] find_get_pages+0x79/0xf0
>>> *pdpt = 0000000030e61001 *pde = 0000000000000000
>>> Oops: 0000 [#2] PREEMPT SMP
>>> last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:04:03.0/resource
>>> Modules linked in: nvidia(P) vmnet vmblock vmci vmmon fcpci(P) capi 
>>> capifs kernelcapi nilfs2 scsi_wait_scan
>>> Pid: 8494, comm: nilfs_cleanerd Tainted: P      D    (2.6.29server #1) 
>>> P5QL-E
>>> EIP: 0060:[<c01504f9>] EFLAGS: 00210213 CPU: 2
>>> EIP is at find_get_pages+0x79/0xf0
>>> EAX: 00000100 EBX: 00000104 ECX: 00000100 EDX: db6a1cc4
>>> ESI: 0000000a EDI: db6a1c9c EBP: 0000000a ESP: db6a1c2c
>>>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>>> Process nilfs_cleanerd (pid: 8494, ti=db6a0000 task=f75e7800 
>>> task.ti=db6a0000)
>>> Stack:
>>>  0000000e db6a1cc4 00000104 00000002 f70d78a0 0000000e 000f76cf 0000000d
>>>  000f76cf db6a1c94 000f76ce 0000000e c0157502 db6a1c9c 000f76cf c1cf16e0
>>>  f8330336 0000000e db6a1c98 0000000e f70d7aec f70d78ac f70d7ae0 f70d789c
>>> Call Trace:
>>>  [<c0157502>] pagevec_lookup+0x22/0x30
>>>  [<f8330336>] nilfs_copy_back_pages+0x56/0x220 [nilfs2]
>>>  [<f834328b>] nilfs_commit_gcdat_inode+0x8b/0xc0 [nilfs2]
>>>  [<f833b1bd>] nilfs_segctor_complete_write+0x2fd/0x310 [nilfs2]
>>>  [<f833b914>] nilfs_segctor_do_construct+0x424/0x18c0 [nilfs2]
>>>  [<f83319bf>] nilfs_bmap_test_and_clear_dirty+0x2f/0x40 [nilfs2]
>>>  [<f833d009>] nilfs_segctor_construct+0x99/0xb0 [nilfs2]
>>>  [<f833df1f>] nilfs_clean_segments+0xef/0x200 [nilfs2]
>>>  [<f83427e0>] nilfs_ioctl+0x3d0/0x480 [nilfs2]
>>>  [<c030f774>] ehci_work+0x124/0x9a0
>>>  [<c011da6b>] update_curr+0x7b/0xe0
>>>  [<c012eeb7>] lock_timer_base+0x27/0x60
>>>  [<c013f55e>] getnstimeofday+0x4e/0x120
>>>  [<c0140808>] clocksource_get_next+0x38/0x40
>>>  [<f8342410>] nilfs_ioctl+0x0/0x480 [nilfs2]
>>>  [<c0180c6b>] vfs_ioctl+0x2b/0x90
>>>  [<c0180fdb>] do_vfs_ioctl+0x1eb/0x530
>>>  [<c012ebdb>] run_timer_softirq+0x15b/0x190
>>>  [<c012a484>] __do_softirq+0x94/0x160
>>>  [<c018135d>] sys_ioctl+0x3d/0x70
>>>  [<c0103131>] sysenter_do_call+0x12/0x25
>>>  [<c0400000>] pci_bus_size_bridges+0x1f0/0x410
>>> Code: 00 00 8b 44 24 34 8d 04 b0 89 44 24 04 8b 54 24 04 8b 02 8b 00 a8 
>>> 01 75 ba 85 c0 89 c1 74 3e 83 f8 ff 74 af 8d 58 04 89 5c 24 08 <8b> 50 
>>> 04 85 d2 74 db 8d 7a 01 89 d0 8b 5c 24 08 f0 0f b1 3b 39
>>> EIP: [<c01504f9>] find_get_pages+0x79/0xf0 SS:ESP 0068:db6a1c2c
>>> ---[ end trace 86f39789c1fa8999 ]---
>>> note: nilfs_cleanerd[8494] exited with preempt_count 1
>>>
>>> Bye,
>>> David Arendt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]         ` <49CC6193.9040900-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  2009-03-27  5:55           ` David Arendt
@ 2009-03-27  5:58           ` Ryusuke Konishi
       [not found]             ` <20090327.145831.16149916.ryusuke-sG5X7nlA6pw@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2009-03-27  5:58 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg

Hi David,

On Fri, 27 Mar 2009 05:18:11 +0000, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
> Hi,
> 
> There seems to be some bug in the kernel. On another partition 
> reformatted one week ago, I again had the following error:
> 
> NILFS error (device sda3): nilfs_check_page: bad entry in directory 
> #28261: unaligned directory entry - offset=4096, inode=1647255843, 
> rec_len=29537, name_len=104
> NILFS error (device sda3): nilfs_check_page: bad entry in directory 
> #28261: unaligned directory entry - offset=4096, inode=1647255843, 
> rec_len=29537, name_len=104
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42880
<snip>
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42910
> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read 
> inode: 42911
> init_special_inode: bogus i_mode (35070)
> init_special_inode: bogus i_mode (30055)
<snip>
> init_special_inode: bogus i_mode (36504)

Hmm, this time the ifile (i.e. the inode index file) seems to be broken.

Do you think the probability of the fault depends on the kernel version?

And is it reproducible after umount (or reboot) and mount -i (= mount
without GC)?

We have partially succeeded in reproducing the corruption under a
near-disk-full condition, and are trying to narrow down the conditions
under which it occurs.  I now suspect a cache coherence violation
between the GC cache and the regular page caches, but this is
unconfirmed so far.

With regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]             ` <49CC6A6C.9060006-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-03-27  6:20               ` Ryusuke Konishi
       [not found]                 ` <20090327.152005.04656990.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2009-03-27  6:20 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg

Hi,
On Fri, 27 Mar 2009 06:55:56 +0100, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
> Hi,
> 
> one thing I forgot to mention, in /etc/nilfs_cleanerd.conf I changed 
> nsegments_per_clean to 20 in order to clean faster when running the
> cleaner manually. Could this have any influence ?

Yes, maybe.  It raises memory pressure, which may trigger unusual
execution paths such as cache invalidation.  It may even increase the
chance of exposing underlying problems in the relocation of on-disk
blocks.

Decreasing cleaning_interval is safer in general.  We'll try to
reproduce with that setting.

Regards,
Ryusuke
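
A rough way to compare the two knobs, as a sketch only.  The figure of
2048 blocks per segment is taken from the lssu output elsewhere in this
thread; the 4 KiB block size and the interval values below are assumptions
for illustration, not settings anyone in the thread reported.  The script
estimates how much segment data one cleaner pass may touch and the
resulting upper bound on cleaning throughput.

```python
# Back-of-the-envelope comparison of nilfs_cleanerd settings.
# Assumptions (not from the thread): 4 KiB blocks, and the interval values
# used in the examples. The 2048 blocks/segment figure is from lssu output.

MIB = 1024 * 1024


def batch_mib(nsegments_per_clean, blocks_per_segment=2048, block_size=4096):
    """Upper bound on segment data (MiB) touched by a single cleaner pass."""
    return nsegments_per_clean * blocks_per_segment * block_size / MIB


def rate_mib_per_s(nsegments_per_clean, cleaning_interval_s, **segment_geometry):
    """Upper bound on cleaning throughput (MiB/s), ignoring pass duration."""
    return batch_mib(nsegments_per_clean, **segment_geometry) / cleaning_interval_s


if __name__ == "__main__":
    # (nsegments_per_clean, cleaning_interval in seconds); hypothetical values
    for nsegs, interval in [(2, 5), (20, 5), (2, 1)]:
        print(f"nsegments_per_clean={nsegs:2d} cleaning_interval={interval}s: "
              f"{batch_mib(nsegs):6.1f} MiB per pass, "
              f"{rate_mib_per_s(nsegs, interval):5.1f} MiB/s at most")
```

Under these assumptions, shortening the interval raises throughput while
each pass still touches at most 16 MiB, whereas nsegments_per_clean = 20
lets a single pass touch up to 160 MiB, which is consistent with the
memory-pressure concern above.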

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]                 ` <20090327.152005.04656990.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-03-27 10:47                   ` Ryusuke Konishi
       [not found]                     ` <20090327.194735.32664212.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2009-03-27 10:47 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg

Hi David,
On Fri, 27 Mar 2009 15:20:05 +0900 (JST), Ryusuke Konishi wrote:
> Hi,
> On Fri, 27 Mar 2009 06:55:56 +0100, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
> > Hi,
> > 
> > one thing I forgot to mention, in /etc/nilfs_cleanerd.conf I changed 
> > nsegments_per_clean to 20 in order to clean faster when running the
> > cleaner manually. Could this have any influence ?
> 
> Yes, maybe.  It raises memory pressure then may induce unusual path of
> execution like cache invalidation.  It may even increase the chance of
> revealing underlying problems in relocation of on-disk blocks.
> 
> Decreasing cleaning_interval is safer in general.  We'll try the
> condition.
> 
> Regards,
> Ryusuke

I tested the nsegments_per_clean = 20 case and hit an inconsistent
state, as follows:

 # lssu -a
             SEGNUM        DATE     TIME STAT     NBLOCKS
               ...
               7418  2009-03-27 18:41:33  -d-        2048
               7419  2009-03-27 18:41:48  -d-        2048
               7420  2009-03-27 18:42:08  -d-        2048
               7421  2009-03-27 18:42:28  -d-        2048
               7422  2009-03-27 18:42:48  ---        2048
               7423  2009-03-27 18:43:03  ---        2048
               7424  2009-03-27 18:43:23  -d-        2048
               7425  2009-03-27 18:43:33  ad-        1166
               7426  ---------- --:--:--  ad-           0
               7427  ---------- --:--:--  ---           0
               ...

Here, segments 7422 and 7423 are in use but not dirty.

This is critical because these segments will be reallocated and
overwritten later.  I suspect there is an error-handling bug somewhere
that drops the dirty flag and causes the crash.

If you have a (not broken) nilfs partition made under heavy stress,
could you try ``lssu -a'' likewise?

I'll dig into this now.

Regards,
Ryusuke Konishi
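
The check described above can be scripted.  The following is a minimal
sketch, assuming the five-column ``lssu -a'' layout shown in this message
and treating a segment as suspicious when it holds blocks (NBLOCKS > 0)
but its STAT field lacks the `d' flag.  The function names are
hypothetical, and the sample data is the listing above.

```python
from typing import List, NamedTuple


class SegUsage(NamedTuple):
    segnum: int
    date: str
    time: str
    stat: str
    nblocks: int


def parse_lssu(text: str) -> List[SegUsage]:
    """Parse `lssu -a` output, skipping the header and any non-data lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five columns: SEGNUM DATE TIME STAT NBLOCKS.
        if len(fields) != 5 or not fields[0].isdigit() or not fields[4].isdigit():
            continue
        segnum, date, time, stat, nblocks = fields
        rows.append(SegUsage(int(segnum), date, time, stat, int(nblocks)))
    return rows


def in_use_but_not_dirty(rows: List[SegUsage]) -> List[SegUsage]:
    """Segments that contain blocks but do not carry the dirty ('d') flag."""
    return [r for r in rows if r.nblocks > 0 and "d" not in r.stat]


SAMPLE = """\
             SEGNUM        DATE     TIME STAT     NBLOCKS
               ...
               7418  2009-03-27 18:41:33  -d-        2048
               7419  2009-03-27 18:41:48  -d-        2048
               7420  2009-03-27 18:42:08  -d-        2048
               7421  2009-03-27 18:42:28  -d-        2048
               7422  2009-03-27 18:42:48  ---        2048
               7423  2009-03-27 18:43:03  ---        2048
               7424  2009-03-27 18:43:23  -d-        2048
               7425  2009-03-27 18:43:33  ad-        1166
               7426  ---------- --:--:--  ad-           0
               7427  ---------- --:--:--  ---           0
               ...
"""

if __name__ == "__main__":
    for seg in in_use_but_not_dirty(parse_lssu(SAMPLE)):
        print(f"segment {seg.segnum}: stat={seg.stat} nblocks={seg.nblocks}")
```

Keying on NBLOCKS rather than on the date column is a design choice of
this sketch: it leaves the free segment 7427 and the empty active segment
7426 out of the result, while still flagging 7422 and 7423.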

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]                     ` <20090327.194735.32664212.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-03-27 11:13                       ` admin-/LHdS3kC8BfYtjvyW6yDsg
  2009-03-28  8:09                       ` David Arendt
  1 sibling, 0 replies; 15+ messages in thread
From: admin-/LHdS3kC8BfYtjvyW6yDsg @ 2009-03-27 11:13 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: admin-/LHdS3kC8BfYtjvyW6yDsg, users-JrjvKiOkagjYtjvyW6yDsg

Hi,

I ran lssu -a /dev/... | grep -e "2009-" | grep -e "---" and got no
output, so I suppose there are no in-use but not dirty segments on my
current nilfs2 filesystems.

Bye,
David Arendt

> Hi David,
> On Fri, 27 Mar 2009 15:20:05 +0900 (JST), Ryusuke Konishi wrote:
>> Hi,
>> On Fri, 27 Mar 2009 06:55:56 +0100, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
>> wrote:
>> > Hi,
>> >
>> > one thing I forgot to mention, in /etc/nilfs_cleanerd.conf I changed
>> > nsegments_per_clean to 20 in order to clean faster when running the
>> > cleaner manually. Could this have any influence ?
>>
>> Yes, maybe.  It raises memory pressure then may induce unusual path of
>> execution like cache invalidation.  It may even increase the chance of
>> revealing underlying problems in relocation of on-disk blocks.
>>
>> Decreasing cleaning_interval is safer in general.  We'll try the
>> condition.
>>
>> Regards,
>> Ryusuke
>
> I examined the case of nsegments_per_clean = 20 and met an
> inconsistent state as follows:
>
>  # lssu -a
>              SEGNUM        DATE     TIME STAT     NBLOCKS
>                ...
>                7418  2009-03-27 18:41:33  -d-        2048
>                7419  2009-03-27 18:41:48  -d-        2048
>                7420  2009-03-27 18:42:08  -d-        2048
>                7421  2009-03-27 18:42:28  -d-        2048
>                7422  2009-03-27 18:42:48  ---        2048
>                7423  2009-03-27 18:43:03  ---        2048
>                7424  2009-03-27 18:43:23  -d-        2048
>                7425  2009-03-27 18:43:33  ad-        1166
>                7426  ---------- --:--:--  ad-           0
>                7427  ---------- --:--:--  ---           0
>                ...
>
> Here, the segment 7422 and 7423 are in-use but not dirty.
>
> This is crucial because these segments will be reallocated and
> overridden later.  I suspect there is a bug of error handling
> somewhere, and it evaporates the dirty flag and causes the crash.
>
> If you have a (not broken) nilfs partition made under heavy stress,
> could you try ``lssu -a'' likewise ?
>
> I'll dig into this from now.
>
> Regards,
> Ryusuke Konishi
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]             ` <20090327.145831.16149916.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-03-27 11:20               ` admin-/LHdS3kC8BfYtjvyW6yDsg
       [not found]                 ` <44728.212.24.212.169.1238152837.squirrel-YfwCgBv0H3oBXFe83j6qeQ@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: admin-/LHdS3kC8BfYtjvyW6yDsg @ 2009-03-27 11:20 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: admin-/LHdS3kC8BfYtjvyW6yDsg, users-JrjvKiOkagjYtjvyW6yDsg

Hi,

Maybe I'm wrong, but I think the probability of the fault is not kernel
dependent as there have been similar problems on 2.6.28.8.

The error is reproducible after a reboot and mount -i.

Bye,
David Arendt

> Hi David,
>
> On Fri, 27 Mar 2009 05:18:11 +0000, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
>> Hi,
>>
>> There seems to be some bug in the kernel. On another partition
>> reformatted one week ago, I again had the following error:
>>
>> NILFS error (device sda3): nilfs_check_page: bad entry in directory
>> #28261: unaligned directory entry - offset=4096, inode=1647255843,
>> rec_len=29537, name_len=104
>> NILFS error (device sda3): nilfs_check_page: bad entry in directory
>> #28261: unaligned directory entry - offset=4096, inode=1647255843,
>> rec_len=29537, name_len=104
>> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read
>> inode: 42880
> <snip>
>> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read
>> inode: 42910
>> NILFS warning (device sda3): nilfs_ifile_get_inode_block: unable to read
>> inode: 42911
>> init_special_inode: bogus i_mode (35070)
>> init_special_inode: bogus i_mode (30055)
> <snip>
>> init_special_inode: bogus i_mode (36504)
>
> Uum, this time, ifile (i.e. inode index file) seems to be broken.
>
> Do you think probability of the fault depends on the kernel version?
>
> And, is it reproducible after umount(or reboot) and mount -i (= mount
> without GC) ?
>
> We partially succeeded to reproduce corruption under a near disk full
> condition, and are trying to narrow down the occurrence condition.  I
> now suspect cache coherence violation between GC cache and regular
> page caches, but it's uncorroborated so far.
>
> With regards,
> Ryusuke Konishi
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]                 ` <44728.212.24.212.169.1238152837.squirrel-YfwCgBv0H3oBXFe83j6qeQ@public.gmane.org>
@ 2009-03-27 11:36                   ` Ryusuke Konishi
  0 siblings, 0 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2009-03-27 11:36 UTC (permalink / raw)
  To: admin-/LHdS3kC8BfYtjvyW6yDsg; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

On Fri, 27 Mar 2009 12:20:37 +0100 (CET), admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org wrote:
> Hi,
> 
> Maybe I'm wrong, but I think the probability of the fault is not kernel
> dependent as there have been similar problems on 2.6.28.8.
> 
> The error is reproducible after a reboot and mount -i.
> 
> Bye,
> David Arendt

Yeah, this problem seems independent of kernel version.
Thanks for the responses, they're really helpful for narrowing down
the problem.

Regards,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]                     ` <20090327.194735.32664212.ryusuke-sG5X7nlA6pw@public.gmane.org>
  2009-03-27 11:13                       ` admin-/LHdS3kC8BfYtjvyW6yDsg
@ 2009-03-28  8:09                       ` David Arendt
       [not found]                         ` <49CDDB37.9030603-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: David Arendt @ 2009-03-28  8:09 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

Today I tried lssu on a dedicated server running nilfs, and here
I got the following result:

fr ~ # lssu -a /dev/sda2 | grep -e "2009-" | grep -v -e "-d-"
                2558  2009-03-23 16:59:05  ---        2048
                4967  2009-03-28 09:07:10  ad-        1928

so I suppose corruption will soon occur here.

Is there something I can do to manually mark it as dirty or should I go 
the backup/restore route ?

Thanks in advance

Bye,
David Arendt


Ryusuke Konishi wrote:
> Hi David,
> On Fri, 27 Mar 2009 15:20:05 +0900 (JST), Ryusuke Konishi wrote:
>   
>> Hi,
>> On Fri, 27 Mar 2009 06:55:56 +0100, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
>>     
>>> Hi,
>>>
>>> one thing I forgot to mention, in /etc/nilfs_cleanerd.conf I changed 
>>> nsegments_per_clean to 20 in order to clean faster when running the
>>> cleaner manually. Could this have any influence ?
>>>       
>> Yes, maybe.  It raises memory pressure then may induce unusual path of
>> execution like cache invalidation.  It may even increase the chance of
>> revealing underlying problems in relocation of on-disk blocks.
>>
>> Decreasing cleaning_interval is safer in general.  We'll try the
>> condition.
>>
>> Regards,
>> Ryusuke
>>     
>
> I examined the case of nsegments_per_clean = 20 and met an
> inconsistent state as follows:
>
>  # lssu -a
>              SEGNUM        DATE     TIME STAT     NBLOCKS
>                ...
>                7418  2009-03-27 18:41:33  -d-        2048
>                7419  2009-03-27 18:41:48  -d-        2048
>                7420  2009-03-27 18:42:08  -d-        2048
>                7421  2009-03-27 18:42:28  -d-        2048
>                7422  2009-03-27 18:42:48  ---        2048
>                7423  2009-03-27 18:43:03  ---        2048
>                7424  2009-03-27 18:43:23  -d-        2048
>                7425  2009-03-27 18:43:33  ad-        1166
>                7426  ---------- --:--:--  ad-           0
>                7427  ---------- --:--:--  ---           0
>                ...
>
> Here, the segment 7422 and 7423 are in-use but not dirty.
>
> This is crucial because these segments will be reallocated and
> overridden later.  I suspect there is a bug of error handling
> somewhere, and it evaporates the dirty flag and causes the crash.
>
> If you have a (not broken) nilfs partition made under heavy stress,
> could you try ``lssu -a'' likewise ?
>
> I'll dig into this from now.
>
> Regards,
> Ryusuke Konishi
>   

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]                         ` <49CDDB37.9030603-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-03-28 12:52                           ` Ryusuke Konishi
       [not found]                             ` <20090328.215257.15833655.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ryusuke Konishi @ 2009-03-28 12:52 UTC (permalink / raw)
  To: admin-/LHdS3kC8BfYtjvyW6yDsg; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi,
On Sat, 28 Mar 2009 09:09:27 +0100, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
> Hi,
> 
> today I have tried the lssu on a dedicated server running nilfs and here 
> I had the following result:
> 
> fr ~ # lssu -a /dev/sda2 | grep -e "2009-" | grep -v -e "-d-"
>                 2558  2009-03-23 16:59:05  ---        2048
>                 4967  2009-03-28 09:07:10  ad-        1928
> 
> so I suppose corruption will soon occur here.

Oh, it would come.

> Is there something I can do to manually mark it as dirty or should I go 
> the backup/restore route ?

No, sorry.  You may as well go the backup/restore route.

BTW, I found a bug in sufile that may relate to this problem.  The
following patch fixes the bug. (I'm now testing this)

If I can confirm that the patch has an effect on the dirty flag
evaporation, I will release an update ASAP.

Otherwise, I'll continue debugging.
Please try the patch in the meantime.

Regards,
Ryusuke Konishi

diff --git a/fs/sufile.c b/fs/sufile.c
index e64a5de..0ea8558 100644
--- a/fs/sufile.c
+++ b/fs/sufile.c
@@ -553,7 +553,6 @@ int nilfs_sufile_set_error(struct inode *sufile, __u64 segnum)
 
 	nilfs_segment_usage_set_error(su);
 	kunmap_atomic(kaddr, KM_USER0);
-	brelse(su_bh);
 
 	kaddr = kmap_atomic(header_bh->b_page, KM_USER0);
 	header = nilfs_sufile_block_get_header(sufile, header_bh, kaddr);
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]                             ` <20090328.215257.15833655.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-03-29 15:25                               ` David Arendt
       [not found]                                 ` <49CF92EC.2020803-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: David Arendt @ 2009-03-29 15:25 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

Many thanks for the patch.
I saw that you have already included the latest patch in git, so I
used the git version. I did a backup/restore on my nilfs2 partitions
in order to be sure to start from a clean state. So far no corruption
has occurred and all used segments have been marked dirty. Since the
corruption generally occurred only after several rounds of cleaning, I
will only be able to say in a few days whether the patch really solved
the problem.

However, I got the following result on a freshly restored 1 TB
partition where the cleaner has not been run yet:

server ~ # lssu -a /dev/sda10 | grep -e "2009-" | grep -v -e "-d-"
               14335  2009-03-29 01:44:28  ad-        2048
               14589  2009-03-29 01:46:23  ad-         941

For all other partitions I have only one segment marked as active. Can 
it be a normal case for nilfs2 that 2 segments are marked as active or 
is there something weird going on here ? dmesg returns nothing special 
about this volume. There has also been no system crash so this volume 
should have been mounted/unmounted correctly.

Bye,
David Arendt

Ryusuke Konishi wrote:
> Hi,
> On Sat, 28 Mar 2009 09:09:27 +0100, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
>   
>> Hi,
>>
>> today I have tried the lssu on a dedicated server running nilfs and here 
>> I had the following result:
>>
>> fr ~ # lssu -a /dev/sda2 | grep -e "2009-" | grep -v -e "-d-"
>>                 2558  2009-03-23 16:59:05  ---        2048
>>                 4967  2009-03-28 09:07:10  ad-        1928
>>
>> so I suppose corruption will soon occur here.
>>     
>
> Oh, it would come.
>
>   
>> Is there something I can do to manually mark it as dirty or should I go 
>> the backup/restore route ?
>>     
>
> No, sorry.  You may as well go the backup/restore route.
>
> BTW, I found a bug in sufile that may relate to this problem.  The
> following patch fixes the bug. (I'm now testing this)
>
> If I can confirm that the patch has effect on the dirty flag
> evaporation, I will release an update ASAP.
>
> Otherwise, I'll continue debugging.
> Please try the patch in the meantime.
>
> Regards,
> Ryusuke Konishi
>
> diff --git a/fs/sufile.c b/fs/sufile.c
> index e64a5de..0ea8558 100644
> --- a/fs/sufile.c
> +++ b/fs/sufile.c
> @@ -553,7 +553,6 @@ int nilfs_sufile_set_error(struct inode *sufile, __u64 segnum)
>  
>  	nilfs_segment_usage_set_error(su);
>  	kunmap_atomic(kaddr, KM_USER0);
> -	brelse(su_bh);
>  
>  	kaddr = kmap_atomic(header_bh->b_page, KM_USER0);
>  	header = nilfs_sufile_block_get_header(sufile, header_bh, kaddr);
>   


* Re: error on kernel 2.6.29 while running cleaner on a 1tb volume
       [not found]                                 ` <49CF92EC.2020803-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-03-29 16:38                                   ` Ryusuke Konishi
  0 siblings, 0 replies; 15+ messages in thread
From: Ryusuke Konishi @ 2009-03-29 16:38 UTC (permalink / raw)
  To: admin-/LHdS3kC8BfYtjvyW6yDsg; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi David,
On Sun, 29 Mar 2009 17:25:32 +0200, David Arendt <admin-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org> wrote:
> Hi,
> 
> Many thanks for the patch.
> I have seen that you already have included the latest patch in git so I 
> used the git version. I have done a backup/restore on my nilfs2 
> partitions in order to be sure to start with a clean state. So far no 
> corruption did occur and all used segments have been marked dirty. 
> As generally the corruption only occurred after several times of 
> cleaning, I can only say in a few days, if the patch really solved the 
> problem.

I found another bug, which seems to be the true cause of this problem.
I've just pushed the bugfix to the git repo, so please apply it, too.

After it's verified, I'd like to release the next version.

> I have however had the following result on a fresh restored 1tb 
> partition where the cleaner has not been  run yet:
> 
> server ~ # lssu -a /dev/sda10 | grep -e "2009-" | grep -v -e "-d-"
>                14335  2009-03-29 01:44:28  ad-        2048
>                14589  2009-03-29 01:46:23  ad-         941
> 
> For all other partitions I have only one segment marked as active. Can 
> it be a normal case for nilfs2 that 2 segments are marked as active or 
> is there something weird going on here ? dmesg returns nothing special 
> about this volume. There has also been no system crash so this volume 
> should have been mounted/unmounted correctly.

Nilfs keeps the current segment and next segment as active, so usually
it has two active segments.  But we may see the above case if the
current segment is fully empty.

Otherwise, the above bugfix may be related to this; it corrects a
phenomenon in which the active flag appears on the wrong segments.
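
The lssu/grep check used in this thread could also be scripted. What
follows is only a rough sketch, not part of nilfs-utils. It assumes that
each lssu line has the five columns seen above (segment number, date,
time, a three-character flag field, block count) and that the flag
positions mean active, dirty and error, in that order. The names
parse_lssu, active_segments and written_but_not_dirty are hypothetical,
and the sample lines are the ones David posted on 2009-03-28.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    segnum: int
    timestamp: str
    active: bool
    dirty: bool
    error: bool
    nblocks: int


def parse_lssu(text: str) -> List[Segment]:
    """Parse lssu-style lines; header or malformed lines are skipped."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: SEGNUM DATE TIME FLAGS NBLOCKS
        if len(fields) != 5:
            continue
        segnum, date, time, flags, nblocks = fields
        if not (segnum.isdigit() and nblocks.isdigit() and len(flags) == 3):
            continue
        # Only keep segments with a real timestamp, like grep -e "2009-".
        if not date[:4].isdigit():
            continue
        segments.append(Segment(
            segnum=int(segnum),
            timestamp=date + " " + time,
            active=flags[0] == "a",   # assumption: first flag = active
            dirty=flags[1] == "d",    # assumption: second flag = dirty
            error=flags[2] == "e",    # assumption: third flag = error
            nblocks=int(nblocks),
        ))
    return segments


def active_segments(segments: List[Segment]) -> List[int]:
    """Segment numbers carrying the active flag."""
    return [s.segnum for s in segments if s.active]


def written_but_not_dirty(segments: List[Segment]) -> List[int]:
    """Segments that were written but lack the dirty flag."""
    return [s.segnum for s in segments if not s.dirty]


# Output quoted earlier in this thread (lssu -a /dev/sda2, filtered).
SAMPLE = """\
                2558  2009-03-23 16:59:05  ---        2048
                4967  2009-03-28 09:07:10  ad-        1928
"""

if __name__ == "__main__":
    segs = parse_lssu(SAMPLE)
    print("active:", active_segments(segs))
    print("written but not dirty:", written_but_not_dirty(segs))
```

On the sample, segment 2558 is the one that holds data without the dirty
flag, which is the state suspected of leading to corruption.
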

Anyway, it's too early to make a toast ;)
I hope the latest bugfix will settle the mess.

Regards,
Ryusuke Konishi


end of thread, other threads:[~2009-03-29 16:38 UTC | newest]

Thread overview: 15+ messages
2009-03-25  5:22 error on kernel 2.6.29 while running cleaner on a 1tb volume David Arendt
     [not found] ` <49C9BF81.6090203-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2009-03-25 11:18   ` admin-/LHdS3kC8BfYtjvyW6yDsg
2009-03-25 17:19   ` Ryusuke Konishi
     [not found]     ` <20090326.021932.61004088.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-03-27  5:18       ` David Arendt
     [not found]         ` <49CC6193.9040900-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2009-03-27  5:55           ` David Arendt
     [not found]             ` <49CC6A6C.9060006-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2009-03-27  6:20               ` Ryusuke Konishi
     [not found]                 ` <20090327.152005.04656990.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-03-27 10:47                   ` Ryusuke Konishi
     [not found]                     ` <20090327.194735.32664212.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-03-27 11:13                       ` admin-/LHdS3kC8BfYtjvyW6yDsg
2009-03-28  8:09                       ` David Arendt
     [not found]                         ` <49CDDB37.9030603-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2009-03-28 12:52                           ` Ryusuke Konishi
     [not found]                             ` <20090328.215257.15833655.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-03-29 15:25                               ` David Arendt
     [not found]                                 ` <49CF92EC.2020803-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2009-03-29 16:38                                   ` Ryusuke Konishi
2009-03-27  5:58           ` Ryusuke Konishi
     [not found]             ` <20090327.145831.16149916.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-03-27 11:20               ` admin-/LHdS3kC8BfYtjvyW6yDsg
     [not found]                 ` <44728.212.24.212.169.1238152837.squirrel-YfwCgBv0H3oBXFe83j6qeQ@public.gmane.org>
2009-03-27 11:36                   ` Ryusuke Konishi
