Linux NILFS development
* NilFS cleanerd bugreport
@ 2009-01-28 20:52 Reinoud Zandijk
       [not found] ` <20090128205223.GA416-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Reinoud Zandijk @ 2009-01-28 20:52 UTC (permalink / raw)
  To: nilfs.org

Dear folks, dear Ryusuke,

I've found a bug in the cleanerd/nilfs interaction that might give rise to the
various problems we've seen recently with the cleanerd. It comes down to the
wrong counting of the number of dirty segments and the wrong counting of the
number of checkpoints.

I created this disc with mkfs.nilfs, using NiLFS version 2.05 with the 2.06
userland (AFAIK), and created a sparse file on it with the sparse file
generator I wrote for UDF testing. It unmounted fine, giving the nilfs_dump
`vnd0e-dump-3'. When I remounted it, the cleanerd started after a while, and
after unmounting again the dump is `vnd0e-dump-3-cleanerd'. A diff shows:

(superblock)
--- /root/luiaard/root/vnd0e-dump-3	2009-01-25 17:10:22.000000000 +0100
+++ /root/luiaard/root/vnd0e-dump-3-cleanerd	2009-01-28 17:24:07.000000000 +0100
@@ -7,7 +7,7 @@
 
 	Flags                       0x0000
 	CRC seed                    0xd4dd3d5a
-	Checksum (CRC)              0x05ec6c58 (OK)
+	Checksum (CRC)              0xddd0a2f7 (OK)
 
 	Blocksize                   4096
 	Number of segments          499
@@ -17,15 +17,15 @@
 	Blocks per segment          2048
 	Reserved segments percent   5
 
-	Last checkpoint number      8
-	Last pseg blocknr writen    12288
+	Last checkpoint number      11
+	Last pseg blocknr writen    13726
 	Seq. number last segment    6
-	Free blocks count           1005568
+	Free blocks count           1015808
 	FS Creation time            Sun Jan 25 17:05:10 2009
-	FS last mount time          Sun Jan 25 17:05:14 2009
-	FS last write time          Sun Jan 25 17:06:02 2009
+	FS last mount time          Wed Jan 28 17:21:25 2009
+	FS last write time          Wed Jan 28 17:21:44 2009
 
-	Mount count                 1
+	Mount count                 2
 	Max mount count             50
 	FS state                    0x1<VALID_FS>
 	Error behaviour flags       0x0001


And the su and cp files give:

@@ -30743,34 +31480,34 @@
 Reading file `SU.out` for 1 blocks (4 Kb)
 
 	SU file dump
-		nclean       491
-		ndirty       8
+		nclean       496
+		ndirty       21474836483
 		last alloced 7
 
 		Segment 0
-		Last modified Sun Jan 25 17:05:28 2009
-		Containing nblks 2047
-		Flags            0x2<DIRTY>
+		Last modified Thu Jan  1 01:00:00 1970
+		Containing nblks 0
+		Flags            0x0

......

@@ -30789,136 +31526,72 @@
 
 Reading file `CP.out` for 1 blocks (4 Kb)
 	CP file dump
-		Number of checkpoints 8
+		Number of checkpoints 8589934596
 		Number of snapshots   0
 
 		Checkpoint number    1
-		Flags                0x0
+		Flags                0x2<INVALID>
 		Checkpoints in block 0
 		Created at Sun Jan 25 17:05:10 2009
 		Blocks incremented   11
 		Inodes count         3
 		Blocks count (red.)  9

---------------------

Any idea whether and why this can happen? Has it been fixed in the meantime?
Or could this be a clue to the weird behaviour seen by others, including the
corruption?

With regards,
Reinoud


* Re: NilFS cleanerd bugreport
       [not found] ` <20090128205223.GA416-5cYspOl2ggRz6xQTk39kMVfVdRo2wo/d@public.gmane.org>
@ 2009-01-30 14:18   ` Ryusuke Konishi
       [not found]     ` <20090130.231853.99024523.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Ryusuke Konishi @ 2009-01-30 14:18 UTC (permalink / raw)
  To: reinoud-S783fYmB3Ccdnm+yROfE0A; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi Reinoud,
On Wed, 28 Jan 2009 21:52:23 +0100, Reinoud Zandijk wrote:
> Dear folks, dear Ryusuke,
> 
> I've found a bug in the cleanerd/nilfs interaction that might give rise to the
> various problems we've seen recently with the cleanerd. It comes down to the
> wrong counting of the number of dirty segments and the wrong counting of the
> number of checkpoints.
> 
> I created this disc with mkfs.nilfs, using NiLFS version 2.05 with the 2.06
> userland (AFAIK), and created a sparse file on it with the sparse file
> generator I wrote for UDF testing. It unmounted fine, giving the nilfs_dump
> `vnd0e-dump-3'. When I remounted it, the cleanerd started after a while, and
> after unmounting again the dump is `vnd0e-dump-3-cleanerd'. A diff shows:
<snip>
> And the su and cp files give:
> 
> @@ -30743,34 +31480,34 @@
>  Reading file `SU.out` for 1 blocks (4 Kb)
>  
>  	SU file dump
> -		nclean       491
> -		ndirty       8
> +		nclean       496
> +		ndirty       21474836483
>  		last alloced 7
>  
>  		Segment 0
> -		Last modified Sun Jan 25 17:05:28 2009
> -		Containing nblks 2047
> -		Flags            0x2<DIRTY>
> +		Last modified Thu Jan  1 01:00:00 1970
> +		Containing nblks 0
> +		Flags            0x0
> 
> ......
> 
> @@ -30789,136 +31526,72 @@
>  
>  Reading file `CP.out` for 1 blocks (4 Kb)
>  	CP file dump
> -		Number of checkpoints 8
> +		Number of checkpoints 8589934596
>  		Number of snapshots   0
>  
>  		Checkpoint number    1
> -		Flags                0x0
> +		Flags                0x2<INVALID>
>  		Checkpoints in block 0
>  		Created at Sun Jan 25 17:05:10 2009
>  		Blocks incremented   11
>  		Inodes count         3
>  		Blocks count (red.)  9
> 
> Any idea whether and why this can happen?

This looks like an underflow or a collision of updates.

> Has it been fixed in the meantime?

Not yet, I think.

> Or could this be a clue to the weird behaviour seen by others, including the
> corruption?

I don't know.  As I remember, the cleanerd does not depend on these
values, but the problems may be induced indirectly.

Anyway, thanks for reporting this issue.
I'll review the cpfile and sufile again.

Regards,
Ryusuke Konishi
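
The underflow reading can be checked against the dumped counters. If a 64-bit
counter is decremented through 32-bit unsigned arithmetic, each faulty
decrement leaves an extra 2^32 on top of the intended result, so the upper 32
bits count the faulty operations and the lower 32 bits hold the intended
value. The sketch below is only that arithmetic; wraps() and low_count() are
made-up helper names, and it assumes the true counts fit in 32 bits.

```c
#include <stdint.h>

/* Upper 32 bits: how many times a surplus of 2^32 was folded into the
 * counter, i.e. the number of faulty decrements under the assumption above. */
static uint32_t wraps(uint64_t counter)
{
	return (uint32_t)(counter >> 32);
}

/* Lower 32 bits: the value correct arithmetic would have produced,
 * assuming the true count fits in 32 bits. */
static uint32_t low_count(uint64_t counter)
{
	return (uint32_t)(counter & 0xFFFFFFFFu);
}
```

For ndirty = 21474836483 this gives 5 wraps and a low count of 3, consistent
with nclean rising from 491 to 496 while ndirty should have dropped from 8 to
3. For the checkpoint count 8589934596 it gives 2 wraps and a low count of 4.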


* Re: NilFS cleanerd bugreport
       [not found]     ` <20090130.231853.99024523.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-02-02  7:24       ` Ryusuke Konishi
  0 siblings, 0 replies; 3+ messages in thread
From: Ryusuke Konishi @ 2009-02-02  7:24 UTC (permalink / raw)
  To: reinoud-S783fYmB3Ccdnm+yROfE0A; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi,
On Fri, 30 Jan 2009 23:18:53 +0900 (JST), Ryusuke Konishi wrote:
> Hi Reinoud,
> On Wed, 28 Jan 2009 21:52:23 +0100, Reinoud Zandijk wrote:
> > Dear folks, dear Ryusuke,
> > 
> > I've found a bug in the cleanerd/nilfs interaction that might give rise to the
> > various problems we've seen recently with the cleanerd. It comes down to the
> > wrong counting of the number of dirty segments and the wrong counting of the
> > number of checkpoints.
> > 
> > I created this disc with mkfs.nilfs, using NiLFS version 2.05 with the 2.06
> > userland (AFAIK), and created a sparse file on it with the sparse file
> > generator I wrote for UDF testing. It unmounted fine, giving the nilfs_dump
> > `vnd0e-dump-3'. When I remounted it, the cleanerd started after a while, and
> > after unmounting again the dump is `vnd0e-dump-3-cleanerd'. A diff shows:
> <snip>
> > And the su and cp files give:
> > 
> > @@ -30743,34 +31480,34 @@
> >  Reading file `SU.out` for 1 blocks (4 Kb)
> >  
> >  	SU file dump
> > -		nclean       491
> > -		ndirty       8
> > +		nclean       496
> > +		ndirty       21474836483
> >  		last alloced 7
> >  
> >  		Segment 0
> > -		Last modified Sun Jan 25 17:05:28 2009
> > -		Containing nblks 2047
> > -		Flags            0x2<DIRTY>
> > +		Last modified Thu Jan  1 01:00:00 1970
> > +		Containing nblks 0
> > +		Flags            0x0
> > 
> > ......
> > 
> > @@ -30789,136 +31526,72 @@
> >  
> >  Reading file `CP.out` for 1 blocks (4 Kb)
> >  	CP file dump
> > -		Number of checkpoints 8
> > +		Number of checkpoints 8589934596
> >  		Number of snapshots   0
> >  
> >  		Checkpoint number    1
> > -		Flags                0x0
> > +		Flags                0x2<INVALID>
> >  		Checkpoints in block 0
> >  		Created at Sun Jan 25 17:05:10 2009
> >  		Blocks incremented   11
> >  		Inodes count         3
> >  		Blocks count (red.)  9
> > 
> > Any idea whether and why this can happen?
> 
> This looks like an underflow or a collision of updates.

This turns out to be a bug in the counter operations on the cpfile and
sufile.

Here I attach a test patch to fix the problem.
After some tests and submission to the -mm tree, I'll push it to the git
repo.

Reinoud, thank you for finding this problem.

Regards,
Ryusuke Konishi

---
 fs/cpfile.c |    2 +-
 fs/sufile.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/cpfile.c b/fs/cpfile.c
index 1e9ce4c..45bfe82 100644
--- a/fs/cpfile.c
+++ b/fs/cpfile.c
@@ -357,7 +357,7 @@ int nilfs_cpfile_delete_checkpoints(struct inode *cpfile,
 		kaddr = kmap_atomic(header_bh->b_page, KM_USER0);
 		header = nilfs_cpfile_block_get_header(cpfile, header_bh,
 						       kaddr);
-		le64_add_cpu(&header->ch_ncheckpoints, -tnicps);
+		le64_add_cpu(&header->ch_ncheckpoints, -(u64)tnicps);
 		nilfs_mdt_mark_buffer_dirty(header_bh);
 		nilfs_mdt_mark_dirty(cpfile);
 		kunmap_atomic(kaddr, KM_USER0);
diff --git a/fs/sufile.c b/fs/sufile.c
index 7b73a5f..9f0a988 100644
--- a/fs/sufile.c
+++ b/fs/sufile.c
@@ -331,7 +331,7 @@ int nilfs_sufile_freev(struct inode *sufile, __u64 *segnum, size_t nsegs)
 	kaddr = kmap_atomic(header_bh->b_page, KM_USER0);
 	header = nilfs_sufile_block_get_header(sufile, header_bh, kaddr);
 	le64_add_cpu(&header->sh_ncleansegs, nsegs);
-	le64_add_cpu(&header->sh_ndirtysegs, -nsegs);
+	le64_add_cpu(&header->sh_ndirtysegs, -(u64)nsegs);
 	kunmap_atomic(kaddr, KM_USER0);
 	nilfs_mdt_mark_buffer_dirty(header_bh);
 	nilfs_mdt_mark_dirty(sufile);
-- 
1.5.6.5
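
A note on why the two casts matter. le64_add_cpu() takes a 64-bit delta. If
the count being subtracted has a 32-bit unsigned type (as size_t does on a
32-bit build), negating it before the call yields 2^32 - n, which is then
zero-extended, so the counter grows by almost 2^32 instead of shrinking by n.
The user-space sketch below reproduces that arithmetic under those
assumptions (and a 32-bit int); add_u64(), sub_buggy() and sub_fixed() are
illustrative names, not kernel functions, and the endianness conversion is
left out.

```c
#include <stdint.h>

/* Stand-in for the kernel's le64_add_cpu(): add a 64-bit delta to a counter.
 * Only the arithmetic is modelled; byte-order handling is omitted. */
static void add_u64(uint64_t *counter, uint64_t delta)
{
	*counter += delta;
}

/* Unpatched form: n is negated as a 32-bit unsigned value and only then
 * widened. For n = 1 the delta is 0x00000000FFFFFFFF, not 2^64 - 1. */
static void sub_buggy(uint64_t *counter, uint32_t n)
{
	add_u64(counter, -n);
}

/* Patched form: widen first, then negate. The delta is 2^64 - n, so the
 * modular 64-bit addition really subtracts n. */
static void sub_fixed(uint64_t *counter, uint32_t n)
{
	add_u64(counter, -(uint64_t)n);
}
```

Starting from 8 and calling sub_buggy(&c, 1) five times leaves 21474836483,
the ndirty value in the dump. That suggests, but does not prove, that the
five segments were freed one per call. With sub_fixed() the same sequence
leaves 3.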
