* mount & fsck of nilfs partition fail.
@ 2011-06-13  7:13 Zahid Chowdhury
From: Zahid Chowdhury @ 2011-06-13  7:13 UTC
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello,
  We had another crash with nilfs2 under load, and on reboot the mount failed, as usual, with
    "Checksum error in segment payload"
Later, when we ran "fsck0.nilfs2 -v -f" on the partition, we got:
.
.
.
Unclean FS.
The latest log is lost. Trying rollback recovery..
.
Searching the latest checkpoint.
fsck0.nilfs2: cannot read block (blocknr = 2696911)

and fsck0.nilfs2 gave up. Is there any other way of recovering this partition? If so, how?

Is this problem common? Is there any way to stop this scenario from occurring? Thanks.

Zahid
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: mount & fsck of nilfs partition fail.
@ 2011-06-13 12:33 Ryusuke Konishi
From: Ryusuke Konishi @ 2011-06-13 12:33 UTC
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Mon, 13 Jun 2011 00:13:17 -0700, Zahid Chowdhury wrote:
> Hello,
>   We had another crash with nilfs2 under load and then on reboot mount failed as usual on
>     "Checksum error in segment payload"
> Later when we tried an "fsck0.nilfs2 -v -f" on the partition we got:
> .
> .
> .
> Unclean FS.
> The latest log is lost. Trying rollback recovery..
> .
> Searching the latest checkpoint.
> fsck0.nilfs2: cannot read block (blocknr = 2696911)
> 
> and fsck0.nilfs2 gave up. Is there any other way of recovering this partition? If so, how?

It seems an I/O error or some other critical error occurred.

I pushed a commit that appends the error reason to the above message.
Could you try it? (It's available on the fsck0 branch of the devel tree.)

> Is this problem common? Is there any way to stop this scenario from occurring? Thanks.
> 
> Zahid

Regards,
Ryusuke Konishi

* RE: mount & fsck of nilfs partition fail.
@ 2011-06-13 21:12 Zahid Chowdhury
From: Zahid Chowdhury @ 2011-06-13 21:12 UTC
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke (or anybody else),
  I am no git expert (I just looked at the cheat sheet), but I was unable to find your commit that adds the error reason to the fsck0.nilfs2 message. I tried:
	git pull http://git.nilfs.org/nilfs2-utils-devel.git fsck0:fsck0
	Already up-to-date.
I also tried a fresh fetch from scratch. Either way, no changes showed up in
fsck0.nilfs2.c. If anybody else knows how I can pick up Ryusuke's commit, please let me know. Thanks.

Zahid

* Re: mount & fsck of nilfs partition fail.
@ 2011-06-13 22:21 dexen deVries
From: dexen deVries @ 2011-06-13 22:21 UTC
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Zahid,


On Monday 13 June 2011 23:12:00 you wrote:
>   I am no git expert, just looked at the cheatsheet, but I was unable to
> find your commit which adds a reason to the message below. I tried: git
> pull http://git.nilfs.org/nilfs2-utils-devel.git fsck0:fsck0
> 	Already up-to-date.
> I also tried a re-fetch from scratch. All with no changes showing up in
> fsck0.nilfs2.c. If anybody else knows a way for me to pickup Ryusuke's
> commit, please let me know. Thanks.
> 


You actually need two operations to get what you want:

1) `git fetch ...' -- to download the changes from the server into the auxiliary
`.git' directory; you did that already
2) `git checkout ...' -- to check out the actual files.


You did the `git fetch ...' part already, but for now the changes are still
held only in the auxiliary `.git' directory. To make the actual files visible
in your working copy, do:
$ git checkout fsck0

and that should be it.


In case something goes awry (say, you have a modified file or something), it'll 
be easiest to just create a new repository clone from scratch, like:

$ cd ../SOME_OTHER_DIRECTORY/
$ git clone http://git.nilfs.org/nilfs2-utils-devel.git
$ cd nilfs2-utils-devel
$ git co -b fsck0 remotes/origin/fsck0

The last line creates an `fsck0' branch in your repository clone and makes it 
follow/track the `fsck0' branch from the `origin' server -- the server you 
cloned the repo from. Here, the official NILFS2 server :-)

Regards,
-- 
dexen deVries

``One can't proceed from the informal to the formal by formal means.''

* Re: mount & fsck of nilfs partition fail. [correction]
@ 2011-06-13 22:28 dexen deVries
From: dexen deVries @ 2011-06-13 22:28 UTC
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi again Zahid,


On Tuesday 14 June 2011 00:21:51 I wrote:
> $ git co -b fsck0 remotes/origin/fsck0


important correction: make that last line read:
$ git checkout -b fsck0 remotes/origin/fsck0

(`co' is not a git command, `checkout' is)


Sorry for the screw-up,
-- 
dexen deVries

``One can't proceed from the informal to the formal by formal means.''

* RE: mount & fsck of nilfs partition fail.
@ 2011-06-13 23:28 Zahid Chowdhury
From: Zahid Chowdhury @ 2011-06-13 23:28 UTC
  To: dexen deVries,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Dexen/Ryusuke,
  Thanks both for your help. I tried this:
   [quadcore:~/nilfs/nilfs-utils.git/nilfs2-utils] git fetch
           http://git.nilfs.org/nilfs2-utils-devel.git fsck0:fsck0
   fatal: Refusing to fetch into current branch
   [quadcore:~/nilfs/nilfs-utils.git/nilfs2-utils] git checkout fsck0
   Already on "fsck0"
I also tried from scratch with the commands you gave (using checkout, not co),
but diff showed no changes to the file:
  nilfs2-utils-devel/sbin/fsck/fsck0.nilfs2.c
Regards.

Zahid

* Re: mount & fsck of nilfs partition fail.
@ 2011-06-13 23:51 Ryusuke Konishi
From: Ryusuke Konishi @ 2011-06-13 23:51 UTC
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: dexen.devries-Re5JQEeQqe8AvxtiuMwx3w,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Mon, 13 Jun 2011 16:28:19 -0700, Zahid Chowdhury wrote:
> Hello Dexen/Ryusuke,
>   Thanks both for your help. I tried this:
>    [quadcore:~/nilfs/nilfs-utils.git/nilfs2-utils] git fetch
>            http://git.nilfs.org/nilfs2-utils-devel.git fsck0:fsck0
>    fatal: Refusing to fetch into current branch
>    [quadcore:~/nilfs/nilfs-utils.git/nilfs2-utils] git checkout fsck0
>    Already on "fsck0"
> I also tried from scratch with your commands below (not co, but checkout) -
> there were no changes to the file with diff:
>   nilfs2-utils-devel/sbin/fsck/fsck0.nilfs2.c

Sorry, the fsck0 branch was not properly updated.
I have just fixed the problem in the git repo.

Regards,
Ryusuke Konishi

* RE: mount & fsck of nilfs partition fail.
@ 2011-06-14 18:04 Zahid Chowdhury
From: Zahid Chowdhury @ 2011-06-14 18:04 UTC
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke,
  I changed the code as follows:
diff -u --ignore-all-space fsck0.nilfs2.c ~/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck
--- fsck0.nilfs2.c      2011-06-14 11:03:49.000000000 -0700
+++ /root/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck/fsck0.nilfs2.c   2011-06-14 11:01:34.000000000 -0700
@@ -172,10 +172,14 @@
 static void read_block(int fd, __u64 blocknr, void *buf,
                       unsigned long size)
 {
+        int num_read;
        if (lseek64(fd, blocknr * blocksize, SEEK_SET) < 0 ||
-           read(fd, buf, size) < size)
-               die("cannot read block (blocknr = %llu): %s",
-                   (unsigned long long)blocknr, strerror(errno));
+            (num_read = read(fd, buf, size) < size)) {
+                fprintf(stderr, "Read size was: %d\tNum read: %d\tStrerror: %s\n",
+                    size, num_read, strerror(errno));
+                die("cannot read block (blocknr = %llu)",
+                    (unsigned long long)blocknr);
+        }
 }

 static inline __u64 segment_start_blocknr(unsigned long segnum)

and I got this as output:

./fsck0.nilfs2 -f -v /dev/sda2
Super-block:
    revision = 2.0
    blocksize = 4096
    write time = 2011-06-11 23:22:03
    indicated log: blocknr = 1648528
        segnum = 804, seq = 401758, cno=3250953

Unclean FS.
The latest log is lost. Trying rollback recovery..
......
Searching the latest checkpoint.
Read size was: 4096     Num read: 1     Strerror: Success
fsck0.nilfs2: cannot read block (blocknr = 2696911)
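
A note on the debug output above: "Num read: 1" is not a byte count. In the modified condition `(num_read = read(fd, buf, size) < size)`, `<` binds tighter than `=`, so num_read receives the result of the comparison, and 1 only says that read() returned fewer than `size` bytes. A short read does not set errno, which is why strerror(errno) prints "Success". (The `%d` conversion for the unsigned long `size` is also mismatched; `%lu` would be correct.) Separately, because `size` is unsigned, the original test `read(...) < size` converts a -1 error return into a huge unsigned value, so a genuine read error would not be caught at all. The following is a minimal sketch of a read_block() variant that separates the three failure cases. The name read_block_checked(), the explicit blocksize parameter, and returning -1 instead of calling die() are assumptions for illustration, not fsck0.nilfs2's actual code; it uses plain lseek() and assumes a 64-bit off_t (a 64-bit host, or building with -D_FILE_OFFSET_BITS=64).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical variant of read_block(): returns 0 on success and -1 on
 * failure, reporting which step failed instead of calling die().
 */
static int read_block_checked(int fd, unsigned long long blocknr,
			      unsigned long blocksize, void *buf, size_t size)
{
	off_t offset = (off_t)(blocknr * blocksize);
	ssize_t num_read;

	if (lseek(fd, offset, SEEK_SET) < 0) {
		fprintf(stderr, "cannot seek to block %llu: %s\n",
			blocknr, strerror(errno));
		return -1;
	}

	/*
	 * Assign first, compare afterwards.  In "n = read(...) < size",
	 * '<' binds tighter than '=', so n would get 0 or 1.
	 */
	num_read = read(fd, buf, size);
	if (num_read < 0) {
		/* A real read error: errno is meaningful here. */
		fprintf(stderr, "cannot read block %llu: %s\n",
			blocknr, strerror(errno));
		return -1;
	}
	if ((size_t)num_read < size) {
		/* Short read (0 means end of file/device); errno is not set. */
		fprintf(stderr, "short read at block %llu: got %zd of %zu bytes\n",
			blocknr, num_read, size);
		return -1;
	}
	return 0;
}
```

Checking `num_read < 0` before the unsigned comparison is what keeps an error return from being mistaken for a successful full read.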

The mount error is:
$ mount -t nilfs2 /dev/sda2 /writable
Jun 14 10:52:13 _Lab kernel: NILFS warning: mounting unchecked fs
Jun 14 10:52:13 _Lab kernel: NILFS warning: Checksum error in segment payload
Jun 14 10:52:13 _Lab kernel: NILFS: error searching super root.
mount.nilfs2: Error while mounting /dev/sda2 on /writable: Invalid argument


Is it still possible to recover the partition, or is this error fatal? Thanks, all.

Zahid


* Re: mount & fsck of nilfs partition fail.
@ 2011-06-15  1:42 Ryusuke Konishi
From: Ryusuke Konishi @ 2011-06-15  1:42 UTC
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg,
	linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Tue, 14 Jun 2011 11:04:26 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>   I changed the code some to:
> diff -u --ignore-all-space fsck0.nilfs2.c ~/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck
> --- fsck0.nilfs2.c      2011-06-14 11:03:49.000000000 -0700
> +++ /root/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck/fsck0.nilfs2.c   2011-06-14 11:01:34.000000000 -0700
> @@ -172,10 +172,14 @@
>  static void read_block(int fd, __u64 blocknr, void *buf,
>                        unsigned long size)
>  {
> +        int num_read;
>         if (lseek64(fd, blocknr * blocksize, SEEK_SET) < 0 ||
> -           read(fd, buf, size) < size)
> -               die("cannot read block (blocknr = %llu): %s",
> -                   (unsigned long long)blocknr, strerror(errno));
> +            (num_read = read(fd, buf, size) < size)) {
> +                fprintf(stderr, "Read size was: %d\tNum read: %d\tStrerror: %s\n",
> +                    size, num_read, strerror(errno));
> +                die("cannot read block (blocknr = %llu)",
> +                    (unsigned long long)blocknr);
> +        }
>  }
> 
>  static inline __u64 segment_start_blocknr(unsigned long segnum)
> 
> and I got this as output:
> 
> ./fsck0.nilfs2 -f -v /dev/sda2
> Super-block:
>     revision = 2.0
>     blocksize = 4096
>     write time = 2011-06-11 23:22:03
>     indicated log: blocknr = 1648528
>         segnum = 804, seq = 401758, cno=3250953
> 
> Unclean FS.
> The latest log is lost. Trying rollback recovery..
> ......
> Searching the latest checkpoint.
> Read size was: 4096     Num read: 1     Strerror: Success
> fsck0.nilfs2: cannot read block (blocknr = 2696911)

The return value looks weird.
Is your block device readable?

 # dd if=/dev/sda2 of=/sda2-image-file

If you can copy the block device into an image file, you may be able
to recover it through a loop device:

 # losetup /dev/loop0 /sda2-image-file
 # ./fsck0.nilfs2 -f -v /dev/loop0

Otherwise, you may need low-level recovery of the device.
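
If some sectors really are unreadable, plain dd stops at the first read error unless told otherwise (`conv=noerror,sync` makes it continue and pad the failed input block). The sketch below shows the same idea in C: it copies a device to an image in fixed-size chunks and writes zeros for chunks that cannot be read, so every later block keeps its original offset in the image (which matters here, since NILFS2 refers to logs by block number). copy_to_image() and write_all_at() are hypothetical helper names, and the sketch assumes a 64-bit off_t (a 64-bit host, or building with -D_FILE_OFFSET_BITS=64).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes of buf at offset, retrying after partial writes. */
static int write_all_at(int fd, const char *buf, size_t len, off_t offset)
{
	while (len > 0) {
		ssize_t n = pwrite(fd, buf, len, offset);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
		offset += n;
	}
	return 0;
}

/*
 * Hypothetical helper: copy in_fd (e.g. /dev/sda2) to out_fd (an image
 * file) in blocksize chunks.  Chunks that fail with a read error are
 * written as zeros so later data keeps its original offset.
 * Returns the number of unreadable chunks, or -1 on any other failure.
 */
static long long copy_to_image(int in_fd, int out_fd, size_t blocksize)
{
	off_t size = lseek(in_fd, 0, SEEK_END); /* device or file size */
	off_t offset = 0;
	long long bad = 0;
	char *buf;

	if (size < 0 || blocksize == 0 || (buf = malloc(blocksize)) == NULL)
		return -1;

	while (offset < size) {
		size_t want = blocksize;
		ssize_t n;

		if ((off_t)want > size - offset)
			want = (size_t)(size - offset); /* last, partial chunk */

		n = pread(in_fd, buf, want, offset);
		if (n < 0 && errno == EINTR)
			continue;                       /* interrupted: retry */
		if (n < 0) {
			memset(buf, 0, want);           /* unreadable: pad with zeros */
			n = (ssize_t)want;
			bad++;
		} else if (n == 0) {
			break;                          /* unexpected end of input */
		}
		if (write_all_at(out_fd, buf, (size_t)n, offset) < 0) {
			free(buf);
			return -1;
		}
		offset += n;
	}
	free(buf);
	return bad;
}
```

Usage would be along the lines of opening /dev/sda2 read-only and the image file write-only, calling copy_to_image(in_fd, out_fd, 4096), and then attaching the image with losetup as shown above. A non-zero return tells you how many chunks were lost.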

Regards,
Ryusuke Konishi

* Re: mount & fsck of nilfs partition fail.
@ 2011-06-15 10:58 Ryusuke Konishi
From: Ryusuke Konishi @ 2011-06-15 10:58 UTC
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 15 Jun 2011 10:42:51 +0900 (JST), Ryusuke Konishi wrote:
> On Tue, 14 Jun 2011 11:04:26 -0700, Zahid Chowdhury wrote:
> > Hello Ryusuke,
> >   I changed the code some to:
> > diff -u --ignore-all-space fsck0.nilfs2.c ~/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck
> > --- fsck0.nilfs2.c      2011-06-14 11:03:49.000000000 -0700
> > +++ /root/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck/fsck0.nilfs2.c   2011-06-14 11:01:34.000000000 -0700
> > @@ -172,10 +172,14 @@
> >  static void read_block(int fd, __u64 blocknr, void *buf,
> >                        unsigned long size)
> >  {
> > +        int num_read;
> >         if (lseek64(fd, blocknr * blocksize, SEEK_SET) < 0 ||
> > -           read(fd, buf, size) < size)
> > -               die("cannot read block (blocknr = %llu): %s",
> > -                   (unsigned long long)blocknr, strerror(errno));
> > +            (num_read = read(fd, buf, size) < size)) {
> > +                fprintf(stderr, "Read size was: %d\tNum read: %d\tStrerror: %s\n",
> > +                    size, num_read, strerror(errno));
> > +                die("cannot read block (blocknr = %llu)",
> > +                    (unsigned long long)blocknr);
> > +        }
> >  }
> > 
> >  static inline __u64 segment_start_blocknr(unsigned long segnum)
> > 
> > and I got this as output:
> > 
> > ./fsck0.nilfs2 -f -v /dev/sda2
> > Super-block:
> >     revision = 2.0
> >     blocksize = 4096
> >     write time = 2011-06-11 23:22:03
> >     indicated log: blocknr = 1648528
> >         segnum = 804, seq = 401758, cno=3250953
> > 
> > Unclean FS.
> > The latest log is lost. Trying rollback recovery..
> > ......
> > Searching the latest checkpoint.
> > Read size was: 4096     Num read: 1     Strerror: Success
> > fsck0.nilfs2: cannot read block (blocknr = 2696911)
> 
> The return value looks weird.
> Is your block device readable ?
> 
>  # dd if=/dev/sda2 of=/sda2-image-file
> 
> If you can copy the block device into an image file.  You may be able
> to recover it through a loop device.
> 
>  # losetup /dev/loop0 /sda2-image-file
>  # ./fsck0.nilfs2 -f -v /dev/loop0
> 
> Otherwise, you may need a low level recovery for the device.
> 
> Regards,
> Ryusuke Konishi

Ah, sorry. I noticed that the block number (2696911) is beyond the
size of your block device. That is the cause of this error.

I'll look into the rollback loop code of fsck0.nilfs2 to find the
root cause of this out-of-range access.
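
The arithmetic behind this diagnosis is easy to check. With the 4096-byte blocksize that fsck0.nilfs2 reports, block 2696911 starts at byte offset 2696911 * 4096 = 11,046,547,456, so the partition would need to be at least 11,046,551,552 bytes (about 10.3 GiB) for that block to exist; `blockdev --getsize64 /dev/sda2` prints the real size. Below is a minimal sketch of a guard that could be called before reading a block, assuming Linux, where seeking to the end of a block device or regular file returns its size in bytes. block_in_device() is a hypothetical name, not part of fsck0.nilfs2.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical guard: returns 1 if block blocknr (blocksize bytes each)
 * lies entirely inside the device or image open on fd, 0 if it lies
 * beyond the end, and -1 if the size cannot be determined.
 * Note that it moves the file offset of fd to the end.
 */
static int block_in_device(int fd, unsigned long long blocknr,
			   unsigned long blocksize)
{
	/* On Linux, SEEK_END on a block device or regular file yields its size. */
	off_t size = lseek(fd, 0, SEEK_END);

	if (size < 0 || blocksize == 0)
		return -1;
	/* Compare whole-block counts to avoid overflowing a byte offset. */
	return blocknr < (unsigned long long)size / blocksize;
}
```

Comparing block counts instead of computing `blocknr * blocksize` keeps the check safe even if a corrupted block number is close to the 64-bit limit.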


Ryusuke Konishi


* Re: mount & fsck of nilfs partition fail.
@ 2011-06-15 18:32 Ryusuke Konishi
From: Ryusuke Konishi @ 2011-06-15 18:32 UTC
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 15 Jun 2011 19:58:58 +0900 (JST), Ryusuke Konishi wrote:
> On Wed, 15 Jun 2011 10:42:51 +0900 (JST), Ryusuke Konishi wrote:
> > On Tue, 14 Jun 2011 11:04:26 -0700, Zahid Chowdhury wrote:
> > > Hello Ryusuke,
> > >   I changed the code some to:
> > > diff -u --ignore-all-space fsck0.nilfs2.c ~/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck
> > > --- fsck0.nilfs2.c      2011-06-14 11:03:49.000000000 -0700
> > > +++ /root/nilfs/nilfs-utils.git/nilfs2-utils/sbin/fsck/fsck0.nilfs2.c   2011-06-14 11:01:34.000000000 -0700
> > > @@ -172,10 +172,14 @@
> > >  static void read_block(int fd, __u64 blocknr, void *buf,
> > >                        unsigned long size)
> > >  {
> > > +        int num_read;
> > >         if (lseek64(fd, blocknr * blocksize, SEEK_SET) < 0 ||
> > > -           read(fd, buf, size) < size)
> > > -               die("cannot read block (blocknr = %llu): %s",
> > > -                   (unsigned long long)blocknr, strerror(errno));
> > > +            (num_read = read(fd, buf, size) < size)) {
> > > +                fprintf(stderr, "Read size was: %d\tNum read: %d\tStrerror: %s\n",
> > > +                    size, num_read, strerror(errno));
> > > +                die("cannot read block (blocknr = %llu)",
> > > +                    (unsigned long long)blocknr);
> > > +        }
> > >  }
> > > 
> > >  static inline __u64 segment_start_blocknr(unsigned long segnum)
> > > 
> > > and I got this as output:
> > > 
> > > ./fsck0.nilfs2 -f -v /dev/sda2
> > > Super-block:
> > >     revision = 2.0
> > >     blocksize = 4096
> > >     write time = 2011-06-11 23:22:03
> > >     indicated log: blocknr = 1648528
> > >         segnum = 804, seq = 401758, cno=3250953
> > > 
> > > Unclean FS.
> > > The latest log is lost. Trying rollback recovery..
> > > ......
> > > Searching the latest checkpoint.
> > > Read size was: 4096     Num read: 1     Strerror: Success
> > > fsck0.nilfs2: cannot read block (blocknr = 2696911)
> 
> Ah, sorry.  I noticed that the block number (= 2696911) is beyond the
> size of your block device.  It is the cause of this error.
> 
> I'll look into the rollback loop code of fsck0.nilfs2 to find out the
> root cause of this out-of-range access.

Hmm, this bug is not trivial.

Clearly this happened in the context of the
find_latest_cno_in_logical_segment() function, but I haven't found any
suspicious call sites so far.

If you are in a hurry, please go ahead.

Otherwise (if the data on the partition is important), I need your
help to narrow down this problem.  If we can get a backtrace of the
error, things would become clear.
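
One way to obtain such a backtrace without a debugger is to make die() print the call stack before exiting. The sketch below assumes glibc, whose <execinfo.h> provides backtrace() and backtrace_symbols_fd(); die_with_backtrace() and print_backtrace() are hypothetical names, and the "fsck0.nilfs2: " prefix only mimics the output seen earlier in this thread. Build with -g -rdynamic so that non-static function names appear; static functions (such as read_block()) show up only as offsets.

```c
#include <execinfo.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Print the current call stack to fd; returns the number of frames. */
static int print_backtrace(int fd)
{
	void *frames[64];
	int n = backtrace(frames, 64);

	backtrace_symbols_fd(frames, n, fd);
	return n;
}

/*
 * Hypothetical drop-in for die(): print the formatted message and the
 * call stack to stderr, then exit with a failure status.
 */
static void die_with_backtrace(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("fsck0.nilfs2: ", stderr);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);

	print_backtrace(STDERR_FILENO);
	exit(EXIT_FAILURE);
}
```

Alternatively, without changing any code, running `gdb --args ./fsck0.nilfs2 -f -v /dev/sda2`, then `break die`, `run` and `bt` prints the same call chain, provided the binary was built with -g.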

Anyway, I would like to release an updated nilfs2 kmod in a week or so
for CentOS users to minimize this sort of thing.

Regards,
Ryusuke Konishi

* RE: mount & fsck of nilfs partition fail.
@ 2011-06-15 18:38 Zahid Chowdhury
From: Zahid Chowdhury @ 2011-06-15 18:38 UTC
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke,
  Yes, "the data on the partition is important". Please let me know how to
"get a backtrace of the error" and I will send it to you. Thanks a lot.

Zahid

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: mount & fsck of nilfs partition fail.
       [not found]                                         ` <053D39D3D76C474EB2D2A284AA6BA3181B05E99563-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
@ 2011-06-17 18:29                                           ` Ryusuke Konishi
       [not found]                                             ` <20110618.032928.182500686.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2011-06-17 18:29 UTC (permalink / raw)
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Wed, 15 Jun 2011 11:38:16 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>   Yes, "the data on the partition is important". Please let me know how to
> "get a backtrace of the error" and I will send it to you. Thanks a lot.
> 
> Zahid

Try the following patch.

You will need to install gdb and the backtrace script available at:

  http://samba.org/ftp/unpacked/junkcode/segv_handler/backtrace

The modified fsck0.nilfs2 will write a backtrace into
"/var/log/bt_fsck0.nilfs2.<pid>.out".


Regards,
Ryusuke Konishi
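The patch that follows has the process run gdb on itself through the external 'backtrace' script, which also captures local variables. For comparison only, glibc provides an in-process alternative that needs no gdb: backtrace() and backtrace_symbols_fd() from <execinfo.h>. It is not what the patch uses, it is glibc-specific, and it prints return addresses (with function names only for non-static functions when linked with -rdynamic), without any locals. dump_stack_to_fd is a hypothetical name.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <unistd.h>

/*
 * Alternative sketch (not what the patch does): write the current call
 * stack to fd using glibc's backtrace() and backtrace_symbols_fd().
 * Returns the number of frames captured.
 */
static int dump_stack_to_fd(int fd)
{
	void *frames[64];
	int depth;

	depth = backtrace(frames, (int)(sizeof(frames) / sizeof(frames[0])));
	/* One line per frame, written straight to fd without malloc(). */
	backtrace_symbols_fd(frames, depth, fd);
	return depth;
}
```

Addresses from a binary built with -g can then be mapped to file:line with `addr2line -e <binary> <address>`.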
---
From: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>

fsck0.nilfs2: add backtrace routine

Signed-off-by: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
---
 sbin/fsck/Makefile.am    |    2 +-
 sbin/fsck/fsck0.nilfs2.c |   30 +++++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/sbin/fsck/Makefile.am b/sbin/fsck/Makefile.am
index 789ae1b..5357967 100644
--- a/sbin/fsck/Makefile.am
+++ b/sbin/fsck/Makefile.am
@@ -1,6 +1,6 @@
 ## Makefile.am
 
-AM_CFLAGS = -Wall
+AM_CFLAGS = -Wall -g
 AM_CPPFLAGS = -I$(top_srcdir)/include
 LDADD = -luuid $(top_builddir)/lib/libnilfsfeature.la \
 	$(top_builddir)/lib/libmountchk.la \
diff --git a/sbin/fsck/fsck0.nilfs2.c b/sbin/fsck/fsck0.nilfs2.c
index 35a010c..6a41766 100644
--- a/sbin/fsck/fsck0.nilfs2.c
+++ b/sbin/fsck/fsck0.nilfs2.c
@@ -151,6 +151,32 @@ static inline void *nilfs_zalloc(size_t size)
 }
 
 /*
+ * The following part is based on segv_handler by Andrew Tridgell
+ * found at http://samba.org/ftp/unpacked/junkcode/segv_handler/
+ *
+ * To enable this feature, install gdb and 'backtrace' script available
+ * on the above site.
+ */
+static void nilfs_backtrace(void)
+{
+	char cmd[100];
+	char progname[100];
+	char *p;
+	int n;
+
+	n = readlink("/proc/self/exe", progname, sizeof(progname));
+	progname[n] = 0;
+
+	p = strrchr(progname, '/');
+	*p = 0;
+
+	snprintf(cmd, sizeof(cmd),
+		 "backtrace %d > /var/log/bt_%s.%d.out 2>&1",
+		 (int)getpid(), p+1, (int)getpid());
+	system(cmd);
+}
+
+/*
  * Block buffer
  */
 static void *block_buffer = NULL;
@@ -173,9 +199,11 @@ static void read_block(int fd, __u64 blocknr, void *buf,
 		       unsigned long size)
 {
 	if (lseek64(fd, blocknr * blocksize, SEEK_SET) < 0 ||
-	    read(fd, buf, size) < size)
+	    read(fd, buf, size) < size) {
+		nilfs_backtrace();
 		die("cannot read block (blocknr = %llu): %s",
 		    (unsigned long long)blocknr, strerror(errno));
+	}
 }
 
 static inline __u64 segment_start_blocknr(unsigned long segnum)
-- 
1.7.3.5
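Two details of nilfs_backtrace() in this patch can misbehave off the happy path. readlink() returns -1 on error and never NUL-terminates, so `progname[n] = 0` writes before the buffer on failure and one byte past it when the path fills all 100 bytes. strrchr() returns NULL when there is no '/', which `*p = 0` would dereference. (The gdb dump of progname later in this thread, "/sbin\000fsck0.nilfs2", shows the `*p = 0` step splitting the path in place.) A hardened sketch of the same idea follows; build_backtrace_cmd and nilfs_backtrace_checked are hypothetical names, and the code still relies on the external 'backtrace' script.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: build the same command the patch runs,
 * "backtrace <pid> > /var/log/bt_<progname>.<pid>.out 2>&1", into cmd.
 * Returns 0 on success, -1 on any failure or truncation.
 */
static int build_backtrace_cmd(char *cmd, size_t cmdlen)
{
	char exe[256];
	const char *name;
	ssize_t n;
	int len;

	/* readlink() does not NUL-terminate and returns -1 on error. */
	n = readlink("/proc/self/exe", exe, sizeof(exe) - 1);
	if (n < 0 || (size_t)n >= sizeof(exe) - 1)
		return -1;	/* error, or path possibly truncated */
	exe[n] = '\0';

	/* Basename of the executable; no '/' means use the whole string. */
	name = strrchr(exe, '/');
	name = name ? name + 1 : exe;

	len = snprintf(cmd, cmdlen, "backtrace %d > /var/log/bt_%s.%d.out 2>&1",
		       (int)getpid(), name, (int)getpid());
	/* snprintf() returns the length it wanted; >= cmdlen means truncated. */
	if (len < 0 || (size_t)len >= cmdlen)
		return -1;
	return 0;
}

/* Hardened counterpart of nilfs_backtrace(): reports failures. */
static void nilfs_backtrace_checked(void)
{
	char cmd[512];

	if (build_backtrace_cmd(cmd, sizeof(cmd)) < 0) {
		fprintf(stderr, "cannot build backtrace command\n");
		return;
	}
	if (system(cmd) != 0)
		fprintf(stderr, "backtrace command failed: %s\n", cmd);
}
```

Separating command construction from system() makes the string easy to check without running gdb; the 256- and 512-byte buffer sizes are arbitrary choices.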

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* RE: mount & fsck of nilfs partition fail.
       [not found]                                             ` <20110618.032928.182500686.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2011-06-17 21:55                                               ` Zahid Chowdhury
       [not found]                                                 ` <053D39D3D76C474EB2D2A284AA6BA3181B05E99A12-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Zahid Chowdhury @ 2011-06-17 21:55 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke,
 I have attached the output below (let me know if you need anything further - thanks for your help):
[Thread debugging using libthread_db enabled]
0x0090a402 in __kernel_vsyscall ()
#0  0x0090a402 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00267713 in __waitpid_nocancel () from /lib/libc.so.6
No symbol table info available.
#2  0x0020c07b in do_system () from /lib/libc.so.6
No symbol table info available.
#3  0x08049154 in nilfs_backtrace () at fsck0.nilfs2.c:176
        cmd = "backtrace 10688 > /var/log/bt_fsck0.nilfs2.10688.out 2>&1\000\000\000\243\213)\000\032\200*\000\024\215#\000\002", '\000' <repeats 12 times>"\360, l\222\002\000\000\000\364\257\062\000\000\000\000"
        progname = "/sbin\000fsck0.nilfs2", '\000' <repeats 18 times>, "`\000\000\000p\301\062\000\004\000\000\000\346\356#\000\000\000\000\000\230\301\062\000p\000\000\000\377\017\000\000@\000\000\000\300\277\337\bp\301\062\000\000 \004\000\032\200*\000\000\020\002\000\001\000\000\000\000\000\000"
        n = <value optimized out>
#4  0x08049251 in read_block (fd=3, blocknr=2696911, buf=0x8ddb8b0, size=4096)
    at fsck0.nilfs2.c:204
No locals.
#5  0x080492d9 in next_ss_entry (fd=3, blocknrp=0xbfc458b8,
    offsetp=0xbfc458c4, entry_size=8) at fsck0.nilfs2.c:570
        p = <value optimized out>
#6  0x0804994c in get_latest_cno (fd=3, seginfo=0x8dfd8f8, start=0x8dfe410)
    at fsck0.nilfs2.c:636
No locals.
#7  find_latest_cno_in_logical_segment (fd=3, seginfo=0x8dfd8f8,
    start=0x8dfe410) at fsck0.nilfs2.c:660
        loginfo = 0x8dfe410
        cno = 3249616
        latest_cno = 0
        seq = <value optimized out>
        i = 0
#8  0x0804aafc in nilfs_fsck (argc=2146089, argv=0xbfc4579c)
    at fsck0.nilfs2.c:1058
No locals.
#9  main (argc=2146089, argv=0xbfc4579c) at fsck0.nilfs2.c:1183
No locals.

Thread 1 (Thread 0xb7fdfa80 (LWP 10688)):
#0  0x0090a402 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00267713 in __waitpid_nocancel () from /lib/libc.so.6
No symbol table info available.
#2  0x0020c07b in do_system () from /lib/libc.so.6
No symbol table info available.
#3  0x08049154 in nilfs_backtrace () at fsck0.nilfs2.c:176
        cmd = "backtrace 10688 > /var/log/bt_fsck0.nilfs2.10688.out 2>&1\000\000\000\243\213)\000\032\200*\000\024\215#\000\002", '\000' <repeats 12 times>"\360, l\222\002\000\000\000\364\257\062\000\000\000\000"
        progname = "/sbin\000fsck0.nilfs2", '\000' <repeats 18 times>, "`\000\000\000p\301\062\000\004\000\000\000\346\356#\000\000\000\000\000\230\301\062\000p\000\000\000\377\017\000\000@\000\000\000\300\277\337\bp\301\062\000\000 \004\000\032\200*\000\000\020\002\000\001\000\000\000\000\000\000"
        n = <value optimized out>
#4  0x08049251 in read_block (fd=3, blocknr=2696911, buf=0x8ddb8b0, size=4096)
    at fsck0.nilfs2.c:204
No locals.
#5  0x080492d9 in next_ss_entry (fd=3, blocknrp=0xbfc458b8,
    offsetp=0xbfc458c4, entry_size=8) at fsck0.nilfs2.c:570
        p = <value optimized out>
#6  0x0804994c in get_latest_cno (fd=3, seginfo=0x8dfd8f8, start=0x8dfe410)
    at fsck0.nilfs2.c:636
No locals.
#7  find_latest_cno_in_logical_segment (fd=3, seginfo=0x8dfd8f8,
    start=0x8dfe410) at fsck0.nilfs2.c:660
        loginfo = 0x8dfe410
        cno = 3249616
        latest_cno = 0
        seq = <value optimized out>
        i = 0
#8  0x0804aafc in nilfs_fsck (argc=2146089, argv=0xbfc4579c)
    at fsck0.nilfs2.c:1058
No locals.
#9  main (argc=2146089, argv=0xbfc4579c) at fsck0.nilfs2.c:1183
No locals.

A debugging session is active.

        Inferior 1 [process 10688] will be detached.

Quit anyway? (y or n) [answered Y; input not from terminal]

Zahid

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: mount & fsck of nilfs partition fail.
       [not found]                                                 ` <053D39D3D76C474EB2D2A284AA6BA3181B05E99A12-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
@ 2011-06-18  4:53                                                   ` Ryusuke Konishi
       [not found]                                                     ` <20110618.135312.64853996.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2011-06-18  4:53 UTC (permalink / raw)
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Fri, 17 Jun 2011 14:55:04 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>  I have attached the output below (let me know if you need anything further - thanks for your help):
> [Thread debugging using libthread_db enabled]
> 0x0090a402 in __kernel_vsyscall ()
> #0  0x0090a402 in __kernel_vsyscall ()
> No symbol table info available.
> #1  0x00267713 in __waitpid_nocancel () from /lib/libc.so.6
> No symbol table info available.
> #2  0x0020c07b in do_system () from /lib/libc.so.6
> No symbol table info available.
> #3  0x08049154 in nilfs_backtrace () at fsck0.nilfs2.c:176
>         cmd = "backtrace 10688 > /var/log/bt_fsck0.nilfs2.10688.out 2>&1\000\000\000\243\213)\000\032\200*\000\024\215#\000\002", '\000' <repeats 12 times>"\360, l\222\002\000\000\000\364\257\062\000\000\000\000"
>         progname = "/sbin\000fsck0.nilfs2", '\000' <repeats 18 times>, "`\000\000\000p\301\062\000\004\000\000\000\346\356#\000\000\000\000\000\230\301\062\000p\000\000\000\377\017\000\000@\000\000\000\300\277\337\bp\301\062\000\000 \004\000\032\200*\000\000\020\002\000\001\000\000\000\000\000\000"
>         n = <value optimized out>
> #4  0x08049251 in read_block (fd=3, blocknr=2696911, buf=0x8ddb8b0, size=4096)
>     at fsck0.nilfs2.c:204
> No locals.
> #5  0x080492d9 in next_ss_entry (fd=3, blocknrp=0xbfc458b8,
>     offsetp=0xbfc458c4, entry_size=8) at fsck0.nilfs2.c:570
>         p = <value optimized out>
> #6  0x0804994c in get_latest_cno (fd=3, seginfo=0x8dfd8f8, start=0x8dfe410)
>     at fsck0.nilfs2.c:636
> No locals.
> #7  find_latest_cno_in_logical_segment (fd=3, seginfo=0x8dfd8f8,
>     start=0x8dfe410) at fsck0.nilfs2.c:660
>         loginfo = 0x8dfe410
>         cno = 3249616
>         latest_cno = 0
>         seq = <value optimized out>
>         i = 0
> #8  0x0804aafc in nilfs_fsck (argc=2146089, argv=0xbfc4579c)
>     at fsck0.nilfs2.c:1058
> No locals.
> #9  main (argc=2146089, argv=0xbfc4579c) at fsck0.nilfs2.c:1183
> No locals.

OK, something seems to be wrong in the get_latest_cno() function.

Could you please collect some debug information with the following patch?


Ryusuke Konishi
---
From: Ryusuke Konishi <konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>

fsck0.nilfs2: insert debug messages in get_latest_cno function

---
 sbin/fsck/fsck0.nilfs2.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/sbin/fsck/fsck0.nilfs2.c b/sbin/fsck/fsck0.nilfs2.c
index 6a41766..7b70911 100644
--- a/sbin/fsck/fsck0.nilfs2.c
+++ b/sbin/fsck/fsck0.nilfs2.c
@@ -592,6 +592,11 @@ static __u64 get_latest_cno(int fd, __u64 log_start)
 	offset = le16_to_cpu(ss->ss_bytes);
 	fblocknr = blocknr + DIV_ROUND_UP(le32_to_cpu(ss->ss_sumbytes),
 					  blocksize);
+	fprintf(stderr, "%s: log_start=%llu (segnum=%lu): nfinfo=%lu, "
+		"fblocknr=%llu\n", __func__,
+		(unsigned long long)log_start,
+		(unsigned long)log_start / blocks_per_segment,
+		(unsigned long)nfinfo, (unsigned long long)fblocknr);
 
 	for (i = 0; i < nfinfo; i++) {
 		finfo = next_ss_entry(fd, &blocknr, &offset, sizeof(*finfo));
@@ -601,6 +606,14 @@ static __u64 get_latest_cno(int fd, __u64 log_start)
 		nnodeblk = nblocks - ndatablk;
 		ino = le64_to_cpu(finfo->fi_ino);
 
+		fprintf(stderr, "%s: finfo: ino=%llu, sum-blocknr=%llu, "
+			"offset=%u, nblocks=%lu, ndatablk=%lu, "
+			"fblocknr=%llu\n", __func__,
+			(unsigned long long)ino,
+			(unsigned long long)blocknr, offset,
+			(unsigned long)nblocks, (unsigned long)ndatablk,
+			(unsigned long long)fblocknr);
+
 		if (ino == NILFS_DAT_INO) {
 			__le64 *blkoff;
 			struct nilfs_binfo_dat *binfo_dat;
-- 
1.7.3.5

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* RE: mount & fsck of nilfs partition fail.
       [not found]                                                     ` <20110618.135312.64853996.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2011-06-20 18:27                                                       ` Zahid Chowdhury
       [not found]                                                         ` <053D39D3D76C474EB2D2A284AA6BA3181B05E99CC6-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Zahid Chowdhury @ 2011-06-20 18:27 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke,
  Sorry, I was away over the weekend. I've attached the console trace and the out file again for posterity. I will be upgrading to the recently released 2.0.22 version and will try to mount the corrupted filesystem with it. It is unlikely to work, but it should help with future nilfs2 filesystems, shouldn't it? Thanks for the fsck help and for the new release for older kernels. Please let me know if you need anything further so that I can recover the corrupted filesystem.

Zahid

The console trace:
/sbin/fsck0.nilfs2 -f -v /dev/sda2
Super-block:
    revision = 2.0
    blocksize = 4096
    write time = 2011-06-11 23:22:03
    indicated log: blocknr = 1648528
        segnum = 804, seq = 401758, cno=3250953

Unclean FS.
The latest log is lost. Trying rollback recovery..
......
Searching the latest checkpoint.
get_latest_cno: log_start=1556429 (segnum=759): nfinfo=6, fblocknr=1556430
get_latest_cno: finfo: ino=17874, sum-blocknr=1556429, offset=80, nblocks=2, ndatablk=1, fblocknr=1556430
get_latest_cno: finfo: ino=17875, sum-blocknr=1556429, offset=128, nblocks=1, ndatablk=1, fblocknr=1556432
get_latest_cno: finfo: ino=6, sum-blocknr=1556429, offset=168, nblocks=2, ndatablk=1, fblocknr=1556433
get_latest_cno: finfo: ino=4, sum-blocknr=1556429, offset=216, nblocks=3, ndatablk=2, fblocknr=1556435
get_latest_cno: finfo: ino=4499, sum-blocknr=1556429, offset=280, nblocks=1306282328, ndatablk=0, fblocknr=1556438
fsck0.nilfs2: cannot read block (blocknr = 2696911): Success

The out file contents:
[Thread debugging using libthread_db enabled]
0x008c1402 in __kernel_vsyscall ()
#0  0x008c1402 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00267713 in __waitpid_nocancel () from /lib/libc.so.6
No symbol table info available.
#2  0x0020c07b in do_system () from /lib/libc.so.6
No symbol table info available.
#3  0x08049154 in nilfs_backtrace () at fsck0.nilfs2.c:176
        cmd = "backtrace 17363 > /var/log/bt_fsck0.nilfs2.17363.out 2>&1\000)\000\000\000\000\000\032\200*\000\003\000\000\000\002", '\000' <repeats 12 times>"\360, l\222\002\000\000\000\364\257\062\000\000\000\000"
        progname = "/sbin\000fsck0.nilfs2\000\277p\305\062\000\001\000\000\000p\301\062\000@\301\062\000@\301\062\000X\000\000\000@\301\062\000\001", '\000' <repeats 19 times>, "`\000\000\000p\301\062\000\004\000\000\000\346\356#\000\000\000\000\000\230\301\062\000p\000\000\000\377\017\000"
        n = <value optimized out>
#4  0x08049251 in read_block (fd=3, blocknr=2696911, buf=0x950b8b0, size=4096)
    at fsck0.nilfs2.c:204
No locals.
#5  0x080492d9 in next_ss_entry (fd=3, blocknrp=0xbf881d18,
    offsetp=0xbf881d24, entry_size=8) at fsck0.nilfs2.c:570
        p = <value optimized out>
#6  0x080499fc in get_latest_cno (fd=3, seginfo=0x952d8f8, start=0x952e410)
    at fsck0.nilfs2.c:650
        __func__ = "get_latest_cno"
#7  find_latest_cno_in_logical_segment (fd=3, seginfo=0x952d8f8,
    start=0x952e410) at fsck0.nilfs2.c:674
        loginfo = 0x952e410
        cno = 3249616
        latest_cno = 0
        seq = <value optimized out>
        i = 0
#8  0x0804abac in nilfs_fsck (argc=2146089, argv=0xbf881bdc)
    at fsck0.nilfs2.c:1072
No locals.
#9  main (argc=2146089, argv=0xbf881bdc) at fsck0.nilfs2.c:1197
No locals.

Thread 1 (Thread 0xb7f88a80 (LWP 17363)):
#0  0x008c1402 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00267713 in __waitpid_nocancel () from /lib/libc.so.6
No symbol table info available.
#2  0x0020c07b in do_system () from /lib/libc.so.6
No symbol table info available.
#3  0x08049154 in nilfs_backtrace () at fsck0.nilfs2.c:176
        cmd = "backtrace 17363 > /var/log/bt_fsck0.nilfs2.17363.out 2>&1\000)\000\000\000\000\000\032\200*\000\003\000\000\000\002", '\000' <repeats 12 times>"\360, l\222\002\000\000\000\364\257\062\000\000\000\000"
        progname = "/sbin\000fsck0.nilfs2\000\277p\305\062\000\001\000\000\000p\301\062\000@\301\062\000@\301\062\000X\000\000\000@\301\062\000\001", '\000' <repeats 19 times>, "`\000\000\000p\301\062\000\004\000\000\000\346\356#\000\000\000\000\000\230\301\062\000p\000\000\000\377\017\000"
        n = <value optimized out>
#4  0x08049251 in read_block (fd=3, blocknr=2696911, buf=0x950b8b0, size=4096)
    at fsck0.nilfs2.c:204
No locals.
#5  0x080492d9 in next_ss_entry (fd=3, blocknrp=0xbf881d18,
    offsetp=0xbf881d24, entry_size=8) at fsck0.nilfs2.c:570
        p = <value optimized out>
#6  0x080499fc in get_latest_cno (fd=3, seginfo=0x952d8f8, start=0x952e410)
    at fsck0.nilfs2.c:650
        __func__ = "get_latest_cno"
#7  find_latest_cno_in_logical_segment (fd=3, seginfo=0x952d8f8,
    start=0x952e410) at fsck0.nilfs2.c:674
        loginfo = 0x952e410
        cno = 3249616
        latest_cno = 0
        seq = <value optimized out>
        i = 0
#8  0x0804abac in nilfs_fsck (argc=2146089, argv=0xbf881bdc)
    at fsck0.nilfs2.c:1072
No locals.
#9  main (argc=2146089, argv=0xbf881bdc) at fsck0.nilfs2.c:1197

Zahid

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: mount & fsck of nilfs partition fail.
       [not found]                                                         ` <053D39D3D76C474EB2D2A284AA6BA3181B05E99CC6-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
@ 2011-06-23 11:25                                                           ` Ryusuke Konishi
       [not found]                                                             ` <20110623.202505.27804490.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2011-06-23 11:25 UTC (permalink / raw)
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Mon, 20 Jun 2011 11:27:49 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>
>   Sorry, I was away over the weekend. I've attached the console trace
>   and the out file again for posterity. I will be upgrading to the
>   recently released 2.0.22 version and will try to mount the
>   corrupted filesystem with it. It is unlikely to work, but it should
>   help with future nilfs2 filesystems, shouldn't it? Thanks for the
>   fsck help and for the new release for older kernels. Please let me
>   know if you need anything further so that I can recover the
>   corrupted filesystem.
>
> Zahid
> 
> The console trace:
> /sbin/fsck0.nilfs2 -f -v /dev/sda2
> Super-block:
>     revision = 2.0
>     blocksize = 4096
>     write time = 2011-06-11 23:22:03
>     indicated log: blocknr = 1648528
>         segnum = 804, seq = 401758, cno=3250953
> 
> Unclean FS.
> The latest log is lost. Trying rollback recovery..
> ......
> Searching the latest checkpoint.
> get_latest_cno: log_start=1556429 (segnum=759): nfinfo=6, fblocknr=1556430
> get_latest_cno: finfo: ino=17874, sum-blocknr=1556429, offset=80, nblocks=2, ndatablk=1, fblocknr=1556430
> get_latest_cno: finfo: ino=17875, sum-blocknr=1556429, offset=128, nblocks=1, ndatablk=1, fblocknr=1556432
> get_latest_cno: finfo: ino=6, sum-blocknr=1556429, offset=168, nblocks=2, ndatablk=1, fblocknr=1556433
> get_latest_cno: finfo: ino=4, sum-blocknr=1556429, offset=216, nblocks=3, ndatablk=2, fblocknr=1556435
> get_latest_cno: finfo: ino=4499, sum-blocknr=1556429, offset=280, nblocks=1306282328, ndatablk=0, fblocknr=1556438

According to this log, the summary information of segment #759 looks
broken.  This may cause future GC failure or filesystem corruption.
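The fsck0.nilfs2 trace above has enough numbers to check this diagnosis by hand. Assuming 2048 blocks per segment (the NBLOCKS value lssu reports for full segments elsewhere in this thread), segment 759 spans blocks 759 * 2048 = 1554432 through 1556479, which matches the partial-segment blocknr that dumpseg prints for segment 759. The first four finfo records stay inside that range, but the fifth claims nblocks=1306282328, which cannot fit in any single segment. A minimal C sketch of that plausibility check follows. The struct and function names are hypothetical: the struct is a simplified view of the fields printed by the instrumented fsck0.nilfs2, not the NILFS on-disk layout, and the range arithmetic ignores segment 0, which lssu shows as one block short.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one finfo record as printed by the
 * instrumented fsck0.nilfs2 above; this is NOT the NILFS on-disk layout. */
struct finfo_view {
	uint64_t ino;
	uint32_t nblocks;   /* blocks this record claims in the log */
	uint32_t ndatablk;  /* how many of those are data blocks */
	uint64_t fblocknr;  /* first block of those blocks */
};

/*
 * Return the index of the first record that cannot be right, or -1.
 * Assumes segment segnum covers blocks
 * [segnum * blocks_per_segment, (segnum + 1) * blocks_per_segment),
 * which does not hold exactly for segment 0.
 */
static int find_bad_finfo(const struct finfo_view *fi, int n,
			  uint64_t segnum, uint64_t blocks_per_segment)
{
	const uint64_t seg_start = segnum * blocks_per_segment;
	const uint64_t seg_end = seg_start + blocks_per_segment; /* exclusive */

	for (int i = 0; i < n; i++) {
		if (fi[i].ndatablk > fi[i].nblocks)
			return i;  /* more data blocks than blocks in total */
		if (fi[i].fblocknr < seg_start || fi[i].fblocknr >= seg_end)
			return i;  /* starts outside this segment */
		if (fi[i].nblocks > seg_end - fi[i].fblocknr)
			return i;  /* would run past the end of the segment */
	}
	return -1;
}
```

Fed the five records from the trace with segnum 759 and 2048 blocks per segment, it flags the ino=4499 record (index 4) and accepts the first four.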

Could you confirm whether the segment summary is actually broken or
not?  This can be done with the dumpseg tool:

 # dumpseg /dev/sda2 759

If it does look broken, I recommend that you back up all data as
soon as possible.

Regards,
Ryusuke Konishi

* RE: mount & fsck of nilfs partition fail.
       [not found]                                                             ` <20110623.202505.27804490.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2011-06-23 18:21                                                               ` Zahid Chowdhury
       [not found]                                                                 ` <053D39D3D76C474EB2D2A284AA6BA3181B05E9A356-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Zahid Chowdhury @ 2011-06-23 18:21 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke,
  After installing the new kernel module (2.0.22), the nilfs partition mounted with no problems, and I have encountered no problems since then. Running lssu(1) no longer shows segment 759 in the list of used segments:

lssu -a /dev/sda2
              SEGNUM        DATE     TIME STAT     NBLOCKS
                   0  2011-06-23 10:56:42  -d-        2047
                   1  2011-06-23 10:56:42  -d-        2048
                   2  2011-06-23 10:56:42  -d-        2048
                   3  2011-06-23 10:56:42  -d-        2048
                   4  2011-06-23 10:56:44  -d-        2048
                   5  2011-06-23 10:56:46  -d-        2048
                   7  2011-06-23 10:56:46  -d-        2048
                   8  2011-06-23 10:56:47  -d-        2048
                   9  2011-06-23 10:56:47  -d-        2048
                  10  2011-06-23 10:56:47  -d-        2048
                  11  2011-06-23 10:56:47  -d-        2048
                  12  2011-06-23 10:56:47  -d-        2048
                  13  2011-06-23 10:56:52  -d-        2048
                  14  2011-06-23 10:56:52  -d-        2048
                  16  2011-06-23 10:56:52  -d-        2048
                  17  2011-06-23 10:56:52  -d-        2048
                  18  2011-06-23 10:56:52  -d-        2048
                  19  2011-06-23 10:56:53  -d-        2048
                  20  2011-06-23 10:56:54  ad-        1273
                  21  ---------- --:--:--  ad-           0
                 946  2011-06-23 10:52:27  -d-        2048
                 947  2011-06-23 10:52:28  -d-        2048
                 948  2011-06-23 10:52:28  -d-        2048
                 949  2011-06-23 10:52:28  -d-        2048
			.
			.
			.
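Checking whether a segment appears in an lssu listing like the one above can be done mechanically rather than by eye. The sketch below parses data rows of the form shown (SEGNUM, DATE, TIME, STAT, NBLOCKS) with sscanf and reports whether a given segment number appears; the header line and rows that do not match are skipped. The row layout is inferred from this output only, the function names are hypothetical, and the reading of the STAT column ('a' for active, 'd' for dirty) is my understanding of lssu rather than something stated in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One data row of the lssu listing shown above (layout inferred from it). */
struct lssu_row {
	unsigned long segnum;
	char stat[4];            /* e.g. "ad-" or "-d-" */
	unsigned long nblocks;
};

/* Parse "SEGNUM DATE TIME STAT NBLOCKS"; the header line fails at %lu. */
static bool parse_lssu_row(const char *line, struct lssu_row *row)
{
	char date[16], hms[16];

	return sscanf(line, "%lu %15s %15s %3s %lu", &row->segnum, date, hms,
		      row->stat, &row->nblocks) == 5;
}

/* Assumed meaning of the STAT flags: 'a' = active, 'd' = dirty. */
static bool row_is_active(const struct lssu_row *row)
{
	return row->stat[0] == 'a';
}

static bool row_is_dirty(const struct lssu_row *row)
{
	return row->stat[1] == 'd';
}

/* True if any parsable row in lines[0..n) is for segment segnum. */
static bool segment_listed(const char *const *lines, size_t n,
			   unsigned long segnum)
{
	struct lssu_row row;

	for (size_t i = 0; i < n; i++)
		if (parse_lssu_row(lines[i], &row) && row.segnum == segnum)
			return true;
	return false;
}
```

Rows with placeholder dates such as segment 21's "---------- --:--:--" still parse, because %s accepts any run of non-blank characters.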
Running dumpseg on segment 759 does not show anything untoward (I don't think it's used any more, correct?):

dumpseg /dev/sda2 759
segment: segnum = 759
  sequence number = 608068, next segnum = 760
  partial segment: blocknr = 1554432, nblocks = 2048
    creation time = 2011-06-23 10:48:02
    nfinfo = 652
    finfo
      ino = 7984, cno = 13, nblocks = 756, ndatblk = 756
        vblocknr = 146359, blkoff = 30686, blocknr = 1554444
        vblocknr = 146360, blkoff = 30687, blocknr = 1554445
		.
		.
		.
    finfo
      ino = 16619, cno = 3763620, nblocks = 2, ndatblk = 2
        vblocknr = 224656, blkoff = 304, blocknr = 1555200
        vblocknr = 224635, blkoff = 305, blocknr = 1555201
    finfo
      ino = 16619, cno = 3763616, nblocks = 1, ndatblk = 1
        vblocknr = 224551, blkoff = 303, blocknr = 1555202
		.
		.
		.

One other question for anybody on the list, or for Ryusuke: after a nilfs corruption on older kernels (pre-2.6.30), should I keep running fsck0.nilfs2 from the initscripts in addition to using the new 2.0.22 kernel module, or is that redundant? Thanks for any help or comments. As far as I can see, this is a pretty cool filesystem.

Zahid 

* Re: mount & fsck of nilfs partition fail.
       [not found]                                                                 ` <053D39D3D76C474EB2D2A284AA6BA3181B05E9A356-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
@ 2011-06-24 16:26                                                                   ` Ryusuke Konishi
       [not found]                                                                     ` <20110625.012634.121140098.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2011-06-24 16:26 UTC (permalink / raw)
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 23 Jun 2011 11:21:03 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>   After the new kernel module (2.0.22) the nilfs partition mounted
>   with no problems. I have encountered no problems since then. Doing
>   a lssu(1) does not show segment 759 to be on the list of used
>   segments any further:
> 
> lssu -a /dev/sda2
>               SEGNUM        DATE     TIME STAT     NBLOCKS
>                    0  2011-06-23 10:56:42  -d-        2047
>                    1  2011-06-23 10:56:42  -d-        2048
>                    2  2011-06-23 10:56:42  -d-        2048
>                    3  2011-06-23 10:56:42  -d-        2048
>                    4  2011-06-23 10:56:44  -d-        2048
>                    5  2011-06-23 10:56:46  -d-        2048
>                    7  2011-06-23 10:56:46  -d-        2048
>                    8  2011-06-23 10:56:47  -d-        2048
>                    9  2011-06-23 10:56:47  -d-        2048
>                   10  2011-06-23 10:56:47  -d-        2048
>                   11  2011-06-23 10:56:47  -d-        2048
>                   12  2011-06-23 10:56:47  -d-        2048
>                   13  2011-06-23 10:56:52  -d-        2048
>                   14  2011-06-23 10:56:52  -d-        2048
>                   16  2011-06-23 10:56:52  -d-        2048
>                   17  2011-06-23 10:56:52  -d-        2048
>                   18  2011-06-23 10:56:52  -d-        2048
>                   19  2011-06-23 10:56:53  -d-        2048
>                   20  2011-06-23 10:56:54  ad-        1273
>                   21  ---------- --:--:--  ad-           0
>                  946  2011-06-23 10:52:27  -d-        2048
>                  947  2011-06-23 10:52:28  -d-        2048
>                  948  2011-06-23 10:52:28  -d-        2048
>                  949  2011-06-23 10:52:28  -d-        2048
> 			.
> 			.
> 			.
> Though dumpseg 759 does not show anything untoward (I don't think its used any further, correct?):
> 
> dumpseg /dev/sda2 759
> segment: segnum = 759
>   sequence number = 608068, next segnum = 760
>   partial segment: blocknr = 1554432, nblocks = 2048
>     creation time = 2011-06-23 10:48:02
>     nfinfo = 652
>     finfo
>       ino = 7984, cno = 13, nblocks = 756, ndatblk = 756
>         vblocknr = 146359, blkoff = 30686, blocknr = 1554444
>         vblocknr = 146360, blkoff = 30687, blocknr = 1554445
> 		.
> 		.
> 		.
>     finfo
>       ino = 16619, cno = 3763620, nblocks = 2, ndatblk = 2
>         vblocknr = 224656, blkoff = 304, blocknr = 1555200
>         vblocknr = 224635, blkoff = 305, blocknr = 1555201
>     finfo
>       ino = 16619, cno = 3763616, nblocks = 1, ndatblk = 1
>         vblocknr = 224551, blkoff = 303, blocknr = 1555202
> 		.
> 		.
> 		.

Hmm, the segment appears to have been overwritten with new data after
the partition was successfully mounted.  I can't say for certain that
it is safe now, but the concern may be unfounded.
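One way to make the "overwritten" judgement concrete is to compare segment 759's current summary against the state recorded before the crash. Assuming, as I understand NILFS, that segment sequence numbers only increase as new segments are written, dumpseg's seq 608068 for segment 759 being larger than the super block's seq 401758 in the 2011-06-11 fsck0.nilfs2 trace, together with the later creation time, indicates the segment was rewritten after the recovery mount. A small sketch with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: was a segment (seg_seq, seg_ctime) written after a
 * reference point (ref_seq, ref_time)?  Assumes segment sequence numbers
 * only grow.  Times use the "YYYY-MM-DD HH:MM:SS" form printed by dumpseg
 * and fsck0.nilfs2; being fixed-width and zero-padded, they sort
 * chronologically under strcmp(), so no date parsing is needed.
 */
static bool written_after(uint64_t seg_seq, const char *seg_ctime,
			  uint64_t ref_seq, const char *ref_time)
{
	return seg_seq > ref_seq && strcmp(seg_ctime, ref_time) > 0;
}
```

With the numbers from this thread, written_after(608068, "2011-06-23 10:48:02", 401758, "2011-06-11 23:22:03") is true. Requiring both conditions guards against a misleading timestamp if the system clock was wrong at some point.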

> One other question I have for anybody on the list or Ryusuke, on a
> corruption of nilfs on older kernels (pre 2.6.30) should I leave
> fsck0.nilfs2 to run on the initscripts besides the new 2.0.22 kernel
> module or is this really redundant? Thanks for any
> help/comments. All, as far as I can see, this is a pretty cool
> filesystem.

For now, fsck0.nilfs2 is just a manual rollback tool.  There is no
benefit in running it from initscripts, since it doesn't verify
filesystem consistency.  (Making a true fsck is clearly one of our
TODO items.)

As for fsck0.nilfs2, you only need to use it when you cannot mount
the partition.  I hope this never happens with the 2.0.22 module.
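To illustrate what "manual rollback" means here: fsck0.nilfs2 looks for the newest log that is still intact and, after confirmation, overwrites the super block so that it points at that log (in the 2011-07-02 run in this thread it moved from blocknr 2097786, cno 1775795, to blocknr 2097655, cno 1775793). The sketch below is only a toy model of that selection step over an in-memory array. The struct, its field names, and the validity flags are hypothetical, and it says nothing about how fsck0.nilfs2 actually reads or verifies logs on disk. Nothing in it checks filesystem consistency, which is why running such a tool at every boot adds nothing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy description of one log; all fields are hypothetical. */
struct log_view {
	uint64_t blocknr;     /* where the log starts */
	uint64_t seq;         /* segment sequence number */
	uint64_t cno;         /* checkpoint number */
	bool checksum_ok;     /* did the log verify? */
	bool has_super_root;  /* can the filesystem be rolled back to it? */
};

/*
 * Pick the newest intact log: highest seq, and within one segment the
 * highest blocknr.  Returns NULL if none qualifies.
 */
static const struct log_view *pick_rollback_log(const struct log_view *logs,
						size_t n)
{
	const struct log_view *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct log_view *l = &logs[i];

		if (!l->checksum_ok || !l->has_super_root)
			continue;
		if (best == NULL || l->seq > best->seq ||
		    (l->seq == best->seq && l->blocknr > best->blocknr))
			best = l;
	}
	return best;
}
```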

Thanks for your interest and help.

Regards,
Ryusuke Konishi


* RE: mount & fsck of nilfs partition fail.
       [not found]                                                                     ` <20110625.012634.121140098.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2011-07-05  0:29                                                                       ` Zahid Chowdhury
       [not found]                                                                         ` <053D39D3D76C474EB2D2A284AA6BA3181B05EE2ED0-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Zahid Chowdhury @ 2011-07-05  0:29 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke,
  On a relatively quiescent system I still encountered a mount failure on a power cycle. The messages in /var/log/messages were:

  kernel: NILFS warning: mounting unchecked fs
  kernel: NILFS: recovery complete.
  kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
  kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
  kernel:  [<c04c2fdf>] nilfs_btree_do_lookup+0xcc/0x234
  kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
  kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
  kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
  kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
  kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
  kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
  kernel:  [<c0477b75>] alloc_page_buffers+0x74/0xba
  kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
  kernel:  [<c04c7125>] nilfs_dat_translate+0x3c/0x137
  kernel:  [<c04c2032>] nilfs_btnode_submit_block+0x1a3/0x29e
  kernel:  [<c04c2144>] nilfs_btnode_get+0x17/0x5f
  kernel:  [<c04c2f0f>] nilfs_btree_get_block+0x12/0x16
  kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
  kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
  kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
  kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
  kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
  kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
  kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
  kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
  kernel:  [<c04d0ff3>] nilfs_ifile_get_inode_block+0x57/0x94
  kernel:  [<c04bcdee>] nilfs_read_inode+0x6a/0x1a6
  kernel:  [<c04bf7a0>] nilfs_get_sb+0x40f/0x65e
  kernel:  [<c045d2c9>] __alloc_pages+0x69/0x2cf
  kernel:  [<c047c152>] vfs_kern_mount+0x7d/0xf2
  kernel:  [<c047c1f9>] do_kern_mount+0x25/0x36
  kernel:  [<c048fbee>] do_mount+0x5fb/0x66b
  kernel:  [<c04589df>] find_get_page+0x18/0x3f
  kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
  kernel:  [<c0464e3f>] __handle_mm_fault+0x690/0xaac
  kernel:  [<c0484323>] __link_path_walk+0xd29/0xd4b
  kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
  kernel:  [<c06376de>] do_page_fault+0x23a/0x52d
  kernel:  [<c0637748>] do_page_fault+0x2a4/0x52d
  kernel:  [<c06374a4>] do_page_fault+0x0/0x52d
  kernel:  [<c048eb45>] copy_mount_options+0x90/0x109
  kernel:  [<c048fccb>] sys_mount+0x6d/0xa5
  kernel:  [<c0404f17>] syscall_call+0x7/0xb
  kernel:  =======================
  kernel: NILFS: btree level mismatch: 114 != 1
  kernel: NILFS error (device sda2): nilfs_ifile_get_inode_block: ifile is broken
  kernel: Remounting filesystem read-only
  kernel: NILFS: get root inode failed


I ran fsck0.nilfs2:
  /sbin/fsck0.nilfs2 -v -f /dev/sda2
  Super-block:
      revision = 2.0
      blocksize = 4096
      write time = 2011-07-02 06:09:20
      indicated log: blocknr = 2097786
          segnum = 1024, seq = 2055540, cno=1775795

  Clean FS.
  The latest log is lost. Trying rollback recovery..
  .......
  Selected log: blocknr = 2097655
      segnum = 1024, seq = 2055540, cno=1775793
      creation time = 2011-07-02 06:08:13
  Do you wish to overwrite super block (y/N)? y
  Recovery will complete on mount.

From then on the mount has always worked, but on every mount I get the following messages in /var/log/messages:

  kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
  kernel: NILFS warning: mounting fs with errors

Also:
  dmesg | grep -i nilfs
      NILFS nilfs_fill_super: start(silent=0)
      NILFS(recovery) nilfs_search_super_root: found super root: segnum=251, seq=2062534, pseg_start=514624, pseg_offset=621
      NILFS warning: mounting fs with errors
      NILFS nilfs_fill_super: mounted filesystem

  nilfs-tune -l /dev/sda2

     Filesystem state:         invalid or mounted,error

All of the daemons on our system run without problems on the existing nilfs partition, but the warnings make us wonder. Can we continue using this nilfs2 partition, or might we run into issues in the future? Thanks for any help.

Zahid

* Re: mount & fsck of nilfs partition fail.
       [not found]                                                                         ` <053D39D3D76C474EB2D2A284AA6BA3181B05EE2ED0-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
@ 2011-07-06  2:16                                                                           ` Ryusuke Konishi
       [not found]                                                                             ` <20110706.111615.163244275.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2011-07-06  2:16 UTC (permalink / raw)
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Zahid,
On Mon, 4 Jul 2011 17:29:01 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>   On a relatively quiescent system I still encountered a mount failure on a power cycle. The messages in /var/log/messages were:
> 
>   kernel: NILFS warning: mounting unchecked fs
>   kernel: NILFS: recovery complete.
>   kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>   kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
>   kernel:  [<c04c2fdf>] nilfs_btree_do_lookup+0xcc/0x234
>   kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
>   kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
>   kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
>   kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
>   kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
>   kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
>   kernel:  [<c0477b75>] alloc_page_buffers+0x74/0xba
>   kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
>   kernel:  [<c04c7125>] nilfs_dat_translate+0x3c/0x137
>   kernel:  [<c04c2032>] nilfs_btnode_submit_block+0x1a3/0x29e
>   kernel:  [<c04c2144>] nilfs_btnode_get+0x17/0x5f
>   kernel:  [<c04c2f0f>] nilfs_btree_get_block+0x12/0x16
>   kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
>   kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
>   kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
>   kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
>   kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
>   kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
>   kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
>   kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
>   kernel:  [<c04d0ff3>] nilfs_ifile_get_inode_block+0x57/0x94
>   kernel:  [<c04bcdee>] nilfs_read_inode+0x6a/0x1a6
>   kernel:  [<c04bf7a0>] nilfs_get_sb+0x40f/0x65e
>   kernel:  [<c045d2c9>] __alloc_pages+0x69/0x2cf
>   kernel:  [<c047c152>] vfs_kern_mount+0x7d/0xf2
>   kernel:  [<c047c1f9>] do_kern_mount+0x25/0x36
>   kernel:  [<c048fbee>] do_mount+0x5fb/0x66b
>   kernel:  [<c04589df>] find_get_page+0x18/0x3f
>   kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
>   kernel:  [<c0464e3f>] __handle_mm_fault+0x690/0xaac
>   kernel:  [<c0484323>] __link_path_walk+0xd29/0xd4b
>   kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
>   kernel:  [<c06376de>] do_page_fault+0x23a/0x52d
>   kernel:  [<c0637748>] do_page_fault+0x2a4/0x52d
>   kernel:  [<c06374a4>] do_page_fault+0x0/0x52d
>   kernel:  [<c048eb45>] copy_mount_options+0x90/0x109
>   kernel:  [<c048fccb>] sys_mount+0x6d/0xa5
>   kernel:  [<c0404f17>] syscall_call+0x7/0xb
>   kernel:  =======================
>   kernel: NILFS: btree level mismatch: 114 != 1
>   kernel: NILFS error (device sda2): nilfs_ifile_get_inode_block: ifile is broken
>   kernel: Remounting filesystem read-only
>   kernel: NILFS: get root inode failed
> 
> 
> I ran fsck0.nilfs2:
>   /sbin/fsck0.nilfs2 -v -f /dev/sda2
>   Super-block:
>       revision = 2.0
>       blocksize = 4096
>       write time = 2011-07-02 06:09:20
>       indicated log: blocknr = 2097786
>           segnum = 1024, seq = 2055540, cno=1775795
> 
>   Clean FS.
>   The latest log is lost. Trying rollback recovery..
>   .......
>   Selected log: blocknr = 2097655
>       segnum = 1024, seq = 2055540, cno=1775793
>       creation time = 2011-07-02 06:08:13
>   Do you wish to overwrite super block (y/N)? y
>   Recovery will complete on mount.
> 
> From then on the mount has worked always, but I get the following error in /var/log/messages always on the mount:
> 
>   kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>   kernel: NILFS warning: mounting fs with errors
> 
> Also:
>   dmesg | grep -i nilfs
>       NILFS nilfs_fill_super: start(silent=0)
>       NILFS(recovery) nilfs_search_super_root: found super root: segnum=251, seq=2062534, pseg_start=514624, pseg_offset=621
>       NILFS warning: mounting fs with errors
>       NILFS nilfs_fill_super: mounted filesystem
> 
>   nilfs-tune -l /dev/sda2
> 
>      Filesystem state:         invalid or mounted,error
> 
> All of the daemons on our system run with no problems with the existing nilfs partition, but the warnings make us wonder. Can we continue using this nilfs2 partition or might we have issues in the future. Thanks for any help.
> 
> Zahid

The current nilfs sets an error flag in the super blocks once it
detects an inconsistency in the filesystem.

The error flag is not cleared even after fsck0.nilfs2 or a mount-time
rollback succeeds.  This is a limitation of the fsck0.nilfs2 program,
so the warning persists regardless of whether the filesystem has an
actual defect. (sorry)
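The "Filesystem state: invalid or mounted,error" line that nilfs-tune printed earlier in this thread can be read as two bits of the super block's state field (s_state, as I understand it). The sketch below decodes them. NILFS_VALID_FS (0x0001) and NILFS_ERROR_FS (0x0002) are the values I recall from the NILFS on-disk header and should be checked against your headers; the output strings only imitate the nilfs-tune line quoted in the thread and are not taken from its source. With the error bit still set after a successful rollback, the state decodes exactly as reported.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values as I recall them from the NILFS on-disk header; verify locally. */
#define NILFS_VALID_FS 0x0001  /* unmounted cleanly */
#define NILFS_ERROR_FS 0x0002  /* errors detected */

/*
 * Render the super block state in the style of the nilfs-tune line quoted
 * in this thread.  Only the two bits discussed here are decoded.
 */
static void format_state(uint16_t state, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s",
		 (state & NILFS_VALID_FS) ? "valid" : "invalid or mounted",
		 (state & NILFS_ERROR_FS) ? ",error" : "");
}
```

For example, format_state(NILFS_ERROR_FS, buf, sizeof buf) fills buf with "invalid or mounted,error", which can be checked with strcmp().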

If you can back up the filesystem and restore it onto a newly created
nilfs partition, I would ask you to do so.

This is because there was a crucial btree bug in nilfs modules older
than version 2.0.22.  It may be the cause of the above error (even
though you are now using 2.0.22).

To narrow down whether the error came from older nilfs modules or the
2.0.22 module still has a crucial bug, we need longrun use test with a
nilfs partition which has been never mounted by older modules.

Regards,
Ryusuke Konishi

> 
> -----Original Message-----
> From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org] 
> Sent: Friday, June 24, 2011 9:27 AM
> To: Zahid Chowdhury
> Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: mount & fsck of nilfs partition fail.
> 
> On Thu, 23 Jun 2011 11:21:03 -0700, Zahid Chowdhury wrote:
> > Hello Ryusuke,
> >   With the new kernel module (2.0.22), the nilfs partition mounted
> >   without problems, and I have encountered none since then. Running
> >   lssu(1) no longer shows segment 759 in the list of used
> >   segments:
> > 
> > lssu -a /dev/sda2
> >               SEGNUM        DATE     TIME STAT     NBLOCKS
> >                    0  2011-06-23 10:56:42  -d-        2047
> >                    1  2011-06-23 10:56:42  -d-        2048
> >                    2  2011-06-23 10:56:42  -d-        2048
> >                    3  2011-06-23 10:56:42  -d-        2048
> >                    4  2011-06-23 10:56:44  -d-        2048
> >                    5  2011-06-23 10:56:46  -d-        2048
> >                    7  2011-06-23 10:56:46  -d-        2048
> >                    8  2011-06-23 10:56:47  -d-        2048
> >                    9  2011-06-23 10:56:47  -d-        2048
> >                   10  2011-06-23 10:56:47  -d-        2048
> >                   11  2011-06-23 10:56:47  -d-        2048
> >                   12  2011-06-23 10:56:47  -d-        2048
> >                   13  2011-06-23 10:56:52  -d-        2048
> >                   14  2011-06-23 10:56:52  -d-        2048
> >                   16  2011-06-23 10:56:52  -d-        2048
> >                   17  2011-06-23 10:56:52  -d-        2048
> >                   18  2011-06-23 10:56:52  -d-        2048
> >                   19  2011-06-23 10:56:53  -d-        2048
> >                   20  2011-06-23 10:56:54  ad-        1273
> >                   21  ---------- --:--:--  ad-           0
> >                  946  2011-06-23 10:52:27  -d-        2048
> >                  947  2011-06-23 10:52:28  -d-        2048
> >                  948  2011-06-23 10:52:28  -d-        2048
> >                  949  2011-06-23 10:52:28  -d-        2048
> > 			.
> > 			.
> > 			.
> > Though dumpseg 759 does not show anything untoward (I don't think it's used any more, correct?):
> > 
> > dumpseg /dev/sda2 759
> > segment: segnum = 759
> >   sequence number = 608068, next segnum = 760
> >   partial segment: blocknr = 1554432, nblocks = 2048
> >     creation time = 2011-06-23 10:48:02
> >     nfinfo = 652
> >     finfo
> >       ino = 7984, cno = 13, nblocks = 756, ndatblk = 756
> >         vblocknr = 146359, blkoff = 30686, blocknr = 1554444
> >         vblocknr = 146360, blkoff = 30687, blocknr = 1554445
> > 		.
> > 		.
> > 		.
> >     finfo
> >       ino = 16619, cno = 3763620, nblocks = 2, ndatblk = 2
> >         vblocknr = 224656, blkoff = 304, blocknr = 1555200
> >         vblocknr = 224635, blkoff = 305, blocknr = 1555201
> >     finfo
> >       ino = 16619, cno = 3763616, nblocks = 1, ndatblk = 1
> >         vblocknr = 224551, blkoff = 303, blocknr = 1555202
> > 		.
> > 		.
> > 		.
> 
> Hmm, the segment looks like it was overwritten with new data after
> the partition was successfully mounted.  I can't say for certain that
> it's safe now, but it may be a needless worry.
> 
> > One other question for Ryusuke or anybody on the list: given the
> > corruption of nilfs on older kernels (pre 2.6.30), should I keep
> > running fsck0.nilfs2 from the initscripts in addition to the new
> > 2.0.22 kernel module, or is this redundant? Thanks for any
> > help/comments. All, as far as I can see, this is a pretty cool
> > filesystem.
> 
> For now, fsck0.nilfs2 is just a manual rollback tool.  There is no
> benefit in running it from initscripts since it doesn't verify
> filesystem consistency.  (Clearly, writing a true fsck is one of the
> TODO items.)
>
> You only need to use fsck0.nilfs2 when you cannot mount the
> partition.  I hope this never happens with the 2.0.22 module.
> 
> Thanks for your interest and help.
> 
> Regards,
> Ryusuke Konishi
> 
> 
> > Zahid 
> > 
> > -----Original Message-----
> > From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org] 
> > Sent: Thursday, June 23, 2011 4:25 AM
> > To: Zahid Chowdhury
> > Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: Re: mount & fsck of nilfs partition fail.
> > 
> > On Mon, 20 Jun 2011 11:27:49 -0700, Zahid Chowdhury wrote:
> > > Hello Ryusuke,
> > >
> > >   Sorry, I was away over the weekend. I've attached the console
> > >   trace and the out file again for posterity. I will be
> > >   upgrading to the recently released 2.0.22 version and will try
> > >   to mount the corrupted filesystem with it. It is unlikely to
> > >   work, though it should help with future nilfs2 filesystems,
> > >   right? Thanks for the fsck help and the new release for older
> > >   kernels. Please let me know if you need anything further so that
> > >   I can recover the corrupted filesystem.
> > >
> > > Zahid
> > > 
> > > The console trace:
> > > /sbin/fsck0.nilfs2 -f -v /dev/sda2
> > > Super-block:
> > >     revision = 2.0
> > >     blocksize = 4096
> > >     write time = 2011-06-11 23:22:03
> > >     indicated log: blocknr = 1648528
> > >         segnum = 804, seq = 401758, cno=3250953
> > > 
> > > Unclean FS.
> > > The latest log is lost. Trying rollback recovery..
> > > ......
> > > Searching the latest checkpoint.
> > > get_latest_cno: log_start=1556429 (segnum=759): nfinfo=6, fblocknr=1556430
> > > get_latest_cno: finfo: ino=17874, sum-blocknr=1556429, offset=80, nblocks=2, ndatablk=1, fblocknr=1556430
> > > get_latest_cno: finfo: ino=17875, sum-blocknr=1556429, offset=128, nblocks=1, ndatablk=1, fblocknr=1556432
> > > get_latest_cno: finfo: ino=6, sum-blocknr=1556429, offset=168, nblocks=2, ndatablk=1, fblocknr=1556433
> > > get_latest_cno: finfo: ino=4, sum-blocknr=1556429, offset=216, nblocks=3, ndatablk=2, fblocknr=1556435
> > > get_latest_cno: finfo: ino=4499, sum-blocknr=1556429, offset=280, nblocks=1306282328, ndatablk=0, fblocknr=1556438
> > 
> > According to this log, the summary information of segment #759 looks
> > broken.  This may cause GC failures or filesystem corruption later.
> > 
> > Could you confirm whether the segment summary is actually broken or
> > not?  This can be done with the dumpseg tool:
> > 
> >  # dumpseg /dev/sda2 759
> > 
> > If it does look broken, I recommend that you back up all data as
> > soon as possible.
> > 
> > Regards,
> > Ryusuke Konishi
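The diagnosis of segment #759 rests on the ino=4499 finfo entry in the get_latest_cno trace, which claims 1306282328 blocks, far more than one segment holds. That kind of check can be done mechanically. This is a rough sketch, assuming 2048 blocks per segment (the full-segment size seen in the dumpseg and lssu output in this thread) and that segment N starts at block N * 2048, which matches segment 759 starting at blocknr 1554432; `parse_finfos` and `check_finfos` are hypothetical names, and dumpseg remains the authoritative check.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches the finfo lines printed by `fsck0.nilfs2 -v` in this thread.
FINFO_RE = re.compile(
    r"finfo: ino=(\d+), sum-blocknr=(\d+), offset=(\d+), "
    r"nblocks=(\d+), ndatablk=(\d+), fblocknr=(\d+)"
)


@dataclass
class Finfo:
    ino: int
    nblocks: int   # blocks claimed by this file in the partial segment
    ndatablk: int  # data blocks among them (the rest assumed btree nodes)
    fblocknr: int  # first block of this file's run, as printed by fsck0


def parse_finfos(text: str) -> List[Finfo]:
    finfos = []
    for m in FINFO_RE.finditer(text):
        ino, _sum_blocknr, _offset, nblocks, ndatablk, fblocknr = map(int, m.groups())
        finfos.append(Finfo(ino, nblocks, ndatablk, fblocknr))
    return finfos


def check_finfos(finfos: List[Finfo], segnum: int,
                 blocks_per_segment: int = 2048) -> List[str]:
    """Flag finfo entries that cannot fit in the given segment.

    Assumes segment N spans blocks [N * bps, (N + 1) * bps). Segment 0
    starts later (after the super block area), so use this for segnum > 0.
    """
    seg_start = segnum * blocks_per_segment
    seg_end = seg_start + blocks_per_segment  # first block of the next segment
    problems = []
    for f in finfos:
        reasons = []
        if not 1 <= f.nblocks <= blocks_per_segment:
            reasons.append(f"nblocks={f.nblocks} outside 1..{blocks_per_segment}")
        if f.ndatablk > f.nblocks:
            reasons.append(f"ndatablk={f.ndatablk} > nblocks={f.nblocks}")
        if f.fblocknr < seg_start or f.fblocknr + f.nblocks > seg_end:
            reasons.append(f"run starting at {f.fblocknr} leaves segment {segnum}")
        if reasons:
            problems.append(f"ino={f.ino}: " + "; ".join(reasons))
    return problems
```

Fed the five finfo lines from the trace, only the ino=4499 entry is reported, which agrees with the conclusion that the summary of that partial segment is damaged.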

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: mount & fsck of nilfs partition fail.
       [not found]                                                                             ` <20110706.111615.163244275.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2011-07-08 23:52                                                                               ` Zahid Chowdhury
       [not found]                                                                                 ` <053D39D3D76C474EB2D2A284AA6BA3181B05F38A67-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Zahid Chowdhury @ 2011-07-08 23:52 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke,
  I have done a backup/restore of the whole nilfs partition. I will let the mailing list know if we have any more problems during a long-running test with the new kernel module. Thanks.
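For a long-running test like this, one low-effort way to notice a recurrence is to count the NILFS kernel messages that have already appeared in this thread. This is a minimal sketch, assuming syslog lands in /var/log/messages as on the system described here and that the message wording matches these logs (it may differ in other nilfs versions); `scan` and the pattern names are made up for illustration.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Message fragments copied from the logs in this thread; treat the list
# as a starting point, since wording may change between nilfs versions.
PATTERNS = {
    "unchecked_mount": re.compile(r"NILFS warning: mounting unchecked fs"),
    "mount_with_errors": re.compile(r"NILFS warning: mounting fs with errors"),
    "btree_level_mismatch": re.compile(r"NILFS: btree level mismatch: (\d+) != (\d+)"),
    "nilfs_error": re.compile(r"NILFS error \(device (\S+)\)"),
    "checksum_error": re.compile(r"Checksum error in segment payload"),
}


def scan(lines: Iterable[str]) -> Counter:
    """Count how many lines match each known NILFS problem pattern."""
    hits: Counter = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name] += 1
    return hits


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for name, count in sorted(scan(fh).items()):
            print(f"{name}: {count}")
```

Running it after each power cycle and comparing counts gives a quick signal; the full kernel log around any hit is still what the list would need.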

Zahid

-----Original Message-----
From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org] 
Sent: Tuesday, July 05, 2011 7:16 PM
To: Zahid Chowdhury
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: mount & fsck of nilfs partition fail.

Hi Zahid,
On Mon, 4 Jul 2011 17:29:01 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>   On a relatively quiescent system I still encountered a mount failure on a power cycle. The messages in /var/log/messages were:
> 
>   kernel: NILFS warning: mounting unchecked fs
>   kernel: NILFS: recovery complete.
>   kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>   kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
>   kernel:  [<c04c2fdf>] nilfs_btree_do_lookup+0xcc/0x234
>   kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
>   kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
>   kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
>   kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
>   kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
>   kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
>   kernel:  [<c0477b75>] alloc_page_buffers+0x74/0xba
>   kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
>   kernel:  [<c04c7125>] nilfs_dat_translate+0x3c/0x137
>   kernel:  [<c04c2032>] nilfs_btnode_submit_block+0x1a3/0x29e
>   kernel:  [<c04c2144>] nilfs_btnode_get+0x17/0x5f
>   kernel:  [<c04c2f0f>] nilfs_btree_get_block+0x12/0x16
>   kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
>   kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
>   kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
>   kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
>   kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
>   kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
>   kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
>   kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
>   kernel:  [<c04d0ff3>] nilfs_ifile_get_inode_block+0x57/0x94
>   kernel:  [<c04bcdee>] nilfs_read_inode+0x6a/0x1a6
>   kernel:  [<c04bf7a0>] nilfs_get_sb+0x40f/0x65e
>   kernel:  [<c045d2c9>] __alloc_pages+0x69/0x2cf
>   kernel:  [<c047c152>] vfs_kern_mount+0x7d/0xf2
>   kernel:  [<c047c1f9>] do_kern_mount+0x25/0x36
>   kernel:  [<c048fbee>] do_mount+0x5fb/0x66b
>   kernel:  [<c04589df>] find_get_page+0x18/0x3f
>   kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
>   kernel:  [<c0464e3f>] __handle_mm_fault+0x690/0xaac
>   kernel:  [<c0484323>] __link_path_walk+0xd29/0xd4b
>   kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
>   kernel:  [<c06376de>] do_page_fault+0x23a/0x52d
>   kernel:  [<c0637748>] do_page_fault+0x2a4/0x52d
>   kernel:  [<c06374a4>] do_page_fault+0x0/0x52d
>   kernel:  [<c048eb45>] copy_mount_options+0x90/0x109
>   kernel:  [<c048fccb>] sys_mount+0x6d/0xa5
>   kernel:  [<c0404f17>] syscall_call+0x7/0xb
>   kernel:  =======================
>   kernel: NILFS: btree level mismatch: 114 != 1
>   kernel: NILFS error (device sda2): nilfs_ifile_get_inode_block: ifile is broken
>   kernel: Remounting filesystem read-only
>   kernel: NILFS: get root inode failed
> 
> 
> I ran fsck0.nilfs2:
>   /sbin/fsck0.nilfs2 -v -f /dev/sda2
>   Super-block:
>       revision = 2.0
>       blocksize = 4096
>       write time = 2011-07-02 06:09:20
>       indicated log: blocknr = 2097786
>           segnum = 1024, seq = 2055540, cno=1775795
> 
>   Clean FS.
>   The latest log is lost. Trying rollback recovery..
>   .......
>   Selected log: blocknr = 2097655
>       segnum = 1024, seq = 2055540, cno=1775793
>       creation time = 2011-07-02 06:08:13
>   Do you wish to overwrite super block (y/N)? y
>   Recovery will complete on mount.
> 
> From then on the mount has always worked, but I always get the following messages in /var/log/messages on mount:
> 
>   kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>   kernel: NILFS warning: mounting fs with errors
> 
> Also:
>   dmesg | grep -i nilfs
>       NILFS nilfs_fill_super: start(silent=0)
>       NILFS(recovery) nilfs_search_super_root: found super root: segnum=251, seq=2062534, pseg_start=514624, pseg_offset=621
>       NILFS warning: mounting fs with errors
>       NILFS nilfs_fill_super: mounted filesystem
> 
>   nilfs-tune -l /dev/sda2
> 
>      Filesystem state:         invalid or mounted,error
> 
> All of the daemons on our system run without problems on the existing nilfs partition, but the warnings make us wonder. Can we continue using this nilfs2 partition, or might we have issues in the future? Thanks for any help.
> 
> Zahid

The current nilfs sets an error flag on the super blocks once it
detects an inconsistency in the filesystem.

The error flag is not cleared even after fsck0.nilfs2 or a mount-time
rollback succeeds.  This is a limitation of the fsck0.nilfs2 program,
so the warning remains regardless of whether the filesystem actually
has a defect. (sorry)

If you can back up the filesystem and restore it to a new nilfs
partition, I would like to ask you to do so.

This is because there was a critical btree bug in nilfs modules older
than version 2.0.22.  It may be the cause of the above error (even
though you are now using 2.0.22).

To determine whether the error came from the older nilfs modules or
the 2.0.22 module still has a critical bug, we need a long-running
test with a nilfs partition that has never been mounted by older
modules.

Regards,
Ryusuke Konishi

> 
> -----Original Message-----
> From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org] 
> Sent: Friday, June 24, 2011 9:27 AM
> To: Zahid Chowdhury
> Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: mount & fsck of nilfs partition fail.
> 
> On Thu, 23 Jun 2011 11:21:03 -0700, Zahid Chowdhury wrote:
> > Hello Ryusuke,
> >   With the new kernel module (2.0.22), the nilfs partition mounted
> >   without problems, and I have encountered none since then. Running
> >   lssu(1) no longer shows segment 759 in the list of used
> >   segments:
> > 
> > lssu -a /dev/sda2
> >               SEGNUM        DATE     TIME STAT     NBLOCKS
> >                    0  2011-06-23 10:56:42  -d-        2047
> >                    1  2011-06-23 10:56:42  -d-        2048
> >                    2  2011-06-23 10:56:42  -d-        2048
> >                    3  2011-06-23 10:56:42  -d-        2048
> >                    4  2011-06-23 10:56:44  -d-        2048
> >                    5  2011-06-23 10:56:46  -d-        2048
> >                    7  2011-06-23 10:56:46  -d-        2048
> >                    8  2011-06-23 10:56:47  -d-        2048
> >                    9  2011-06-23 10:56:47  -d-        2048
> >                   10  2011-06-23 10:56:47  -d-        2048
> >                   11  2011-06-23 10:56:47  -d-        2048
> >                   12  2011-06-23 10:56:47  -d-        2048
> >                   13  2011-06-23 10:56:52  -d-        2048
> >                   14  2011-06-23 10:56:52  -d-        2048
> >                   16  2011-06-23 10:56:52  -d-        2048
> >                   17  2011-06-23 10:56:52  -d-        2048
> >                   18  2011-06-23 10:56:52  -d-        2048
> >                   19  2011-06-23 10:56:53  -d-        2048
> >                   20  2011-06-23 10:56:54  ad-        1273
> >                   21  ---------- --:--:--  ad-           0
> >                  946  2011-06-23 10:52:27  -d-        2048
> >                  947  2011-06-23 10:52:28  -d-        2048
> >                  948  2011-06-23 10:52:28  -d-        2048
> >                  949  2011-06-23 10:52:28  -d-        2048
> > 			.
> > 			.
> > 			.
> > Though dumpseg 759 does not show anything untoward (I don't think it's used any more, correct?):
> > 
> > dumpseg /dev/sda2 759
> > segment: segnum = 759
> >   sequence number = 608068, next segnum = 760
> >   partial segment: blocknr = 1554432, nblocks = 2048
> >     creation time = 2011-06-23 10:48:02
> >     nfinfo = 652
> >     finfo
> >       ino = 7984, cno = 13, nblocks = 756, ndatblk = 756
> >         vblocknr = 146359, blkoff = 30686, blocknr = 1554444
> >         vblocknr = 146360, blkoff = 30687, blocknr = 1554445
> > 		.
> > 		.
> > 		.
> >     finfo
> >       ino = 16619, cno = 3763620, nblocks = 2, ndatblk = 2
> >         vblocknr = 224656, blkoff = 304, blocknr = 1555200
> >         vblocknr = 224635, blkoff = 305, blocknr = 1555201
> >     finfo
> >       ino = 16619, cno = 3763616, nblocks = 1, ndatblk = 1
> >         vblocknr = 224551, blkoff = 303, blocknr = 1555202
> > 		.
> > 		.
> > 		.
> 
> Hmm, the segment looks like it was overwritten with new data after
> the partition was successfully mounted.  I can't say for certain that
> it's safe now, but it may be a needless worry.
> 
> > One other question for Ryusuke or anybody on the list: given the
> > corruption of nilfs on older kernels (pre 2.6.30), should I keep
> > running fsck0.nilfs2 from the initscripts in addition to the new
> > 2.0.22 kernel module, or is this redundant? Thanks for any
> > help/comments. All, as far as I can see, this is a pretty cool
> > filesystem.
> 
> For now, fsck0.nilfs2 is just a manual rollback tool.  There is no
> benefit in running it from initscripts since it doesn't verify
> filesystem consistency.  (Clearly, writing a true fsck is one of the
> TODO items.)
>
> You only need to use fsck0.nilfs2 when you cannot mount the
> partition.  I hope this never happens with the 2.0.22 module.
> 
> Thanks for your interest and help.
> 
> Regards,
> Ryusuke Konishi
> 
> 
> > Zahid 
> > 
> > -----Original Message-----
> > From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org] 
> > Sent: Thursday, June 23, 2011 4:25 AM
> > To: Zahid Chowdhury
> > Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: Re: mount & fsck of nilfs partition fail.
> > 
> > On Mon, 20 Jun 2011 11:27:49 -0700, Zahid Chowdhury wrote:
> > > Hello Ryusuke,
> > >
> > >   Sorry, I was away over the weekend. I've attached the console
> > >   trace and the out file again for posterity. I will be
> > >   upgrading to the recently released 2.0.22 version and will try
> > >   to mount the corrupted filesystem with it. It is unlikely to
> > >   work, though it should help with future nilfs2 filesystems,
> > >   right? Thanks for the fsck help and the new release for older
> > >   kernels. Please let me know if you need anything further so that
> > >   I can recover the corrupted filesystem.
> > >
> > > Zahid
> > > 
> > > The console trace:
> > > /sbin/fsck0.nilfs2 -f -v /dev/sda2
> > > Super-block:
> > >     revision = 2.0
> > >     blocksize = 4096
> > >     write time = 2011-06-11 23:22:03
> > >     indicated log: blocknr = 1648528
> > >         segnum = 804, seq = 401758, cno=3250953
> > > 
> > > Unclean FS.
> > > The latest log is lost. Trying rollback recovery..
> > > ......
> > > Searching the latest checkpoint.
> > > get_latest_cno: log_start=1556429 (segnum=759): nfinfo=6, fblocknr=1556430
> > > get_latest_cno: finfo: ino=17874, sum-blocknr=1556429, offset=80, nblocks=2, ndatablk=1, fblocknr=1556430
> > > get_latest_cno: finfo: ino=17875, sum-blocknr=1556429, offset=128, nblocks=1, ndatablk=1, fblocknr=1556432
> > > get_latest_cno: finfo: ino=6, sum-blocknr=1556429, offset=168, nblocks=2, ndatablk=1, fblocknr=1556433
> > > get_latest_cno: finfo: ino=4, sum-blocknr=1556429, offset=216, nblocks=3, ndatablk=2, fblocknr=1556435
> > > get_latest_cno: finfo: ino=4499, sum-blocknr=1556429, offset=280, nblocks=1306282328, ndatablk=0, fblocknr=1556438
> > 
> > According to this log, the summary information of segment #759 looks
> > broken.  This may cause GC failures or filesystem corruption later.
> > 
> > Could you confirm whether the segment summary is actually broken or
> > not?  This can be done with the dumpseg tool:
> > 
> >  # dumpseg /dev/sda2 759
> > 
> > If it does look broken, I recommend that you back up all data as
> > soon as possible.
> > 
> > Regards,
> > Ryusuke Konishi
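The check of whether segment 759 still appears in the `lssu -a` listing quoted above can be scripted instead of read by eye. This is a minimal sketch, assuming the five-column layout shown in that listing (SEGNUM, DATE, TIME, STAT, NBLOCKS) and assuming the three STAT characters mean active, dirty, and error respectively; `parse_lssu` and `SegUsage` are hypothetical names.

```python
import sys
from typing import Dict, NamedTuple


class SegUsage(NamedTuple):
    segnum: int
    active: bool  # assumed meaning of the 1st STAT character ('a')
    dirty: bool   # assumed meaning of the 2nd STAT character ('d')
    error: bool   # assumed meaning of the 3rd STAT character ('e')
    nblocks: int


def parse_lssu(text: str) -> Dict[int, SegUsage]:
    """Parse `lssu -a` output into {segnum: SegUsage}."""
    table: Dict[int, SegUsage] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows are SEGNUM DATE TIME STAT NBLOCKS; this skips the
        # header line and any elision lines.
        if len(fields) != 5 or not fields[0].isdigit() or len(fields[3]) != 3:
            continue
        segnum, _date, _time, stat, nblocks = fields
        table[int(segnum)] = SegUsage(
            int(segnum), stat[0] == "a", stat[1] == "d", stat[2] == "e", int(nblocks)
        )
    return table


if __name__ == "__main__":
    # Example: lssu -a /dev/sda2 | python3 check_lssu.py 759
    target = int(sys.argv[1])
    usage = parse_lssu(sys.stdin.read())
    print(usage.get(target, f"segment {target} not listed"))
```

A segment that is absent from the table is simply not in use according to lssu, which is consistent with it having been reclaimed and rewritten as dumpseg later showed.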

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: mount & fsck of nilfs partition fail.
       [not found]                                                                                 ` <053D39D3D76C474EB2D2A284AA6BA3181B05F38A67-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
@ 2011-07-14  0:54                                                                                   ` Zahid Chowdhury
       [not found]                                                                                     ` <053D39D3D76C474EB2D2A284AA6BA3181B05F390D2-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Zahid Chowdhury @ 2011-07-14  0:54 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke,
  During a 4-day test that included power cycles, we encountered a similar problem even after a complete backup & restore of the filesystem (using GNU tar). The error messages are similar. I tried enabling some of the debug options under /proc, but they gave no relevant extra information. I also tried upgrading to a newer nilfs-utils (2.0.23 vs. the 2.0.22 that I ran with until now), with no change in the error messages. fsck0.nilfs2 does not fix this error, though it reports that things are okay:

fsck0.nilfs2 /dev/sda2
Super-block:
    revision = 2.0
    blocksize = 4096
    write time = 2011-07-13 17:36:01
    indicated log: blocknr = 2165165
        segnum = 1057, seq = 299785, cno=2206916

Clean FS.
A valid log is pointed to by superblock (No change needed): blocknr = 2165165
    segnum = 1057, seq = 299785, cno=2206916
    creation time = 2011-07-13 17:24:31

The mount gives the following error under 2.0.23:

kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
kernel: NILFS warning: mounting fs with errors
nilfs_cleanerd[2970]: start
kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
kernel:  [<c04c2fdf>] nilfs_btree_do_lookup+0xcc/0x234
kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
kernel:  [<c04d13aa>] nilfs_palloc_block_get_entry+0x12/0x41
kernel:  [<c04c69db>] nilfs_dat_get_vinfo+0x46/0x1bf
kernel:  [<c04d2da1>] nilfs_ioctl_do_get_vinfo+0x51/0x60
kernel:  [<c04d20e9>] nilfs_ioctl_wrap_copy+0xdd/0x16b
kernel:  [<c04d21c7>] nilfs_ioctl_get_info+0x50/0x7a
kernel:  [<c04d2d50>] nilfs_ioctl_do_get_vinfo+0x0/0x60
kernel:  [<c04d24d6>] nilfs_ioctl+0x238/0x57d
kernel:  [<c04d2d50>] nilfs_ioctl_do_get_vinfo+0x0/0x60
kernel:  [<c045194a>] delayacct_end+0x58/0x7a
kernel:  [<c045cf86>] get_page_from_freelist+0x96/0x370
kernel:  [<c045d1af>] get_page_from_freelist+0x2bf/0x370
kernel:  [<c045d2c9>] __alloc_pages+0x69/0x2cf
kernel:  [<c0464e3f>] __handle_mm_fault+0x690/0xaac
kernel:  [<c04d229e>] nilfs_ioctl+0x0/0x57d
kernel:  [<c048620d>] do_ioctl+0x1c/0x5d
kernel:  [<c04867a1>] vfs_ioctl+0x47b/0x4d3
kernel:  [<c0637748>] do_page_fault+0x2a4/0x52d
kernel:  [<c0486841>] sys_ioctl+0x48/0x5f
kernel:  [<c0404f17>] syscall_call+0x7/0xb
kernel:  =======================
kernel: NILFS: btree level mismatch: 84 != 1
nilfs_cleanerd[2970]: shutdown
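The "btree level mismatch: 84 != 1" line reports a violated B-tree invariant: each node stores its own level, and a child reached from a node at level L must itself be at level L - 1. The toy model below illustrates that check. It is not the nilfs kernel code, and reading 84 as the level found in the block and 1 as the expected level is an assumption (one consistent with reading a block that does not hold a btree node at all). `Node`, `lookup`, and `BtreeLevelMismatch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


class BtreeLevelMismatch(Exception):
    pass


@dataclass
class Node:
    level: int        # level recorded inside the node block itself
    keys: List[int]   # sorted ascending
    ptrs: List[int]   # child block numbers, or data blocks when level == 1


def lookup(blocks: Dict[int, Node], root: Node, key: int) -> int:
    """Descend from the root, verifying each child's recorded level."""
    node, level = root, root.level
    while True:
        candidates = [i for i, k in enumerate(node.keys) if k <= key]
        if not candidates:
            raise KeyError(key)
        ptr = node.ptrs[candidates[-1]]  # last key <= target
        if level == 1:
            return ptr  # level-1 entries point at data blocks
        level -= 1
        node = blocks[ptr]
        if node.level != level:
            raise BtreeLevelMismatch(
                f"btree level mismatch: {node.level} != {level}")
```

In this model, lookups that never pass through the damaged node still succeed, while any lookup that does fails with the same style of message.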

I can mount the filesystem writable without the cleanerd (-n), but we do need the GC daemon to reclaim space on a writable filesystem. Please let me know if there is an easy fix for this problem; otherwise, I guess a backup/restore would let me continue, correct? Also, let me know if you need any more information. Thanks.

Zahid

-----Original Message-----
From: linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Zahid Chowdhury
Sent: Friday, July 08, 2011 4:53 PM
To: Ryusuke Konishi
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: RE: mount & fsck of nilfs partition fail.

Hello Ryusuke,
  I have done a backup/restore of the whole nilfs partition. I will let the mailing list know if we have any more problems during a long-running test with the new kernel module. Thanks.

Zahid

-----Original Message-----
From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org]
Sent: Tuesday, July 05, 2011 7:16 PM
To: Zahid Chowdhury
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: mount & fsck of nilfs partition fail.

Hi Zahid,
On Mon, 4 Jul 2011 17:29:01 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>   On a relatively quiescent system I still encountered a mount failure on a power cycle. The messages in /var/log/messages were:
>
>   kernel: NILFS warning: mounting unchecked fs
>   kernel: NILFS: recovery complete.
>   kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>   kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
>   kernel:  [<c04c2fdf>] nilfs_btree_do_lookup+0xcc/0x234
>   kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
>   kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
>   kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
>   kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
>   kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
>   kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
>   kernel:  [<c0477b75>] alloc_page_buffers+0x74/0xba
>   kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
>   kernel:  [<c04c7125>] nilfs_dat_translate+0x3c/0x137
>   kernel:  [<c04c2032>] nilfs_btnode_submit_block+0x1a3/0x29e
>   kernel:  [<c04c2144>] nilfs_btnode_get+0x17/0x5f
>   kernel:  [<c04c2f0f>] nilfs_btree_get_block+0x12/0x16
>   kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
>   kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
>   kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
>   kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
>   kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
>   kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
>   kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
>   kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
>   kernel:  [<c04d0ff3>] nilfs_ifile_get_inode_block+0x57/0x94
>   kernel:  [<c04bcdee>] nilfs_read_inode+0x6a/0x1a6
>   kernel:  [<c04bf7a0>] nilfs_get_sb+0x40f/0x65e
>   kernel:  [<c045d2c9>] __alloc_pages+0x69/0x2cf
>   kernel:  [<c047c152>] vfs_kern_mount+0x7d/0xf2
>   kernel:  [<c047c1f9>] do_kern_mount+0x25/0x36
>   kernel:  [<c048fbee>] do_mount+0x5fb/0x66b
>   kernel:  [<c04589df>] find_get_page+0x18/0x3f
>   kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
>   kernel:  [<c0464e3f>] __handle_mm_fault+0x690/0xaac
>   kernel:  [<c0484323>] __link_path_walk+0xd29/0xd4b
>   kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
>   kernel:  [<c06376de>] do_page_fault+0x23a/0x52d
>   kernel:  [<c0637748>] do_page_fault+0x2a4/0x52d
>   kernel:  [<c06374a4>] do_page_fault+0x0/0x52d
>   kernel:  [<c048eb45>] copy_mount_options+0x90/0x109
>   kernel:  [<c048fccb>] sys_mount+0x6d/0xa5
>   kernel:  [<c0404f17>] syscall_call+0x7/0xb
>   kernel:  =======================
>   kernel: NILFS: btree level mismatch: 114 != 1
>   kernel: NILFS error (device sda2): nilfs_ifile_get_inode_block: ifile is broken
>   kernel: Remounting filesystem read-only
>   kernel: NILFS: get root inode failed
>
>
> I ran fsck0.nilfs2:
>   /sbin/fsck0.nilfs2 -v -f /dev/sda2
>   Super-block:
>       revision = 2.0
>       blocksize = 4096
>       write time = 2011-07-02 06:09:20
>       indicated log: blocknr = 2097786
>           segnum = 1024, seq = 2055540, cno=1775795
>
>   Clean FS.
>   The latest log is lost. Trying rollback recovery..
>   .......
>   Selected log: blocknr = 2097655
>       segnum = 1024, seq = 2055540, cno=1775793
>       creation time = 2011-07-02 06:08:13
>   Do you wish to overwrite super block (y/N)? y
>   Recovery will complete on mount.
>
> From then on the mount has always worked, but I always get the following messages in /var/log/messages on mount:
>
>   kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>   kernel: NILFS warning: mounting fs with errors
>
> Also:
>   dmesg | grep -i nilfs
>       NILFS nilfs_fill_super: start(silent=0)
>       NILFS(recovery) nilfs_search_super_root: found super root: segnum=251, seq=2062534, pseg_start=514624, pseg_offset=621
>       NILFS warning: mounting fs with errors
>       NILFS nilfs_fill_super: mounted filesystem
>
>   nilfs-tune -l /dev/sda2
>
>      Filesystem state:         invalid or mounted,error
>
> All of the daemons on our system run without problems on the existing nilfs partition, but the warnings make us wonder. Can we continue using this nilfs2 partition, or might we have issues in the future? Thanks for any help.
>
> Zahid

The current nilfs sets an error flag on the super blocks once it
detects an inconsistency in the filesystem.

The error flag is not cleared even after fsck0.nilfs2 or a mount-time
rollback succeeds.  This is a limitation of the fsck0.nilfs2 program,
so the warning remains regardless of whether the filesystem actually
has a defect. (sorry)

If you can back up the filesystem and restore it to a new nilfs
partition, I would like to ask you to do so.

This is because there was a critical btree bug in nilfs modules older
than version 2.0.22.  It may be the cause of the above error (even
though you are now using 2.0.22).

To determine whether the error came from the older nilfs modules or
the 2.0.22 module still has a critical bug, we need a long-running
test with a nilfs partition that has never been mounted by older
modules.

Regards,
Ryusuke Konishi

>
> -----Original Message-----
> From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org]
> Sent: Friday, June 24, 2011 9:27 AM
> To: Zahid Chowdhury
> Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: mount & fsck of nilfs partition fail.
>
> On Thu, 23 Jun 2011 11:21:03 -0700, Zahid Chowdhury wrote:
> > Hello Ryusuke,
> >   With the new kernel module (2.0.22), the nilfs partition mounted
> >   without problems, and I have encountered none since then. Running
> >   lssu(1) no longer shows segment 759 in the list of used
> >   segments:
> >
> > lssu -a /dev/sda2
> >               SEGNUM        DATE     TIME STAT     NBLOCKS
> >                    0  2011-06-23 10:56:42  -d-        2047
> >                    1  2011-06-23 10:56:42  -d-        2048
> >                    2  2011-06-23 10:56:42  -d-        2048
> >                    3  2011-06-23 10:56:42  -d-        2048
> >                    4  2011-06-23 10:56:44  -d-        2048
> >                    5  2011-06-23 10:56:46  -d-        2048
> >                    7  2011-06-23 10:56:46  -d-        2048
> >                    8  2011-06-23 10:56:47  -d-        2048
> >                    9  2011-06-23 10:56:47  -d-        2048
> >                   10  2011-06-23 10:56:47  -d-        2048
> >                   11  2011-06-23 10:56:47  -d-        2048
> >                   12  2011-06-23 10:56:47  -d-        2048
> >                   13  2011-06-23 10:56:52  -d-        2048
> >                   14  2011-06-23 10:56:52  -d-        2048
> >                   16  2011-06-23 10:56:52  -d-        2048
> >                   17  2011-06-23 10:56:52  -d-        2048
> >                   18  2011-06-23 10:56:52  -d-        2048
> >                   19  2011-06-23 10:56:53  -d-        2048
> >                   20  2011-06-23 10:56:54  ad-        1273
> >                   21  ---------- --:--:--  ad-           0
> >                  946  2011-06-23 10:52:27  -d-        2048
> >                  947  2011-06-23 10:52:28  -d-        2048
> >                  948  2011-06-23 10:52:28  -d-        2048
> >                  949  2011-06-23 10:52:28  -d-        2048
> >                     .
> >                     .
> >                     .
> > Also, dumpseg 759 does not show anything untoward (I don't think it's used any more, correct?):
> >
> > dumpseg /dev/sda2 759
> > segment: segnum = 759
> >   sequence number = 608068, next segnum = 760
> >   partial segment: blocknr = 1554432, nblocks = 2048
> >     creation time = 2011-06-23 10:48:02
> >     nfinfo = 652
> >     finfo
> >       ino = 7984, cno = 13, nblocks = 756, ndatblk = 756
> >         vblocknr = 146359, blkoff = 30686, blocknr = 1554444
> >         vblocknr = 146360, blkoff = 30687, blocknr = 1554445
> >             .
> >             .
> >             .
> >     finfo
> >       ino = 16619, cno = 3763620, nblocks = 2, ndatblk = 2
> >         vblocknr = 224656, blkoff = 304, blocknr = 1555200
> >         vblocknr = 224635, blkoff = 305, blocknr = 1555201
> >     finfo
> >       ino = 16619, cno = 3763616, nblocks = 1, ndatblk = 1
> >         vblocknr = 224551, blkoff = 303, blocknr = 1555202
> >             .
> >             .
> >             .
>
> Hmm, the segment appears to have been overwritten with new data after
> the partition was successfully mounted.  I can't say for certain that
> it is safe now, but the concern may be unfounded.
>
> > One other question for anybody on the list, or for Ryusuke: given a
> > corruption of nilfs on older kernels (pre 2.6.30), should I leave
> > fsck0.nilfs2 running from the initscripts in addition to the new
> > 2.0.22 kernel module, or is this really redundant? Thanks for any
> > help/comments. All, as far as I can see, this is a pretty cool
> > filesystem.
>
> For now, fsck0.nilfs2 is just a manual rollback tool.  There is no
> benefit in running it from initscripts, since it doesn't verify
> filesystem consistency.  (Writing a true fsck is clearly one of the
> TODO items.)
>
> You only need fsck0.nilfs2 when the partition cannot be mounted.
> I hope this never happens with the 2.0.22 module.
>
> Thanks for your interest and help.
>
> Regards,
> Ryusuke Konishi
>
>
> > Zahid
> >
> > -----Original Message-----
> > From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org]
> > Sent: Thursday, June 23, 2011 4:25 AM
> > To: Zahid Chowdhury
> > Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: Re: mount & fsck of nilfs partition fail.
> >
> > On Mon, 20 Jun 2011 11:27:49 -0700, Zahid Chowdhury wrote:
> > > Hello Ryusuke,
> > >
> > >   Sorry, I was away over the weekend. I've attached the console
> > >   trace and the out file again for posterity. I will be upgrading to
> > >   the recently released 2.0.22 version and will try to mount the
> > >   corrupted filesystem with it. It is unlikely to work, though it
> > >   should help with future nilfs2 filesystems, right? Thanks for the
> > >   fsck help and the new release for older kernels. Please let me
> > >   know if you need anything further so that I can recover the
> > >   corrupted filesystem.
> > >
> > > Zahid
> > >
> > > The console trace:
> > > /sbin/fsck0.nilfs2 -f -v /dev/sda2
> > > Super-block:
> > >     revision = 2.0
> > >     blocksize = 4096
> > >     write time = 2011-06-11 23:22:03
> > >     indicated log: blocknr = 1648528
> > >         segnum = 804, seq = 401758, cno=3250953
> > >
> > > Unclean FS.
> > > The latest log is lost. Trying rollback recovery..
> > > ......
> > > Searching the latest checkpoint.
> > > get_latest_cno: log_start=1556429 (segnum=759): nfinfo=6, fblocknr=1556430
> > > get_latest_cno: finfo: ino=17874, sum-blocknr=1556429, offset=80, nblocks=2, ndatablk=1, fblocknr=1556430
> > > get_latest_cno: finfo: ino=17875, sum-blocknr=1556429, offset=128, nblocks=1, ndatablk=1, fblocknr=1556432
> > > get_latest_cno: finfo: ino=6, sum-blocknr=1556429, offset=168, nblocks=2, ndatablk=1, fblocknr=1556433
> > > get_latest_cno: finfo: ino=4, sum-blocknr=1556429, offset=216, nblocks=3, ndatablk=2, fblocknr=1556435
> > > get_latest_cno: finfo: ino=4499, sum-blocknr=1556429, offset=280, nblocks=1306282328, ndatablk=0, fblocknr=1556438
> >
> > According to this log, the summary information of segment #759 looks
> > broken.  This may cause future GC failure or filesystem corruption.
> >
> > Could you confirm whether the segment summary is actually broken or
> > not?  This can be done with the dumpseg tool:
> >
> >  # dumpseg /dev/sda2 759
> >
> > If it actually looks broken, I recommend backing up all data as
> > soon as possible.
> >
> > Regards,
> > Ryusuke Konishi
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

* RE: mount & fsck of nilfs partition fail.
       [not found]                                                                                     ` <053D39D3D76C474EB2D2A284AA6BA3181B05F390D2-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
@ 2011-08-08 21:33                                                                                       ` Zahid Chowdhury
       [not found]                                                                                         ` <053D39D3D76C474EB2D2A284AA6BA3181B06066B94-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Zahid Chowdhury @ 2011-08-08 21:33 UTC
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello Ryusuke/nilfs kernel developers,
  Can you please let me know if I should keep the corrupted nilfs filesystem described below for nilfs bug fixes, or if no solution exists for this corruption? If it is the latter, I will back up, repartition the filesystem, and then finally restore the filesystem and continue my work. Thanks.

Zahid

-----Original Message-----
From: linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Zahid Chowdhury
Sent: Wednesday, July 13, 2011 5:55 PM
To: Ryusuke Konishi
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: RE: mount & fsck of nilfs partition fail.

Hello Ryusuke,
  During a 4-day test that included power cycles, we encountered a similar problem after a complete backup and restore of the filesystem (using GNU tar). The error messages are similar. I tried enabling some of the debug options under /proc, but they gave no relevant extra information. I also tried upgrading to a newer nilfs-utils (2.0.23 vs. the 2.0.22 that I had run up till now), with no change in the error messages. fsck0.nilfs2 does not fix this error, though it reports that things are okay:

fsck0.nilfs2 /dev/sda2
Super-block:
    revision = 2.0
    blocksize = 4096
    write time = 2011-07-13 17:36:01
    indicated log: blocknr = 2165165
        segnum = 1057, seq = 299785, cno=2206916

Clean FS.
A valid log is pointed to by superblock (No change needed): blocknr = 2165165
    segnum = 1057, seq = 299785, cno=2206916
    creation time = 2011-07-13 17:24:31
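
Ryusuke notes elsewhere in this thread that fsck0.nilfs2 is only a manual rollback tool and does not verify filesystem consistency, which fits it reporting "Clean FS" here while the mount still fails in the btree code. The Python sketch below shows the kind of selection such a rollback makes. LogInfo, checksum_ok and has_super_root are hypothetical stand-ins for what the real tool reads from disk; the point is only that no btree is traversed. The example values come from the 2011-07-04 fsck0.nilfs2 run quoted in this thread, where the log the superblock pointed to (cno=1775795) was lost and an earlier log (cno=1775793) was selected.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LogInfo:
    """Hypothetical summary of one log (partial segment) found on disk."""
    blocknr: int
    segnum: int
    seq: int
    cno: int
    checksum_ok: bool     # stand-in for the summary/payload CRC checks
    has_super_root: bool  # stand-in for "this log closes a checkpoint"


def select_rollback_log(logs: Sequence[LogInfo]) -> Optional[LogInfo]:
    """Pick the newest log that verifies and ends with a super root.

    Only log headers and checksums are consulted; no btree is traversed,
    so a corrupted DAT or ifile btree goes unnoticed at this stage.
    """
    usable = [log for log in logs if log.checksum_ok and log.has_super_root]
    if not usable:
        return None
    # Newer segments have a higher sequence number; within one segment,
    # a later log sits at a higher block number.
    return max(usable, key=lambda log: (log.seq, log.blocknr))


if __name__ == "__main__":
    logs = [
        LogInfo(blocknr=2097655, segnum=1024, seq=2055540, cno=1775793,
                checksum_ok=True, has_super_root=True),
        # The log the superblock pointed to, treated here as lost.
        LogInfo(blocknr=2097786, segnum=1024, seq=2055540, cno=1775795,
                checksum_ok=False, has_super_root=True),
    ]
    chosen = select_rollback_log(logs)
    print(chosen.blocknr, chosen.cno)  # 2097655 1775793
```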

The mount gives the following error under 2.0.23:

kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
kernel: NILFS warning: mounting fs with errors
nilfs_cleanerd[2970]: start
kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
kernel:  [<c04c2fdf>] nilfs_btree_do_lookup+0xcc/0x234
kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
kernel:  [<c04d13aa>] nilfs_palloc_block_get_entry+0x12/0x41
kernel:  [<c04c69db>] nilfs_dat_get_vinfo+0x46/0x1bf
kernel:  [<c04d2da1>] nilfs_ioctl_do_get_vinfo+0x51/0x60
kernel:  [<c04d20e9>] nilfs_ioctl_wrap_copy+0xdd/0x16b
kernel:  [<c04d21c7>] nilfs_ioctl_get_info+0x50/0x7a
kernel:  [<c04d2d50>] nilfs_ioctl_do_get_vinfo+0x0/0x60
kernel:  [<c04d24d6>] nilfs_ioctl+0x238/0x57d
kernel:  [<c04d2d50>] nilfs_ioctl_do_get_vinfo+0x0/0x60
kernel:  [<c045194a>] delayacct_end+0x58/0x7a
kernel:  [<c045cf86>] get_page_from_freelist+0x96/0x370
kernel:  [<c045d1af>] get_page_from_freelist+0x2bf/0x370
kernel:  [<c045d2c9>] __alloc_pages+0x69/0x2cf
kernel:  [<c0464e3f>] __handle_mm_fault+0x690/0xaac
kernel:  [<c04d229e>] nilfs_ioctl+0x0/0x57d
kernel:  [<c048620d>] do_ioctl+0x1c/0x5d
kernel:  [<c04867a1>] vfs_ioctl+0x47b/0x4d3
kernel:  [<c0637748>] do_page_fault+0x2a4/0x52d
kernel:  [<c0486841>] sys_ioctl+0x48/0x5f
kernel:  [<c0404f17>] syscall_call+0x7/0xb
kernel:  =======================
kernel: NILFS: btree level mismatch: 84 != 1
nilfs_cleanerd[2970]: shutdown

I can mount the filesystem read-write without cleanerd (-n), but we do need the GC daemon to reclaim space on a writable filesystem. Please let me know if there is an easy fix for this problem; otherwise, I guess a backup/restore would let me continue, correct? Also, let me know if you need any more information. Thanks.

Zahid

-----Original Message-----
From: linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Zahid Chowdhury
Sent: Friday, July 08, 2011 4:53 PM
To: Ryusuke Konishi
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: RE: mount & fsck of nilfs partition fail.

Hello Ryusuke,
  I have done a backup/restore of the whole nilfs partition. I will let the mailing list know if we have any more problems during a long-running test with the new kernel module. Thanks.

Zahid

-----Original Message-----
From: Ryusuke Konishi [mailto:konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org]
Sent: Tuesday, July 05, 2011 7:16 PM
To: Zahid Chowdhury
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: mount & fsck of nilfs partition fail.

Hi Zahid,
On Mon, 4 Jul 2011 17:29:01 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke,
>   On a relatively quiescent system I still encountered a mount failure on a power cycle. The messages in /var/log/messages were:
>
>   kernel: NILFS warning: mounting unchecked fs
>   kernel: NILFS: recovery complete.
>   kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>   kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
>   kernel:  [<c04c2fdf>] nilfs_btree_do_lookup+0xcc/0x234
>   kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
>   kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
>   kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
>   kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
>   kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
>   kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
>   kernel:  [<c0477b75>] alloc_page_buffers+0x74/0xba
>   kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
>   kernel:  [<c04c7125>] nilfs_dat_translate+0x3c/0x137
>   kernel:  [<c04c2032>] nilfs_btnode_submit_block+0x1a3/0x29e
>   kernel:  [<c04c2144>] nilfs_btnode_get+0x17/0x5f
>   kernel:  [<c04c2f0f>] nilfs_btree_get_block+0x12/0x16
>   kernel:  [<c04c2fbc>] nilfs_btree_do_lookup+0xa9/0x234
>   kernel:  [<c04c438c>] nilfs_btree_lookup+0x42/0x7f
>   kernel:  [<c04c2aa2>] nilfs_bmap_lookup_at_level+0x2b/0x81
>   kernel:  [<c04c2b11>] nilfs_bmap_lookup+0x19/0x2d
>   kernel:  [<c04c156a>] nilfs_mdt_submit_block+0x9a/0x131
>   kernel:  [<c04c163d>] nilfs_mdt_read_block+0x3c/0x1b1
>   kernel:  [<c04c193a>] nilfs_mdt_get_block+0x2c/0x277
>   kernel:  [<c04d1316>] nilfs_palloc_get_entry_block+0x45/0x4c
>   kernel:  [<c04d0ff3>] nilfs_ifile_get_inode_block+0x57/0x94
>   kernel:  [<c04bcdee>] nilfs_read_inode+0x6a/0x1a6
>   kernel:  [<c04bf7a0>] nilfs_get_sb+0x40f/0x65e
>   kernel:  [<c045d2c9>] __alloc_pages+0x69/0x2cf
>   kernel:  [<c047c152>] vfs_kern_mount+0x7d/0xf2
>   kernel:  [<c047c1f9>] do_kern_mount+0x25/0x36
>   kernel:  [<c048fbee>] do_mount+0x5fb/0x66b
>   kernel:  [<c04589df>] find_get_page+0x18/0x3f
>   kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
>   kernel:  [<c0464e3f>] __handle_mm_fault+0x690/0xaac
>   kernel:  [<c0484323>] __link_path_walk+0xd29/0xd4b
>   kernel:  [<c045b50a>] filemap_nopage+0x19f/0x349
>   kernel:  [<c06376de>] do_page_fault+0x23a/0x52d
>   kernel:  [<c0637748>] do_page_fault+0x2a4/0x52d
>   kernel:  [<c06374a4>] do_page_fault+0x0/0x52d
>   kernel:  [<c048eb45>] copy_mount_options+0x90/0x109
>   kernel:  [<c048fccb>] sys_mount+0x6d/0xa5
>   kernel:  [<c0404f17>] syscall_call+0x7/0xb
>   kernel:  =======================
>   kernel: NILFS: btree level mismatch: 114 != 1
>   kernel: NILFS error (device sda2): nilfs_ifile_get_inode_block: ifile is broken
>   kernel: Remounting filesystem read-only
>   kernel: NILFS: get root inode failed
>
>
> I ran fsck0.nilfs2:
>   /sbin/fsck0.nilfs2 -v -f /dev/sda2
>   Super-block:
>       revision = 2.0
>       blocksize = 4096
>       write time = 2011-07-02 06:09:20
>       indicated log: blocknr = 2097786
>           segnum = 1024, seq = 2055540, cno=1775795
>
>   Clean FS.
>   The latest log is lost. Trying rollback recovery..
>   .......
>   Selected log: blocknr = 2097655
>       segnum = 1024, seq = 2055540, cno=1775793
>       creation time = 2011-07-02 06:08:13
>   Do you wish to overwrite super block (y/N)? y
>   Recovery will complete on mount.
>
> From then on, the mount has always worked, but I always get the following error in /var/log/messages when mounting:
>
>   kernel: segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>   kernel: NILFS warning: mounting fs with errors
>
> Also:
>   dmesg | grep -i nilfs
>       NILFS nilfs_fill_super: start(silent=0)
>       NILFS(recovery) nilfs_search_super_root: found super root: segnum=251, seq=2062534, pseg_start=514624, pseg_offset=621
>       NILFS warning: mounting fs with errors
>       NILFS nilfs_fill_super: mounted filesystem
>
>   nilfs-tune -l /dev/sda2
>
>      Filesystem state:         invalid or mounted,error
>
> All of the daemons on our system run with no problems on the existing nilfs partition, but the warnings make us wonder. Can we continue using this nilfs2 partition, or might we have issues in the future? Thanks for any help.
>
> Zahid

The current nilfs sets an error flag on the superblocks once it detects
an inconsistency in the filesystem.

The error flag is not cleared even after fsck0.nilfs2 or a mount-time
rollback succeeds.  This is a limitation of the fsck0.nilfs2 program,
so the warning remains regardless of whether the filesystem actually
has a defect (sorry).

If you can back up the filesystem and restore it onto a new nilfs
partition, I would like to ask you to do so.

This is because there was a critical btree bug in nilfs modules older
than version 2.0.22.  It may be the cause of the above error even
though you are now using 2.0.22, since the damage could have been
written before the upgrade.

To determine whether the error came from the older nilfs modules or
whether the 2.0.22 module still has a critical bug, we need a
long-running test with a nilfs partition that has never been mounted
by older modules.

Regards,
Ryusuke Konishi

* Re: mount & fsck of nilfs partition fail.
       [not found]                                                                                         ` <053D39D3D76C474EB2D2A284AA6BA3181B06066B94-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
@ 2011-08-23  3:03                                                                                           ` Ryusuke Konishi
       [not found]                                                                                             ` <20110823.120347.178748126.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 26+ messages in thread
From: Ryusuke Konishi @ 2011-08-23  3:03 UTC
  To: zahid.chowdhury-VJizFkI/10gAspv4Qr0y0gC/G2K4zDHf
  Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Mon, 8 Aug 2011 14:33:55 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke/nilfs kernel developers,
>   Can you please let me know if I should keep the corrupted nilfs
>   filesystem described below for nilfs bug fixes, or if no solution
>   exists for this corruption? If it is the latter, I will
>   back up, repartition the filesystem, and then finally restore the
>   filesystem and continue my work. Thanks.
> 
> Zahid

Sorry for my late reply, and for the repeated trouble you have had.

The error log says the btree of the DAT file is broken.  I reviewed
the btree code, but haven't found any related issues so far.
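
The DAT btree failure shows up in the logs in this thread as "NILFS: btree level mismatch: 84 != 1" (and "114 != 1" in the earlier report). As a toy illustration of what such a check does, the Python sketch below descends an in-memory btree and refuses any child whose recorded level is not exactly one below its parent. It is not the nilfs btree code: BTreeNode, lookup and LevelMismatch are hypothetical, and we assume the first number in the message is the level found in the node and the second is the level the lookup expected. Read that way, values such as 84 or 114 plausibly mean the block that was read was not a btree node at all.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional


class LevelMismatch(Exception):
    """Raised when a child node's recorded level is not parent level - 1."""


@dataclass
class BTreeNode:
    """Hypothetical in-memory btree node.

    Level-1 nodes map keys to data pointers; higher levels map keys to
    the block numbers of child nodes.
    """
    level: int
    keys: List[int]  # sorted ascending
    ptrs: List[int]


def lookup(blocks: Dict[int, BTreeNode], root: BTreeNode, key: int) -> Optional[int]:
    """Descend from root to a level-1 node, checking each child's level."""
    node = root
    while True:
        # Index of the last key <= the search key, or -1 if there is none.
        idx = bisect.bisect_right(node.keys, key) - 1
        if idx < 0:
            return None
        if node.level == 1:
            return node.ptrs[idx] if node.keys[idx] == key else None
        expected = node.level - 1
        child = blocks[node.ptrs[idx]]
        # A garbage or misdirected block rarely carries a sane level value,
        # so this check catches it before its contents are trusted.
        if child.level != expected:
            raise LevelMismatch(f"btree level mismatch: {child.level} != {expected}")
        node = child


if __name__ == "__main__":
    root = BTreeNode(level=2, keys=[0], ptrs=[7])
    leaf = BTreeNode(level=1, keys=[10, 20], ptrs=[100, 200])
    print(lookup({7: leaf}, root, 20))  # 200
    try:
        # Block 7 now holds something that is not a level-1 node.
        lookup({7: BTreeNode(level=84, keys=[], ptrs=[])}, root, 20)
    except LevelMismatch as err:
        print(err)  # btree level mismatch: 84 != 1
```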

It's hard to track down the root cause in this kind of situation,
so please reformat the filesystem and continue your work.

By the way, what kind of device are you using?  Is it a flash device
or a pendrive?  Please tell me the product name if you don't mind.


Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* RE: mount & fsck of nilfs partition fail.
       [not found]                                                                                             ` <20110823.120347.178748126.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2011-08-23 14:15                                                                                               ` Zahid Chowdhury
  0 siblings, 0 replies; 26+ messages in thread
From: Zahid Chowdhury @ 2011-08-23 14:15 UTC
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

Hello,
  We are using a Kingspec 16 GB SSD with a SATA interface, from the industrial standard catalog. Internally it is MLC. If you
want more details, it is this part, except ours is 16 GB:
  http://www.kingspec.com/solid-state-disk-products/ssd-25industsata-imlcj.htm

I still have the failed nilfs partition and can dual-mount it to poke around, if you
want any further information. I have since moved my development to
another drive and have specifically kept the failed drive for debugging
by the nilfs community. I can hold on to it a while longer, until the reaper of drives wants it back. Thanks for any help.

Zahid

-----Original Message-----
From: linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-nilfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ryusuke Konishi
Sent: Monday, August 22, 2011 8:04 PM
To: Zahid Chowdhury
Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: mount & fsck of nilfs partition fail.

On Mon, 8 Aug 2011 14:33:55 -0700, Zahid Chowdhury wrote:
> Hello Ryusuke/nilfs kernel developers,
>   Can you please let me know if I should keep the corrupted nilfs
>   filesystem described below for nilfs bug fixes, or if no solution
>   exists for this corruption? If it is the latter, I will
>   back up, repartition the filesystem, and then finally restore the
>   filesystem and continue my work. Thanks.
> 
> Zahid

Sorry for my late reply, and for the repeated trouble you have had.

The error log says the btree of the DAT file is broken.  I reviewed
the btree code, but haven't found any related issues so far.

It's hard to track down the root cause in this kind of situation,
so please reformat the filesystem and continue your work.

By the way, what kind of device are you using?  Is it a flash device
or a pendrive?  Please tell me the product name if you don't mind.


Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

end of thread, other threads:[~2011-08-23 14:15 UTC | newest]

Thread overview: 26+ messages
2011-06-13  7:13 mount & fsck of nilfs partition fail Zahid Chowdhury
     [not found] ` <053D39D3D76C474EB2D2A284AA6BA3181B05E4E02D-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
2011-06-13 12:33   ` Ryusuke Konishi
     [not found]     ` <20110613.213316.221578492.ryusuke-sG5X7nlA6pw@public.gmane.org>
2011-06-13 21:12       ` Zahid Chowdhury
     [not found]         ` <053D39D3D76C474EB2D2A284AA6BA3181B05E4E167-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
2011-06-13 22:21           ` dexen deVries
     [not found]             ` <201106140021.52229.dexen.devries-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2011-06-13 22:28               ` mount & fsck of nilfs partition fail. [correction] dexen deVries
2011-06-13 23:28               ` mount & fsck of nilfs partition fail Zahid Chowdhury
     [not found]                 ` <053D39D3D76C474EB2D2A284AA6BA3181B05E4E1CE-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
2011-06-13 23:51                   ` Ryusuke Konishi
     [not found]                     ` <20110614.085157.212693296.ryusuke-sG5X7nlA6pw@public.gmane.org>
2011-06-14 18:04                       ` Zahid Chowdhury
     [not found]                         ` <053D39D3D76C474EB2D2A284AA6BA3181B05E4E394-ZjuI7xOJlFPnaE3xbIMyWkCiaQ3SRT3KFkJ40O1dFu8@public.gmane.org>
2011-06-15  1:42                           ` Ryusuke Konishi
     [not found]                             ` <20110615.104251.29260790.ryusuke-sG5X7nlA6pw@public.gmane.org>
2011-06-15 10:58                               ` Ryusuke Konishi
     [not found]                                 ` <20110615.195858.252298449.ryusuke-sG5X7nlA6pw@public.gmane.org>
2011-06-15 18:32                                   ` Ryusuke Konishi
2011-06-15 18:38                                       ` Zahid Chowdhury
2011-06-17 18:29                                           ` Ryusuke Konishi
2011-06-17 21:55                                               ` Zahid Chowdhury
2011-06-18  4:53                                                   ` Ryusuke Konishi
2011-06-20 18:27                                                       ` Zahid Chowdhury
2011-06-23 11:25                                                           ` Ryusuke Konishi
2011-06-23 18:21                                                               ` Zahid Chowdhury
2011-06-24 16:26                                                                   ` Ryusuke Konishi
2011-07-05  0:29                                                                       ` Zahid Chowdhury
2011-07-06  2:16                                                                           ` Ryusuke Konishi
2011-07-08 23:52                                                                               ` Zahid Chowdhury
2011-07-14  0:54                                                                                   ` Zahid Chowdhury
2011-08-08 21:33                                                                                       ` Zahid Chowdhury
2011-08-23  3:03                                                                                           ` Ryusuke Konishi
2011-08-23 14:15                                                                                               ` Zahid Chowdhury