NILFS: bad btree node

linux-nilfs.vger.kernel.org archive mirror

* NILFS: bad btree node
@ 2012-05-25 14:30 Kenneth Langga
From: Kenneth Langga @ 2012-05-25 14:30 UTC
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

I got this error from kernel.log:

NILFS: bad btree node (blocknr=111560943): level = 242, flags = 0x3f,
nchildren = 23369
NILFS error (device sdc2): nilfs_bmap_lookup_contig: broken bmap
(inode number=19696)

What is the correct course of action for this type of error? And what
would have caused this?
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: NILFS: bad btree node
@ 2012-05-25 18:06 Reinoud Zandijk
From: Reinoud Zandijk @ 2012-05-25 18:06 UTC
  To: Kenneth Langga; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Kenneth.

On Fri, May 25, 2012 at 10:30:40PM +0800, Kenneth Langga wrote:
> NILFS: bad btree node (blocknr=111560943): level = 242, flags = 0x3f,
> nchildren = 23369
> NILFS error (device sdc2): nilfs_bmap_lookup_contig: broken bmap
> (inode number=19696)
> 
> What is the correct course of action for this type of error? And what
> would have caused this?

What struck me is the very high level and the absurd number of
children. That can't be good. AFAIR NILFS only has, say, up to 3 (or 4?)
levels in its B-tree. It *could* be failing in rebalancing, or, more likely,
pointing to garbage?

Cheers,
Reinoud
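For context on Reinoud's reading of the numbers: the message appears to come from a sanity check the kernel applies to every B-tree node it reads from disk (nilfs_btree_node_broken() in fs/nilfs2/btree.c, as far as I can tell). The Python sketch below re-implements that check outside the kernel. The header layout and the constants are assumptions recalled from the nilfs2 sources, not verified against the 2012 kernels in this thread, and the helper names are hypothetical. Under those assumptions the reported node fails all three tests (level, root flag, child count), which fits the "pointing to garbage" theory better than a merely deep tree. (The assumed level limit of 14 is higher than the 3 or 4 Reinoud recalls; either way, 242 is far out of range.)

```python
import struct
from typing import List, Tuple

# Assumed constants, as recalled from the nilfs2 kernel sources; verify them
# against your own kernel tree before relying on this check.
NILFS_BTREE_NODE_ROOT = 0x01     # flag bit only the in-inode root may carry
NILFS_BTREE_LEVEL_NODE_MIN = 1   # level 0 denotes data, never an on-disk node
NILFS_BTREE_LEVEL_MAX = 14       # node levels must stay strictly below this

# Assumed on-disk header: bn_flags (u8), bn_level (u8),
# bn_nchildren (little-endian u16), bn_pad (u32).
HEADER = struct.Struct("<BBHI")


def nchildren_max(node_size: int) -> int:
    """Children that fit in one node: drop the 8-byte header and an 8-byte
    pad, then each child takes an 8-byte key plus an 8-byte pointer."""
    return (node_size - HEADER.size - 8) // 16


def parse_header(block: bytes) -> Tuple[int, int, int]:
    """Decode (flags, level, nchildren) from the first 8 bytes of a node."""
    flags, level, nchildren, _pad = HEADER.unpack_from(block)
    return flags, level, nchildren


def node_problems(flags: int, level: int, nchildren: int,
                  node_size: int = 4096) -> List[str]:
    """List every reason the header would be rejected (empty means sane)."""
    problems = []
    if level < NILFS_BTREE_LEVEL_NODE_MIN:
        problems.append(f"level {level} < {NILFS_BTREE_LEVEL_NODE_MIN}")
    if level >= NILFS_BTREE_LEVEL_MAX:
        problems.append(f"level {level} >= {NILFS_BTREE_LEVEL_MAX}")
    if flags & NILFS_BTREE_NODE_ROOT:
        problems.append(f"flags 0x{flags:x} has the root bit set")
    limit = nchildren_max(node_size)
    if nchildren > limit:
        problems.append(f"nchildren {nchildren} > {limit}")
    return problems


# Values from the kernel message above (blocknr=111560943).
for reason in node_problems(flags=0x3F, level=242, nchildren=23369):
    print(reason)
```

With 4 KiB nodes the assumed child limit works out to 255, so 23369 is roughly ninety times too many.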

* Re: NILFS: bad btree node
@ 2012-05-25 18:15 Kenneth Langga
From: Kenneth Langga @ 2012-05-25 18:15 UTC
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

It's a 3TB hard disk. Could that be the reason?

Right now, it's mounted read-only. Is it safe to make it read/write
again? And can I run nilfs-clean on it and maybe the error would be
gone?

* Re: NILFS: bad btree node
@ 2012-05-26 14:49 Christian Smith
From: Christian Smith @ 2012-05-26 14:49 UTC
  To: Kenneth Langga; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sat, May 26, 2012 at 02:15:41AM +0800, Kenneth Langga wrote:
> It's a 3TB hard disk. Could that be the reason?
> 
> Right now, it's mounted read-only. Is it safe to make it read/write
> again? And can I run nilfs-clean on it and maybe the error would be
> gone?
> 

You should be able to remount read/write, as you'll still have your 
old snapshots or checkpoints to mount from instead if it all goes
wrong.
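Concretely, "mount from an old checkpoint" means: pick a checkpoint with lscp, protect it as a snapshot with chcp so nilfs_cleanerd cannot reclaim it, and mount it read-only next to the live filesystem with the nilfs2 cp= mount option. A dry-run sketch of that sequence, assuming nilfs-utils is installed; the device, mount points and checkpoint number are hypothetical. chcp comes after the remount since, as far as I know, it modifies the checkpoint file and so needs a writable mount.

```python
import subprocess
from typing import List


def fallback_plan(device: str, mountpoint: str, snapdir: str,
                  cno: int) -> List[List[str]]:
    """Commands to keep a known-good checkpoint reachable while the
    damaged filesystem is put back into read/write service."""
    return [
        ["lscp", device],                         # inspect checkpoints, pick cno
        ["mount", "-o", "remount,rw", mountpoint],
        ["chcp", "ss", device, str(cno)],         # make cno a snapshot so the
                                                  # cleaner cannot reclaim it
        ["mount", "-t", "nilfs2", "-r", "-o", f"cp={cno}", device, snapdir],
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)   # stop at the first failure


# Hypothetical device, mount points and checkpoint number.
run(fallback_plan("/dev/sdc2", "/mnt/data", "/mnt/snap", 12345))
```

Keeping dry_run on by default means the plan can be reviewed before anything touches the disk.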

In my experience, though, once the cleaner is unable to
clean segments due to logical errors, it's game over and a
backup/mkfs/restore is needed.  But then, I mostly run NILFS on
small slow SDD, so that's no great hardship. We desperately need
a fsck to handle scenarios like that.

I can't see the size of the disk being a problem. All the data
pointers are 64-bit, so should comfortably handle 3TB.
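Christian's point can be checked with simple arithmetic. Assuming 4 KiB blocks (the block size nilfs-tune reports for another volume later in this archive), a 3 TB disk needs well under a billion block numbers, which would fit even in 32 bits:

```python
BLOCK_SIZE = 4096           # assumed bytes per block
disk_bytes = 3 * 10**12     # a "3TB" disk, in the vendor's decimal bytes

blocks = disk_bytes // BLOCK_SIZE

# A 32-bit block number would already be enough for this disk...
assert blocks < 2**32
# ...and a 64-bit one leaves billions of times more headroom.
print(f"{blocks:,} blocks; 32-bit limit is {2**32 * BLOCK_SIZE // 2**40} TiB; "
      f"64-bit headroom is {2**64 // blocks:,}x")
```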

In short, try read-write, but be prepared to reformat.

Also, if you're using a 3TB disk for NILFS to store media
files, I'd perhaps suggest against it, if only because
backup/restore on that much data will take an age. I
currently stick to NILFS for my root filesystems, leaving
big and/or personal data on more stable, less cutting-edge
filesystems.

Christian

* Re: NILFS: bad btree node
@ 2012-05-26 16:43 Kenneth Langga
From: Kenneth Langga @ 2012-05-26 16:43 UTC
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

I see, I'll follow what you said. Thanks. Btw, what could be the
source of the error so that I may avoid it in the future? And would
deleting the offending file (if the error is tied to one) also remove
the error?
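Before deleting anything, the inode number in the kernel message (19696 here) has to be mapped back to a path. A minimal Python sketch, roughly equivalent to `find /mnt/data -xdev -inum 19696` with a hypothetical mount point, assuming the inode number NILFS prints is the same st_ino that userspace sees. Walking a damaged tree may itself hit errors, so unreadable entries are skipped rather than raised.

```python
import os
from typing import Iterator


def find_by_inode(root: str, ino: int) -> Iterator[str]:
    """Yield every path under `root` whose inode number is `ino`, without
    crossing into other mounted filesystems."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):  # walk errors are ignored
        same_fs_dirs = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                     # unreadable entry: skip it
            if st.st_dev != root_dev:
                continue                     # a mount point: neither report nor enter it
            if st.st_ino == ino:
                yield path                   # hard links yield several paths
            if name in dirnames:
                same_fs_dirs.append(name)
        dirnames[:] = same_fs_dirs           # prune before os.walk descends
```

Usage would be `print(list(find_by_inode("/mnt/data", 19696)))`.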

* NILFS: bad btree node
@ 2012-12-20  2:46 张 磊
From: 张 磊 @ 2012-12-20  2:46 UTC
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hello.
	My NILFS filesystem suddenly became read-only. I saw these logs in /var/log/messages:

Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
……………………………………………………

	How can I fix this? There is 6 TiB of data on my disk, and I don't want to format it.
	I found that a lot of people have encountered the same problem. Is this a bug in NILFS? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.

Elmer Zhang

* Re: NILFS: bad btree node
@ 2012-12-20  6:08 Vyacheslav Dubeyko
From: Vyacheslav Dubeyko @ 2012-12-20  6:08 UTC
  To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
> Hello.
> 	My NILFS filesystem suddenly became read-only. I saw these logs in /var/log/messages:
> 
> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 19 11:20:05 localhost kernel:
> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 19 11:20:05 localhost kernel:
> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 19 11:20:05 localhost kernel:
> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 19 11:20:05 localhost kernel:
> ……………………………………………………
> 
> 	How can I fix this? There is 6 TiB of data on my disk, and I don't want to format it.
> 	I found that a lot of people have encountered the same problem. Is this a bug in NILFS? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
> 

Yes, this issue has been reported before. As I understand it, you can
simply remount your filesystem in read-write mode and continue using
your NILFS2 filesystem.

If you run into any trouble with remounting, please report it.

With the best regards,
Vyacheslav Dubeyko.


* Re: NILFS: bad btree node
@ 2012-12-20  9:08 张 磊
From: 张 磊 @ 2012-12-20  9:08 UTC
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.

Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 20 16:03:55 localhost kernel:
Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 20 16:03:55 localhost kernel:
Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown

I remounted the filesystem again and tried to delete the bad files, but the deletion failed.

Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
Dec 20 16:12:08 localhost kernel:
Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
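In kernel messages like the nilfs_truncate_bmap warning above, err= carries a negated errno value, so err=-5 is EIO (an I/O error): the truncation the delete needed apparently failed on the same broken bmap reported a few lines earlier. A quick way to decode such values, assuming a Linux host (errno numbers are platform-specific):

```python
import errno
import os

err = -5  # from "nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)"

# Kernel functions report failures as negated errno values.
code = -err
print(errno.errorcode[code], "-", os.strerror(code))
```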

I tried a third remount, but it failed. The server went down and was restarted.

Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
Dec 20 16:12:42 localhost kernel:

I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?

* Re: NILFS: bad btree node
@ 2012-12-20  9:38 Vyacheslav Dubeyko
From: Vyacheslav Dubeyko @ 2012-12-20  9:38 UTC
  To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
> Hi,
> 
> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
> 
> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 20 16:03:55 localhost kernel:
> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 20 16:03:55 localhost kernel:
> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
> 
> I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
> 
> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
> Dec 20 16:12:08 localhost kernel:
> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
> 
> I tried a third remount, but it failed. The server went down and was restarted.
> 
> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
> Dec 20 16:12:42 localhost kernel:
> 

Yes, that is bad. In earlier reports, a remount was enough to get past
this trouble.

So, is the NILFS2 volume now left mounted read-only?

Could you share more details about your environment? I need them to
understand the situation and to try to reproduce it. I need to know:
1. Linux kernel version.
2. nilfs-utils version.
3. "mount" output.
4. "df -h" output.
5. "lscp" output.
6. "lssu" output.
7. "nilfs-tune -l" output (superblock content)
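The items in this list can be gathered in one go. A minimal sketch, assuming nilfs-utils is installed and /dev/sdb2 is the affected device (a placeholder to substitute); tools missing from PATH are reported rather than raised. The nilfs-utils version appears on the first line of `nilfs-tune -l` output, so item 2 is not queried separately.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# Hypothetical device path: substitute the affected NILFS2 partition.
DEVICE = "/dev/sdb2"

# One entry per numbered item; item 2 is covered by nilfs-tune's header line.
COMMANDS: Dict[str, List[str]] = {
    "1. kernel": ["uname", "-r"],
    "3. mount": ["mount"],
    "4. df -h": ["df", "-h"],
    "5. lscp": ["lscp", DEVICE],
    "6. lssu": ["lssu", DEVICE],
    "7. nilfs-tune -l": ["nilfs-tune", "-l", DEVICE],
}


def collect(commands: Optional[Dict[str, List[str]]] = None) -> Dict[str, str]:
    """Run each command, capturing stdout and stderr; note missing tools."""
    if commands is None:
        commands = COMMANDS
    report = {}
    for label, argv in commands.items():
        if shutil.which(argv[0]) is None:
            report[label] = f"<{argv[0]} not found in PATH>"
            continue
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
        report[label] = proc.stdout + proc.stderr
    return report


if __name__ == "__main__":
    for label, output in collect().items():
        print(f"== {label} ==\n{output}")
```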

> I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
> 

The latest version of nilfs-utils is 2.1.4. fsck.nilfs2 is currently at
an early stage of development, and "v4" is the version of the fsck.nilfs2
patchset, not of nilfs-utils. You can try fsck.nilfs2 after applying this
patchset to the nilfs-utils 2.1.4 source code. But fsck.nilfs2 can only
check superblocks and segment summary headers and cannot fully recover a
volume, so I think it will be of no use to you.

With the best regards,
Vyacheslav Dubeyko.

* Re: NILFS: bad btree node
@ 2012-12-20 10:16 张 磊
From: 张 磊 @ 2012-12-20 10:16 UTC
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64 
2. nilfs-utils version: nilfs-utils-2.1.4
3. "mount" output:
/dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909) 

4. "df -h" output:
Filesystem  Size  Used Avail Use% Mounted on
/dev/sdb2   9.6T  5.9T  3.2T  66% /data0

5. "lscp" output:
                 CNO        DATE     TIME  MODE  FLG     NBLKINC       ICNT
                   2  2012-12-03 14:03:01   ss    -           14          3
              580481  2012-12-20 16:11:25   cp    -          293     697667
              580482  2012-12-20 16:11:25   cp    -          130     697666
              580483  2012-12-20 16:11:25   cp    -          225     697664
              580484  2012-12-20 16:11:25   cp    -          143     697663
              580485  2012-12-20 16:11:26   cp    -          311     697659
              580486  2012-12-20 16:11:27   cp    -          328     697657
              580487  2012-12-20 16:11:27   cp    -          263     697655
              580488  2012-12-20 16:11:27   cp    -          118     697653
              580489  2012-12-20 16:11:28   cp    -          230     697651
              580490  2012-12-20 16:11:28   cp    -          272     697649
              580491  2012-12-20 16:11:28   cp    -          148     697648
              580492  2012-12-20 16:11:29   cp    -          139     697647
              580493  2012-12-20 16:11:29   cp    -          273     697645
              580494  2012-12-20 16:11:29   cp    -          147     697644
              580495  2012-12-20 16:11:30   cp    -          271     697641
              580496  2012-12-20 16:11:31   cp    -          526     697636
              580497  2012-12-20 16:11:34   cp    -         1684     697625
              580498  2012-12-20 16:11:37   cp    -          983     697609
              580499  2012-12-20 16:11:38   cp    -          421     697605
              580500  2012-12-20 16:11:40   cp    -         1019     697594
              580501  2012-12-20 16:11:40   cp    -          143     697593
              580502  2012-12-20 16:11:41   cp    -         1536     697592
              580503  2012-12-20 16:11:41   cp    -          373     697590
              580504  2012-12-20 16:11:42   cp    -          312     697587
              580505  2012-12-20 16:11:42   cp    -          102     697586
              580506  2012-12-20 16:11:43   cp    -          274     697584
              580507  2012-12-20 16:11:43   cp    -          270     697582
              580508  2012-12-20 16:11:43   cp    -          118     697581
              580509  2012-12-20 16:11:43   cp    -          133     697580
              580510  2012-12-20 16:11:44   cp    -          321     697578
              580511  2012-12-20 16:11:44   cp    -          245     697576
              580512  2012-12-20 16:11:45   cp    -          394     697573
              580513  2012-12-20 16:11:45   cp    -          121     697572
              580514  2012-12-20 16:11:45   cp    -          245     697569
              580515  2012-12-20 16:11:52   cp    -         2705     697543
              580516  2012-12-20 16:11:55   cp    -         2590     697504
              580517  2012-12-20 16:11:59   cp    -         2418     697453
              580518  2012-12-20 16:12:00   cp    -          866     697436
              580519  2012-12-20 16:12:01   cp    -          864     697420
              580520  2012-12-20 16:12:05   cp    -         1765     697357
              580521  2012-12-20 16:12:05   cp    -          120     697356
              580522  2012-12-20 16:12:06   cp    -          820     697332
              580523  2012-12-20 16:12:09   cp    -         1642     697174
              580524  2012-12-20 16:12:09   cp    -           89     697173
              580525  2012-12-20 16:12:10   cp    -           56     697173
              580526  2012-12-20 16:12:42   cp    -          763     697173
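One thing this table shows: the only snapshot (MODE "ss") is checkpoint 2, taken on 2012-12-03, a few minutes after the filesystem was created, and it holds just 3 inodes (ICNT), so it is not a useful fallback. The recent entries are plain checkpoints ("cp"), which nilfs_cleanerd may eventually reclaim. A small sketch for pulling snapshots out of lscp output, assuming the seven whitespace-separated columns shown above; the sample rows are copied from this table.

```python
from typing import List, NamedTuple


class Checkpoint(NamedTuple):
    cno: int
    date: str
    time: str
    mode: str       # "cp" = plain checkpoint, "ss" = snapshot
    flags: str
    nblkinc: int
    icnt: int


def parse_lscp(text: str) -> List[Checkpoint]:
    """Parse lscp's tabular output, skipping the header and blank lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        cno, date, time, mode, flags, nblkinc, icnt = fields
        rows.append(Checkpoint(int(cno), date, time, mode, flags,
                               int(nblkinc), int(icnt)))
    return rows


SAMPLE = """\
                 CNO        DATE     TIME  MODE  FLG     NBLKINC       ICNT
                   2  2012-12-03 14:03:01   ss    -           14          3
              580526  2012-12-20 16:12:42   cp    -          763     697173
"""

snapshots = [cp for cp in parse_lscp(SAMPLE) if cp.mode == "ss"]
print(snapshots)
```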

6. "lssu" output:
	it's too large, please download it: http://d.pr/f/vnoR

7. "nilfs-tune -l" output (superblock content):

nilfs-tune 2.1.4
Filesystem volume name:	  (none)
Filesystem UUID:	  dcfb7152-a342-48d0-a712-212a3062395e
Filesystem magic number:  0x3434
Filesystem revision #:	  2.0
Filesystem features:      (none)
Filesystem state:	  invalid or mounted,error
Filesystem OS type:	  Linux
Block size:		  4096
Filesystem created:	  Mon Dec  3 13:56:51 2012
Last mount time:	  Thu Dec 20 17:44:03 2012
Last write time:	  Thu Dec 20 17:44:03 2012
Mount count:		  13
Maximum mount count:	  50
Reserve blocks uid:	  0 (user root)
Reserve blocks gid:	  0 (group root)
First inode:		  11
Inode size:		  128
DAT entry size:		  32
Checkpoint size:	  192
Segment usage size:	  16
Number of segments:	  1246464
Device size:		  10456104173568
First data block:	  1
# of blocks per segment:  2048
Reserved segments %:	  5
Last checkpoint #:	  580526
Last block address:	  1040286376
Last sequence #:	  1753809
Free blocks count:	  973875200
Commit interval:	  60
# of blks to create seg:  0
CRC seed:		  0x3adfb6c3
CRC check sum:		  0x8468fbbf
CRC check data size:	  0x00000118
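As a cross-check, these superblock figures reproduce the df -h line above. A worked sketch, assuming that df counts whole segments of 2048 blocks (ignoring the single block before the first segment), that 5% of segments are reserved and excluded from Avail, and that GNU df rounds sizes and Use% up. Under those assumptions the space accounting looks consistent, and the device size leaves a 6 MiB tail too small to form a whole segment.

```python
import math

TIB = 2**40

# Figures copied from the nilfs-tune -l output above.
block_size = 4096
nsegments = 1246464
blocks_per_segment = 2048
free_blocks = 973875200
device_size = 10456104173568
reserved_pct = 5

# Assumption: df counts whole segments only and excludes reserved
# segments (5% of all segments) from the "Avail" column.
total_blocks = nsegments * blocks_per_segment
reserved_blocks = nsegments * reserved_pct // 100 * blocks_per_segment
used_blocks = total_blocks - free_blocks
avail_blocks = free_blocks - reserved_blocks


def df_h(nblocks: int) -> float:
    """Size in TiB, rounded up to one decimal place like GNU `df -h`."""
    return math.ceil(nblocks * block_size / TIB * 10) / 10


use_pct = math.ceil(used_blocks * 100 / (used_blocks + avail_blocks))

print("Size", df_h(total_blocks), "Used", df_h(used_blocks),
      "Avail", df_h(avail_blocks), f"Use% {use_pct}")
tail_mib = (device_size - total_blocks * block_size) // 2**20
print("device tail outside segments:", tail_mib, "MiB")
```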


I found this in /var/log/messages; perhaps it is related to the bad btree node:

Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
Dec 18 15:55:02 localhost kernel: Call Trace:
Dec 18 15:55:02 localhost kernel: <IRQ>  [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
Dec 18 15:55:02 localhost kernel: <EOI>  [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
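The first line of this trace can be decoded mechanically. "order:1" is a request for 2^1 contiguous pages, which is 8 KiB with 4 KiB pages. On a 2.6.32 kernel, mode 0x20 should be __GFP_HIGH alone, and that is what GFP_ATOMIC expands to there.

As we read the trace, the failing allocation happened in network-receive softirq context, between <IRQ> and <EOI> (bnx2 polling down into sk_clone). The nilfs2 frames after <EOI> are just the rsync write that the interrupt happened to land on.

Below is a hedged Python sketch of the decoding. The flag table is an assumption taken from 2.6.32-era headers and does not apply to newer kernels.

```python
import re

# Low GFP bits as we read them in 2.6.32-era include/linux/gfp.h (assumption:
# later kernels renumbered these flags, so do not reuse this table there).
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",   # set if the caller is allowed to sleep
    0x20: "__GFP_HIGH",   # may use emergency reserves; GFP_ATOMIC on 2.6.32
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}
PAGE_SIZE = 4096  # x86_64 base page size

FAILURE_RE = re.compile(r"page allocation failure\. order:(\d+), mode:(0x[0-9a-fA-F]+)")


def parse_failure(line):
    """Return (order, mode) from a 'page allocation failure' line, or None."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2), 16)


def decode(order, mode):
    """Translate order/mode into the request size and the flag names."""
    known = 0
    flags = []
    for bit, name in sorted(GFP_BITS.items()):
        known |= bit
        if mode & bit:
            flags.append(name)
    return {
        "bytes": PAGE_SIZE << order,      # 2**order contiguous pages
        "flags": flags,
        "may_sleep": bool(mode & 0x10),  # no __GFP_WAIT means an atomic request
        "unknown_bits": mode & ~known,   # bits outside the table above
    }


if __name__ == "__main__":
    line = "rsync: page allocation failure. order:1, mode:0x20"
    print(decode(*parse_failure(line)))
```

For the line above, it reports an 8192-byte request with only __GFP_HIGH set and no permission to sleep. Failures like this are commonly attributed to memory fragmentation under network load. This trace alone does not show a link to the btree corruption.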



On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:

> On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
>> Hi,
>> 
>> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>> 
>> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
>> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>> Dec 20 16:03:55 localhost kernel:
>> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>> Dec 20 16:03:55 localhost kernel:
>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>> 
>> I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
>> 
>> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
>> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
>> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
>> Dec 20 16:12:08 localhost kernel:
>> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
>> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>> 
>> I tried a third remount, but it failed. The server went down and was restarted.
>> 
>> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
>> Dec 20 16:12:42 localhost kernel:
>> 
> 
> Yes, it is bad. Remounting solved the trouble earlier.
> 
> As a result, do you have NILFS2 volume mounted as read-only?
> 
> Could you share more details about your environment? They are needed
> to understand the situation and try to reproduce it. I need to know:
> 1. Linux kernel version.
> 2. nilfs-utils version.
> 3. "mount" output.
> 4. "df -h" output.
> 5. "lscp" output.
> 6. "lssu" output.
> 7. "nilfs-tune -l" output (superblock content)
> 
>> I found that fsck.nilfs2 was added to nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>> 
> 
> The latest version of nilfs-utils is 2.1.4. Currently, fsck.nilfs2 is at
> an early stage of development. "v4" is the version of the fsck.nilfs2
> patchset, not of nilfs-utils. You can try fsck.nilfs2 after applying this
> patchset to the nilfs-utils 2.1.4 source code. But fsck.nilfs2 can check
> only superblocks and segment summary headers and cannot fully recover a
> volume. So I think it will be useless for you.
> 
> With the best regards,
> Vyacheslav Dubeyko.
> 
>> On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>> 
>>> Hi,
>>> 
>>> On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
>>>> Hello.
>>>> 	My NILFS volume suddenly became read-only. I saw these logs in /var/log/messages:
>>>> 
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> ……………………………………………………
>>>> 
>>>> 	How can I fix this? There are 6 TiB of data on my disk, and I don't want to format it.
>>>> 	I found that a lot of people have encountered the same problem. Is this a bug in NILFS? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>>>> 
>>> 
>>> Yes, this issue was reported earlier. As I understand it, you can simply
>>> remount your filesystem in read-write mode and continue using your
>>> NILFS2 filesystem.
>>> 
>>> If you encounter any trouble with remounting, please report
>>> it.
>>> 
>>> With the best regards,
>>> Vyacheslav Dubeyko.
>>> 
>>> 
>>>> Elmer Zhang
>>> 
>>> 
>> 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread
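Two "bad btree node" reports appear in this thread. The one at the start has level = 242, flags = 0x3f, nchildren = 23369; the one here has level = 0, flags = 0x2, nchildren = 25088. Both can be compared against the sanity checks the kernel applies when it reads a btree node block.

The sketch below mirrors those checks in Python. The constants are our reading of fs/nilfs2/btree.h in mainline kernels:

- minimum node level 1 and maximum level 14
- root flag 0x01
- an 8-byte node header plus 8 bytes of padding
- 16 bytes per child

These values are assumptions, so check them against the kernel or kmod-nilfs2 source actually in use.

```python
# Constants as we read them in mainline fs/nilfs2/btree.h (assumption: verify
# against the kernel or kmod-nilfs2 source you are actually running).
LEVEL_NODE_MIN = 1     # NILFS_BTREE_LEVEL_NODE_MIN; level 0 is the data level
LEVEL_MAX = 14         # NILFS_BTREE_LEVEL_MAX
NODE_ROOT = 0x01       # NILFS_BTREE_NODE_ROOT; expected only on the root kept in the inode
NODE_HEADER = 8        # sizeof(struct nilfs_btree_node)
EXTRA_PAD = 8          # NILFS_BTREE_NODE_EXTRA_PAD_SIZE
CHILD_BYTES = 8 + 8    # one 64-bit key plus one 64-bit pointer per child


def nchildren_max(node_size):
    """Largest child count that fits in an on-disk node of node_size bytes."""
    return (node_size - NODE_HEADER - EXTRA_PAD) // CHILD_BYTES


def node_problems(level, flags, nchildren, node_size=4096):
    """Return the checks a node header fails (an empty list means it looks sane)."""
    problems = []
    if level < LEVEL_NODE_MIN:
        problems.append(f"level {level} is below {LEVEL_NODE_MIN}")
    if level >= LEVEL_MAX:
        problems.append(f"level {level} is not below {LEVEL_MAX}")
    if flags & NODE_ROOT:
        problems.append(f"flags 0x{flags:x} include the root flag")
    limit = nchildren_max(node_size)
    if not 0 <= nchildren <= limit:
        problems.append(f"nchildren {nchildren} is outside 0..{limit}")
    return problems


if __name__ == "__main__":
    reports = [(242, 0x3F, 23369), (0, 0x2, 25088)]  # values from the logs in this thread
    for level, flags, nchildren in reports:
        print(level, hex(flags), nchildren, node_problems(level, flags, nchildren))
```

For a 4096-byte block, both reported nodes fail several checks at once. The level is out of range, and nchildren is roughly ninety to a hundred times the 255-child limit. That looks more like a block that is not a btree node at all than a node with a slightly wrong count. This is an interpretation; the logs do not prove it.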

* Re: NILFS: bad btree node
       [not found]             ` <14BA4286-BF21-4BD3-8E41-2F8F9512D801-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2012-12-20 10:41               ` Vyacheslav Dubeyko
  2012-12-20 11:02                 ` 张 磊
  2012-12-22 14:12               ` Seiji Kihara
  2012-12-27 10:43               ` Vyacheslav Dubeyko
  2 siblings, 1 reply; 17+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-20 10:41 UTC (permalink / raw)
  To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2012-12-20 at 18:16 +0800, 张 磊 wrote:

Thank you for info.

[snip]
> 3. "mount" output:
> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909) 
> 

As far as I can see, your NILFS2 volume is mounted read-write. Am I
correct?

[snip]

> 
> I found this in /var/log/messages; perhaps it is related to the bad btree node:
> 
> Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
> Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
> Dec 18 15:55:02 localhost kernel: Call Trace:
> Dec 18 15:55:02 localhost kernel: <IRQ>  [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
> Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
> Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
> Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
> Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
> Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
> Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
> Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
> Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
> Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
> Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
> Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
> Dec 18 15:55:02 localhost kernel: <EOI>  [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
> Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
> Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
> Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
> Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
> Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
> Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
> 

Is this the full backtrace? Or is there any additional info in your syslog?

With the best regards,
Vyacheslav Dubeyko.



^ permalink raw reply	[flat|nested] 17+ messages in thread
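Every report in this thread, from the Dec 19 log through the remount attempts on Dec 20, names the same block (blocknr=710153406) and inode (321775). The error appears first in nilfs_bmap_lookup_contig and later in nilfs_bmap_last_key during the truncate. A small scanner like the hypothetical one below could confirm, from the full syslog, whether any other blocks or inodes are involved.

The regular expressions are written against the message text quoted above. /var/log/messages is the path used on this CentOS system. Both are assumptions for other setups.

```python
import re
from collections import Counter

# Patterns written against the kernel messages quoted in this thread.
BAD_NODE_RE = re.compile(
    r"NILFS: bad btree node \(blocknr=(\d+)\): "
    r"level = (-?\d+), flags = (0x[0-9a-fA-F]+), nchildren = (-?\d+)")
BROKEN_BMAP_RE = re.compile(
    r"NILFS error \(device ([^)]+)\): (\w+): broken bmap \(inode number=(\d+)\)")


def scan(lines):
    """Count distinct bad-node reports and distinct (device, function, inode) errors."""
    nodes = Counter()
    inodes = Counter()
    for line in lines:
        match = BAD_NODE_RE.search(line)
        if match:
            blocknr, level, flags, nchildren = match.groups()
            nodes[(int(blocknr), int(level), int(flags, 16), int(nchildren))] += 1
            continue
        match = BROKEN_BMAP_RE.search(line)
        if match:
            device, function, ino = match.groups()
            inodes[(device, function, int(ino))] += 1
    return nodes, inodes


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        nodes, inodes = scan(log)
    for key, count in nodes.most_common():
        print("bad node", key, "x", count)
    for key, count in inodes.most_common():
        print("broken bmap", key, "x", count)
```

If only inode 321775 shows up, `find /data0 -xdev -inum 321775` should name the affected file.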

* Re: NILFS: bad btree node
  2012-12-20 10:41               ` Vyacheslav Dubeyko
@ 2012-12-20 11:02                 ` 张 磊
       [not found]                   ` <44056E9A-3487-4E8A-A56A-5B9228FC7895-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: 张 磊 @ 2012-12-20 11:02 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Yes, I mounted NILFS2 read-write. The kernel remounted it read-only when the filesystem found the bad btree node.

That's the full backtrace. I will keep on testing and report more information as soon as I find any.

On 2012-12-20, at 18:41, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:

> On Thu, 2012-12-20 at 18:16 +0800, 张 磊 wrote:
> 
> Thank you for info.
> 
> [snip]
>> 3. "mount" output:
>> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909) 
>> 
> 
> As far as I can see, your NILFS2 volume is mounted read-write. Am I
> correct?
> 
> [snip]
> 
>> 
>> I found this in /var/log/messages; perhaps it is related to the bad btree node:
>> 
>> [snip]
>> 
> 
> Is this the full backtrace? Or is there any additional info in your syslog?
> 
> With the best regards,
> Vyacheslav Dubeyko.
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: NILFS: bad btree node
       [not found]             ` <14BA4286-BF21-4BD3-8E41-2F8F9512D801-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2012-12-20 10:41               ` Vyacheslav Dubeyko
@ 2012-12-22 14:12               ` Seiji Kihara
       [not found]                 ` <50D5BFD6.1080502-sG5X7nlA6pw@public.gmane.org>
  2012-12-27 10:43               ` Vyacheslav Dubeyko
  2 siblings, 1 reply; 17+ messages in thread
From: Seiji Kihara @ 2012-12-22 14:12 UTC (permalink / raw)
  To: 张 磊; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hello,

(2012/12/20 19:16), 张 磊 wrote:
> 1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64

If you use the nilfs2 kernel module for RHEL 6 clones,
'rpm -q kmod-nilfs2' will show its version.

http://www.nilfs.org/en/pkg_centos.html
https://github.com/nilfs-dev/nilfs2-kmod-centos6

Regards,

Seiji
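The environment details requested earlier in the thread can be collected in one pass. These are the kernel version, the nilfs-utils version, and the output of mount, df -h, lscp, lssu and nilfs-tune -l, together with the rpm query suggested here.

Below is a minimal Python sketch. It assumes the volume is /dev/sdb2 as in this thread, that the script runs as root on an RPM-based system, and that the nilfs-utils tools are on PATH. The package names are assumptions for other distributions.

```python
import subprocess

DEVICE = "/dev/sdb2"  # assumption: the NILFS2 partition discussed in this thread

# (section title, command) pairs mirroring the information requested above.
COMMANDS = [
    ("Linux kernel version", ["uname", "-r"]),
    ("nilfs2 kernel module package", ["rpm", "-q", "kmod-nilfs2"]),
    ("nilfs-utils package", ["rpm", "-q", "nilfs-utils"]),
    ("mount", ["mount", "-t", "nilfs2"]),
    ("df -h", ["df", "-h"]),
    ("lscp", ["lscp", DEVICE]),
    ("lssu", ["lssu", DEVICE]),
    ("nilfs-tune -l", ["nilfs-tune", "-l", DEVICE]),
]


def collect(commands, timeout=60):
    """Run each command and return (title, output) pairs; failures are recorded, not raised."""
    report = []
    for title, argv in commands:
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            if result.returncode == 0:
                body = result.stdout
            else:
                body = f"exit status {result.returncode}\n{result.stderr}"
        except (OSError, subprocess.TimeoutExpired) as exc:
            # OSError covers a missing binary (FileNotFoundError).
            body = f"could not run: {exc}"
        report.append((title, body.rstrip()))
    return report


if __name__ == "__main__":
    for title, body in collect(COMMANDS):
        print(f"=== {title} ===\n{body}\n")
```

Commands that are missing or fail are recorded rather than aborting the run, so the output can be pasted into a reply as-is. Expect the lssu section to be very long, as it was here.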

> 2. nilfs-utils version: nilfs-utils-2.1.4
> 3. "mount" output:
> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
>
> 4. "df -h" output:
> /dev/sdb2 9.6T 5.9T 3.2T 66% /data0
>
> 5. "lscp" output:
>                   CNO        DATE     TIME  MODE  FLG     NBLKINC       ICNT
>                     2  2012-12-03 14:03:01   ss    -           14          3
>                580481  2012-12-20 16:11:25   cp    -          293     697667
>                580482  2012-12-20 16:11:25   cp    -          130     697666
>                580483  2012-12-20 16:11:25   cp    -          225     697664
>                580484  2012-12-20 16:11:25   cp    -          143     697663
>                580485  2012-12-20 16:11:26   cp    -          311     697659
>                580486  2012-12-20 16:11:27   cp    -          328     697657
>                580487  2012-12-20 16:11:27   cp    -          263     697655
>                580488  2012-12-20 16:11:27   cp    -          118     697653
>                580489  2012-12-20 16:11:28   cp    -          230     697651
>                580490  2012-12-20 16:11:28   cp    -          272     697649
>                580491  2012-12-20 16:11:28   cp    -          148     697648
>                580492  2012-12-20 16:11:29   cp    -          139     697647
>                580493  2012-12-20 16:11:29   cp    -          273     697645
>                580494  2012-12-20 16:11:29   cp    -          147     697644
>                580495  2012-12-20 16:11:30   cp    -          271     697641
>                580496  2012-12-20 16:11:31   cp    -          526     697636
>                580497  2012-12-20 16:11:34   cp    -         1684     697625
>                580498  2012-12-20 16:11:37   cp    -          983     697609
>                580499  2012-12-20 16:11:38   cp    -          421     697605
>                580500  2012-12-20 16:11:40   cp    -         1019     697594
>                580501  2012-12-20 16:11:40   cp    -          143     697593
>                580502  2012-12-20 16:11:41   cp    -         1536     697592
>                580503  2012-12-20 16:11:41   cp    -          373     697590
>                580504  2012-12-20 16:11:42   cp    -          312     697587
>                580505  2012-12-20 16:11:42   cp    -          102     697586
>                580506  2012-12-20 16:11:43   cp    -          274     697584
>                580507  2012-12-20 16:11:43   cp    -          270     697582
>                580508  2012-12-20 16:11:43   cp    -          118     697581
>                580509  2012-12-20 16:11:43   cp    -          133     697580
>                580510  2012-12-20 16:11:44   cp    -          321     697578
>                580511  2012-12-20 16:11:44   cp    -          245     697576
>                580512  2012-12-20 16:11:45   cp    -          394     697573
>                580513  2012-12-20 16:11:45   cp    -          121     697572
>                580514  2012-12-20 16:11:45   cp    -          245     697569
>                580515  2012-12-20 16:11:52   cp    -         2705     697543
>                580516  2012-12-20 16:11:55   cp    -         2590     697504
>                580517  2012-12-20 16:11:59   cp    -         2418     697453
>                580518  2012-12-20 16:12:00   cp    -          866     697436
>                580519  2012-12-20 16:12:01   cp    -          864     697420
>                580520  2012-12-20 16:12:05   cp    -         1765     697357
>                580521  2012-12-20 16:12:05   cp    -          120     697356
>                580522  2012-12-20 16:12:06   cp    -          820     697332
>                580523  2012-12-20 16:12:09   cp    -         1642     697174
>                580524  2012-12-20 16:12:09   cp    -           89     697173
>                580525  2012-12-20 16:12:10   cp    -           56     697173
>                580526  2012-12-20 16:12:42   cp    -          763     697173
>
> 6. "lssu" output:
> 	It's too large; please download it: http://d.pr/f/vnoR
>
> 7. "nilfs-tune -l" output (superblock content):
>
> nilfs-tune 2.1.4
> Filesystem volume name:	  (none)
> Filesystem UUID:	  dcfb7152-a342-48d0-a712-212a3062395e
> Filesystem magic number:  0x3434
> Filesystem revision #:	  2.0
> Filesystem features:      (none)
> Filesystem state:	  invalid or mounted,error
> Filesystem OS type:	  Linux
> Block size:		  4096
> Filesystem created:	  Mon Dec  3 13:56:51 2012
> Last mount time:	  Thu Dec 20 17:44:03 2012
> Last write time:	  Thu Dec 20 17:44:03 2012
> Mount count:		  13
> Maximum mount count:	  50
> Reserve blocks uid:	  0 (user root)
> Reserve blocks gid:	  0 (group root)
> First inode:		  11
> Inode size:		  128
> DAT entry size:		  32
> Checkpoint size:	  192
> Segment usage size:	  16
> Number of segments:	  1246464
> Device size:		  10456104173568
> First data block:	  1
> # of blocks per segment:  2048
> Reserved segments %:	  5
> Last checkpoint #:	  580526
> Last block address:	  1040286376
> Last sequence #:	  1753809
> Free blocks count:	  973875200
> Commit interval:	  60
> # of blks to create seg:  0
> CRC seed:		  0x3adfb6c3
> CRC check sum:		  0x8468fbbf
> CRC check data size:	  0x00000118
>
>
> I found this in /var/log/messages; perhaps it is related to the bad btree node:
>
> Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
> Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
> Dec 18 15:55:02 localhost kernel: Call Trace:
> Dec 18 15:55:02 localhost kernel: <IRQ>  [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
> Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
> Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
> Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
> Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
> Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
> Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
> Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
> Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
> Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
> Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
> Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
> Dec 18 15:55:02 localhost kernel: <EOI>  [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
> Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
> Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
> Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
> Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
> Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
> Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
>
>
>
> On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>
>> On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
>>> Hi,
>>>
>>> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>>>
>>> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
>>> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>> Dec 20 16:03:55 localhost kernel:
>>> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>> Dec 20 16:03:55 localhost kernel:
>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>>>
>>> I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
>>>
>>> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
>>> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
>>> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
>>> Dec 20 16:12:08 localhost kernel:
>>> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
>>> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>>>
>>> I tried a third remount, but it failed. The server went down and was restarted.
>>>
>>> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
>>> Dec 20 16:12:42 localhost kernel:
>>>
>> Yes, it is bad. In earlier reports, remounting solved the problem.
>>
>> So, is your NILFS2 volume now mounted read-only?
>>
>> Could you share more details about your environment? I need them to
>> understand the situation and to try to reproduce it. I need to know:
>> 1. Linux kernel version.
>> 2. nilfs-utils version.
>> 3. "mount" output.
>> 4. "df -h" output.
>> 5. "lscp" output.
>> 6. "lssu" output.
>> 7. "nilfs-tune -l" output (superblock content)
>>
>>> I found that fsck.nilfs2 was added to nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>>>
>> The latest version of nilfs-utils is 2.1.4. fsck.nilfs2 is currently at an
>> early stage of development; "v4" is the version of the fsck.nilfs2
>> patchset, not of nilfs-utils. You can try fsck.nilfs2 after applying
>> this patchset to the nilfs-utils 2.1.4 source code. However, fsck.nilfs2
>> can only check superblocks and segment summary headers and cannot fully
>> recover a filesystem, so I think it will be of no use to you.
>>
>> With the best regards,
>> Vyacheslav Dubeyko.
>>
>>> On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
>>>>> Hello.
>>>>> 	My nilfs suddenly became read-only. I saw these logs in /var/log/messages:
>>>>>
>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>> Dec 19 11:20:05 localhost kernel:
>>>>> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>> Dec 19 11:20:05 localhost kernel:
>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>> Dec 19 11:20:05 localhost kernel:
>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>> Dec 19 11:20:05 localhost kernel:
>>>>> ……………………………………………………
>>>>>
>>>>> 	How can I fix this? There are 6 TiB of data on my disk, and I don't want to reformat it.
>>>>> 	I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>>>>>
>>>> Yes, this issue has been reported before. As I understand it, you can
>>>> simply remount your filesystem in read-write mode and continue using
>>>> your NILFS2 filesystem.
>>>>
>>>> If you encounter any trouble with remounting, please report
>>>> it.
>>>>
>>>> With the best regards,
>>>> Vyacheslav Dubeyko.
>>>>
>>>>
>>>>> Elmer Zhang

-- 
Seiji Kihara


* Re: NILFS: bad btree node
       [not found]                 ` <50D5BFD6.1080502-sG5X7nlA6pw@public.gmane.org>
@ 2012-12-24  3:04                   ` 张 磊
  0 siblings, 0 replies; 17+ messages in thread
From: 张 磊 @ 2012-12-24  3:04 UTC (permalink / raw)
  To: Seiji Kihara; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi, I use kmod-nilfs2-0.4.3-1.el6.x86_64

On 2012-12-22, at 22:12, Seiji Kihara <kihara-sG5X7nlA6pw@public.gmane.org> wrote:

> Hello,
> 
> (2012/12/20 19:16), 张 磊 wrote:
>> 1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64
> 
> If you use the nilfs2 kernel module for RHEL 6 clones,
> 'rpm -q kmod-nilfs2' will show its version.
> 
> http://www.nilfs.org/en/pkg_centos.html
> https://github.com/nilfs-dev/nilfs2-kmod-centos6
> 
> Regards,
> 
> Seiji
> 
> 
> -- 
> Seiji Kihara
> 


* Re: NILFS: bad btree node
       [not found]                   ` <44056E9A-3487-4E8A-A56A-5B9228FC7895-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2012-12-25  6:02                     ` Vyacheslav Dubeyko
  2012-12-25  7:10                       ` Elmer Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-25  6:02 UTC (permalink / raw)
  To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

On Thu, 2012-12-20 at 19:02 +0800, 张 磊 wrote:
> Yes, I mounted NILFS2 read-write. The kernel remounted it read-only when the filesystem found the bad btree node.
> 
> That's the full backtrace. I will keep testing and report more information once I find it.
> 

I am trying to reproduce the issue, but so far without success. I
suspect it may be a synchronization issue between GC and the main
driver logic, but I don't have any evidence of it yet. Possibly I
cannot reproduce some peculiarity of your environment.
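When attempting a reproduction like this, it helps to know which sanity check a logged "bad btree node" line actually failed. The sketch below decodes the fields the kernel prints and applies node-header checks modeled on the mainline nilfs2 B-tree validation as I recall it. The constants (valid node levels 1..13, root flag 0x01, at most 255 children in a 4096-byte node) are assumptions to verify against fs/nilfs2 in the kernel in use; the kmod-nilfs2 0.4.3 backport may differ. `why_broken` and `nchildren_max` are hypothetical helpers, not part of nilfs-utils.

```python
import re

# ASSUMED constants, recalled from mainline fs/nilfs2 (btree.h and the
# on-disk header); verify them against the kernel or kmod actually in use.
BTREE_LEVEL_NODE_MIN = 1   # level 0 denotes data blocks, never a node
BTREE_LEVEL_MAX = 14       # exclusive upper bound, so nodes use levels 1..13
BTREE_NODE_ROOT = 0x01     # root flag; only the in-inode root may carry it
NODE_HEADER_SIZE = 8       # flags(1) + level(1) + nchildren(2) + pad(4)
NODE_EXTRA_PAD = 8         # extra 64-bit pad in on-disk (non-root) nodes
CHILD_ENTRY_SIZE = 16      # one 64-bit key plus one 64-bit pointer

LOG_RE = re.compile(
    r"bad btree node \(blocknr=(\d+)\): level = (\d+), "
    r"flags = (0x[0-9a-fA-F]+), nchildren = (\d+)"
)


def nchildren_max(block_size: int) -> int:
    """Largest child count that fits in one on-disk node of this size."""
    return (block_size - NODE_HEADER_SIZE - NODE_EXTRA_PAD) // CHILD_ENTRY_SIZE


def why_broken(line: str, block_size: int = 4096) -> list[str]:
    """Return the header checks a logged 'bad btree node' fails (hypothetical helper)."""
    m = LOG_RE.search(line)
    if m is None:
        return []
    level = int(m.group(2))
    flags = int(m.group(3), 16)
    nchildren = int(m.group(4))

    reasons = []
    if not BTREE_LEVEL_NODE_MIN <= level < BTREE_LEVEL_MAX:
        reasons.append(
            f"level {level} outside {BTREE_LEVEL_NODE_MIN}..{BTREE_LEVEL_MAX - 1}")
    if flags & BTREE_NODE_ROOT:
        reasons.append("root flag set on an on-disk node")
    limit = nchildren_max(block_size)
    if nchildren > limit:
        reasons.append(f"nchildren {nchildren} > {limit}")
    return reasons


line = ("NILFS: bad btree node (blocknr=710153406): "
        "level = 0, flags = 0x2, nchildren = 25088")
print(why_broken(line))  # ['level 0 outside 1..13', 'nchildren 25088 > 255']
```

Under these assumptions, the entry for inode 321775 fails both the level check (0 is reserved for data blocks) and the child-count check, which is consistent with the block at that address not holding a valid B-tree node at all rather than a slightly damaged one.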

So I think I need a deeper understanding of the workload under which
the issue occurred. As I remember, you mentioned several MySQL
databases and so on. Could you describe in more detail which
applications were running, and how they were working, before the issue occurred?

Thanks,
Vyacheslav Dubeyko.


* Re: NILFS: bad btree node
  2012-12-25  6:02                     ` Vyacheslav Dubeyko
@ 2012-12-25  7:10                       ` Elmer Zhang
  0 siblings, 0 replies; 17+ messages in thread
From: Elmer Zhang @ 2012-12-25  7:10 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

I am trying to use NILFS2 for MySQL cold backups. I run many MySQL slave servers on the machine and store their data on the NILFS2 filesystem. Most of the tables use the MyISAM engine; a few use InnoDB.

When the issue occurred, some MySQL instances were running, and I was copying data into the MySQL data directory with rsync. The %util of the NILFS2 partition was almost 100%.
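The %util figure can also be checked programmatically. A minimal sketch, assuming the column layout of the `iostat -xm` output further down in this message (sysstat on EL6; other sysstat versions print different columns, which is why rows are keyed by header name rather than by position). `parse_rows` and `saturated` are hypothetical helpers; the sample rows are copied from the second interval of that output.

```python
# Column names copied from the "Device:" header of the `iostat -xm` output
# in this message; other sysstat versions print different columns, so rows
# are parsed by header name rather than by fixed position.
HEADER = ("Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s "
          "avgrq-sz avgqu-sz   await  svctm  %util")


def parse_rows(header: str, rows: list[str]) -> dict[str, dict[str, float]]:
    """Map each device name to {column name: value} (hypothetical helper)."""
    columns = header.split()[1:]  # drop the "Device:" label itself
    stats = {}
    for row in rows:
        fields = row.split()
        stats[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return stats


def saturated(stats: dict[str, dict[str, float]], threshold: float = 99.0) -> list[str]:
    """Devices whose %util is at or above the threshold, sorted by name."""
    return sorted(dev for dev, s in stats.items() if s["%util"] >= threshold)


sample = [  # second interval of the iostat output in this message
    "sdb             251.00     0.00  736.00   81.00     4.62     8.87    33.81    19.26   23.74   1.22  99.90",
    "sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00",
    "sdb2            251.00     0.00  736.00   79.00     4.62     8.87    33.89    19.27   23.79   1.23  99.90",
]
print(saturated(parse_rows(HEADER, sample)))  # ['sdb', 'sdb2']
```

For a single disk such as sdb, %util near 100% means the device had requests outstanding for almost the whole sampling interval; on striped or SSD storage the same number is a weaker saturation signal.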

In addition to this problem, I have also encountered others. Some MyISAM tables were suddenly marked as crashed, and then the slave's SQL thread stopped. But I did not need to repair the tables: after waiting a bit and restarting the SQL thread, replication continued. So I suspect this may also be a filesystem problem. Below is the MySQL error log for this:

121225 14:38:03 [ERROR] Slave SQL: Error 'Table 'consume_log_2a' is marked as crashed and should be repaired' on query. Default database: 'app_wsgrr'. Query: 'DELETE FROM consume_log_2a WHERE log_time<=1353750526 AND coin_type !=2 AND coin_type!=12 AND coin_type !=13', Error_code: 1194				# table crashed
121225 14:38:03 [Warning] Slave: Table 'consume_log_2a' is marked as crashed and should be repaired Error_code: 1194
121225 14:38:03 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'yf75-bin.000296' position 680291890
121225 14:58:54 [Note] Slave I/O thread exiting, read up to log 'yf75-bin.000298', position 802588261			# restart the sql_thread without repairing table
121225 14:58:55 [Note] Slave I/O thread: connected to master 'replica@10.75.7.75:6011',replication started in log 'yf75-bin.000298' at position 802588261
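To line these MyISAM crash markers up against the kernel's NILFS errors, a small parser over the MySQL 5.5 error log can collect the crashed table names, the earliest crash timestamp, and the binlog coordinates where the SQL thread stopped. This is a sketch assuming the message formats in the excerpt above; `summarize` is a hypothetical helper, and the sample lines are copied from that excerpt.

```python
import re
from datetime import datetime

CRASHED_RE = re.compile(r"Table '([^']+)' is marked as crashed")
STOP_RE = re.compile(r"We stopped at log '([^']+)' position (\d+)")


def summarize(lines):
    """Return (crashed tables, earliest crash time, SQL-thread stop points)."""
    crashed, first_crash, stops = set(), None, []
    for line in lines:
        if m := CRASHED_RE.search(line):
            crashed.add(m.group(1))
            # Assumed MySQL 5.5 error-log timestamp prefix, e.g. "121225 14:38:03"
            ts = datetime.strptime(line[:15], "%y%m%d %H:%M:%S")
            if first_crash is None or ts < first_crash:
                first_crash = ts
        if m := STOP_RE.search(line):
            stops.append((m.group(1), int(m.group(2))))
    return sorted(crashed), first_crash, stops


sample = [
    "121225 14:38:03 [Warning] Slave: Table 'consume_log_2a' is marked as "
    "crashed and should be repaired Error_code: 1194",
    "121225 14:38:03 [ERROR] Error running query, slave SQL thread aborted. "
    "Fix the problem, and restart the slave SQL thread with \"SLAVE START\". "
    "We stopped at log 'yf75-bin.000296' position 680291890",
]
tables, first, stops = summarize(sample)
print(tables, first, stops)
```

The resulting timestamps can then be compared with the NILFS entries in /var/log/messages to see whether table crashes coincide with filesystem errors.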

Version of MySQL Server: Percona Server 5.5.23


sdb2 is the NILFS2 partition. Below is the output of "iostat -xm -p sdb 1" for the last few seconds.

Linux 2.6.32-220.13.1.el6.x86_64 (yf237) 	12/25/2012 	_x86_64_	(8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.24    0.00    1.91    4.02    0.00   92.83

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb            3285.75    42.27  259.02  151.13    15.14    18.46   167.73     5.54   13.52   0.81  33.26
sdb1              0.01    42.26    0.37    1.98     0.04     0.17   183.98     0.06   26.61   2.38   0.56
sdb2           3285.74     0.00  258.65  148.65    15.10    18.28   167.85     5.48   13.46   0.82  33.25

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.23    0.00    2.22   16.54    0.00   80.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             251.00     0.00  736.00   81.00     4.62     8.87    33.81    19.26   23.74   1.22  99.90
sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2            251.00     0.00  736.00   79.00     4.62     8.87    33.89    19.27   23.79   1.23  99.90

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.80    0.00    1.90   20.99    0.00   72.31

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             300.00   450.00  900.00   87.00    10.64     9.50    41.80    19.50   19.55   1.01  99.90
sdb1              0.00   450.00    1.00   16.00     0.12     1.82   234.35     0.32   18.88   6.76  11.50
sdb2            300.00     0.00  899.00   69.00    10.52     7.68    38.50    19.18   19.60   1.03  99.90

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.12    0.00    2.25   21.60    0.00   74.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             449.00     0.00  625.00   81.00     5.00     9.18    41.12    15.32   21.36   1.41  99.40
sdb1              0.00     0.00    1.00    0.00     0.12     0.00   256.00     0.02   19.00  19.00   1.90
sdb2            449.00     0.00  624.00   79.00     4.88     9.18    40.93    15.30   21.42   1.41  99.40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.51    0.00    1.52   15.08    0.00   82.89

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb            9243.00     0.00  720.00   83.00    23.93     8.20    81.94    19.40   21.53   1.25 100.00
sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2           9341.00     0.00  722.00   79.00    24.18     8.20    82.79    19.39   21.65   1.25  99.90

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.23    0.00    6.67   11.36    0.00   80.74

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb           10607.00    25.00  687.00  718.00    59.84    86.66   213.54    26.66   20.83   0.66  92.20
sdb1              0.00    24.00    0.00    9.00     0.00     0.13    29.33     0.00    0.11   0.11   0.10
sdb2          10509.00     1.00  685.00  705.00    59.59    86.53   215.29    26.66   21.02   0.66  92.30

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.86    0.00    1.84   16.67    0.00   80.64

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             127.00     0.00  458.00  139.00     3.05    13.29    56.07     9.15   15.29   1.66  99.30
sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2            127.00     0.00  458.00  137.00     3.05    13.29    56.26     9.15   15.35   1.67  99.20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    1.38   13.05    0.00   85.07

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             111.00   229.00  523.00  151.00     3.28    10.09    40.61    12.50   18.36   1.47  99.00
sdb1              0.00   229.00    2.00   54.00     0.25     1.11    49.57     0.33    5.82   1.14   6.40
sdb2            111.00     0.00  521.00   89.00     3.03     8.98    40.31    12.17   19.75   1.62  99.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.88    0.00    1.26   14.99    0.00   82.87

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             171.00     0.00  539.00   86.00     3.29     7.48    35.28    10.23   16.33   1.58  98.80
sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2            171.00     0.00  540.00   78.00     3.29     7.48    35.69    10.23   16.55   1.60  98.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.76    0.00    1.90   17.62    0.00   79.72

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             156.00     0.00  536.00  119.00     3.31    11.17    45.26    12.63   18.86   1.51  99.20
sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2            156.00     0.00  535.00  114.00     3.30    11.17    45.67    12.63   19.00   1.53  99.20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.49    0.00    1.10   14.81    0.00   83.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             169.00    23.00  552.00   94.00     3.36     9.02    39.24    15.50   24.51   1.54  99.80
sdb1              0.00    23.00    0.00    6.00     0.00     0.11    38.67     0.00    0.00   0.00   0.00
sdb2            169.00     0.00  552.00   82.00     3.36     8.90    39.62    15.50   24.97   1.57  99.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.75    0.00    1.25   14.91    0.00   83.08

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             140.00     0.00  589.00  104.00     3.61     9.41    38.50    12.04   17.35   1.43  99.20
sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2            140.00     0.00  589.00  103.00     3.61     9.41    38.55    12.04   17.38   1.43  99.20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.63    0.00    1.14   14.68    0.00   83.54

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             174.00   323.00  599.00  112.00     3.97    10.52    41.73    12.44   17.52   1.39  98.60
sdb1              0.00   323.00    0.00   11.00     0.00     1.30   242.91     0.11    9.82   1.00   1.10
sdb2            174.00     0.00  601.00   93.00     3.98     9.21    38.93    12.33   17.84   1.42  98.60

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.49    0.00    1.11   15.23    0.00   83.17

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb              95.00     0.00  750.00   65.00     3.91     6.57    26.34    20.28   24.46   1.22  99.50
sdb1              0.00     0.00    1.00    0.00     0.12     0.00   256.00     0.07   71.00  71.00   7.10
sdb2             95.00     0.00  747.00   60.00     3.78     6.57    26.26    20.20   24.56   1.23  99.50

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    1.37   13.04    0.00   85.09

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             159.00     0.00  557.00   96.00     3.23     8.82    37.81    16.73   26.08   1.52  99.40
sdb1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb2            159.00     0.00  557.00   92.00     3.23     8.82    38.04    16.73   26.24   1.53  99.40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.51    0.00    1.16   14.51    0.00   83.83

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             139.00     0.00  637.00   79.00     3.73     7.44    31.96    14.28   20.11   1.39  99.80
sdb1              0.00     0.00    2.00    0.00     0.25     0.00   256.00     0.04   22.00  22.00   4.40
sdb2            139.00     0.00  635.00   67.00     3.48     7.44    31.86    14.24   20.45   1.42  99.80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.98    0.00    1.60   15.83    0.00   81.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             198.00    15.00  566.00  109.00     3.79    10.37    42.96    14.10   20.60   1.48  99.60
sdb1              0.00    15.00    1.00    6.00     0.12     0.08    60.57     0.02    2.71   2.71   1.90
sdb2            198.00     0.00  566.00   97.00     3.67    10.29    43.11    14.08   20.97   1.50  99.60

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.11    0.00    1.35   17.32    0.00   80.22

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb             168.00   257.00  613.00   99.00     3.60     9.41    37.43    15.08   21.11   1.39  99.10
sdb1              0.00   257.00    0.00   10.00     0.00     1.04   213.60     0.26   26.00   2.60   2.60
sdb2            168.00     0.00  612.00   86.00     3.59     8.37    35.11    14.82   21.13   1.42  99.10
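
The samples above can be summarized per device by parsing the extended-statistics rows by column name. Below is a minimal Python sketch. It assumes the `iostat -x` layout shown here, with a header row starting with `Device:` and a `%util` column. Because newer sysstat releases change the column set, the sketch reads the header instead of hard-coding positions. The helper name `mean_util` is hypothetical.

```python
from collections import defaultdict


def mean_util(text):
    """Average the %util column per device across all iostat -x reports in text."""
    header = None
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("Device:", "Device"):
            header = fields  # column names for the device rows that follow
            continue
        if header is None or len(fields) != len(header):
            continue  # skip the banner, avg-cpu blocks, and malformed lines
        row = dict(zip(header, fields))
        device = fields[0]
        totals[device] += float(row["%util"])
        counts[device] += 1
    return {dev: totals[dev] / counts[dev] for dev in totals}
```

Consider dropping the first report before averaging. iostat computes that report over the time since boot. In the samples above, every report after the first shows sdb2 at roughly 92% to 100% utilization.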


Below is a snapshot of iotop:

Total DISK READ:       3.26 M/s | Total DISK WRITE:      66.36 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                         
32291 be/4 my6013   1854.71 K/s   77.93 K/s  0.00 % 84.47 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
32038 be/4 my6005    472.77 K/s  233.79 K/s  0.00 % 38.31 % mysqld --defaults-file=/data0/mysql6005/my6005.cnf
27470 be/4 my6015    316.91 K/s    0.00 B/s  0.00 % 26.11 % mysqld --defaults-file=/data0/mysql6015/my6015.cnf
14478 be/4 my6010    124.69 K/s  223.40 K/s  0.00 % 19.95 % mysqld --defaults-file=/data0/mysql6010/my6010.cnf
32131 be/4 my6007    363.67 K/s  264.96 K/s  0.00 % 16.28 % mysqld --defaults-file=/data0/mysql6007/my6007.cnf
11578 be/4 my6018     31.17 K/s  353.28 K/s  0.00 % 14.17 % mysqld --defaults-file=/data0/mysql6018/my6018.cnf
27469 be/4 my6015      5.20 K/s   15.59 K/s  0.00 % 12.47 % mysqld --defaults-file=/data0/mysql6015/my6015.cnf
25104 be/4 my6009     15.59 K/s  161.05 K/s  0.00 %  9.47 % mysqld --defaults-file=/data0/mysql6009/my6009.cnf
 7144 be/4 root       41.56 K/s    5.82 M/s  0.00 %  8.41 % [segctord]
11498 be/4 my6018     46.76 K/s   67.54 K/s  0.00 %  7.50 % mysqld --defaults-file=/data0/mysql6018/my6018.cnf
 1307 be/4 my6016      5.20 K/s  140.27 K/s  0.00 %  4.83 % mysqld --defaults-file=/data0/mysql6016/my6016.cnf
11481 be/4 my6018     20.78 K/s    0.00 B/s  0.00 %  3.89 % mysqld --defaults-file=/data0/mysql6018/my6018.cnf
13831 be/4 my6003      5.20 K/s  181.83 K/s  0.00 %  0.77 % mysqld --defaults-file=/data0/mysql6003/my6003.cnf
  973 be/4 root        0.00 B/s   46.76 K/s  0.00 %  0.03 % [kjournald]
  972 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.02 % [kjournald]
18568 be/4 my6016      0.00 B/s   93.51 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6016/my6016.cnf
18569 be/4 my6016      0.00 B/s  207.81 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6016/my6016.cnf
14477 be/4 my6010      0.00 B/s  109.10 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6010/my6010.cnf
32130 be/4 my6007      0.00 B/s   51.95 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6007/my6007.cnf
12656 be/4 www         0.00 B/s 1449.48 K/s  0.00 %  0.00 % rsync --daemon
25103 be/4 my6009      0.00 B/s   31.17 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6009/my6009.cnf
  962 be/4 my6013      0.00 B/s  353.28 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
  963 be/4 my6013      0.00 B/s  327.30 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
  964 be/4 my6013      0.00 B/s  135.08 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
  965 be/4 my6013      0.00 B/s  290.94 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
 7145 be/4 root        0.00 B/s  446.79 K/s  0.00 %  0.00 % nilfs_cleanerd -n /dev/sdb2 /data0/
13830 be/4 my6003      0.00 B/s   62.34 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6003/my6003.cnf
27723 be/4 my6015      0.00 B/s  244.18 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6015/my6015.cnf
27722 be/4 my6015      0.00 B/s  150.66 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6015/my6015.cnf
11577 be/4 my6018      0.00 B/s  124.69 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6018/my6018.cnf
32193 be/4 my6011      0.00 B/s   98.71 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6011/my6011.cnf
32240 be/4 my6012      0.00 B/s   93.51 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6012/my6012.cnf
11803 be/4 my6002      0.00 B/s   15.59 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6002/my6002.cnf
11804 be/4 my6002      0.00 B/s   25.98 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6002/my6002.cnf
32290 be/4 my6013      0.00 B/s  140.27 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
32352 be/4 my6014      0.00 B/s   25.98 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6014/my6014.cnf
32037 be/4 my6005      0.00 B/s  150.66 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6005/my6005.cnf
  984 be/4 my6013      0.00 B/s 1329.99 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
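
The iotop snapshot lists several threads per mysqld instance. Summing the rates per user makes the load of each instance easier to compare. Below is a minimal Python sketch. It assumes the column order shown above (TID, PRIO, USER, DISK READ, DISK WRITE, SWAPIN, IO>, COMMAND). It also treats K/s and M/s as powers of 1024, which is an assumption about iotop's units. The helper name `per_user_rates` is hypothetical.

```python
from collections import defaultdict

# Bytes per unit; treating K and M as powers of 1024 is an assumption.
UNITS = {"B/s": 1, "K/s": 1024, "M/s": 1024 ** 2, "G/s": 1024 ** 3}


def per_user_rates(text):
    """Sum DISK READ and DISK WRITE (bytes/s) per USER from an iotop snapshot."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for line in text.splitlines():
        f = line.split()
        # Thread rows start with a numeric TID; skip the header and Total lines.
        if len(f) < 7 or not f[0].isdigit():
            continue
        user = f[2]
        totals[user][0] += float(f[3]) * UNITS[f[4]]  # DISK READ value and unit
        totals[user][1] += float(f[5]) * UNITS[f[6]]  # DISK WRITE value and unit
    return {user: tuple(rates) for user, rates in totals.items()}
```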



On 2012-12-25, at 14:02, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:

> Hi,
> 
> On Thu, 2012-12-20 at 19:02 +0800, 张 磊 wrote:
>> Yes, I mounted NILFS2 read-write. The kernel remounted it read-only when the filesystem found the bad btree node.
>> 
>> That's the full backtrace. I will keep on testing and report more information once I find it.
>> 
> 
> I am trying to reproduce the issue, but so far without any success. I
> suspect it may be a synchronization issue between the GC
> and the main driver logic, but I don't have any evidence of it yet. Perhaps I
> am failing to reproduce some peculiarity of your environment.
> 
> So I think I need a deeper understanding of the workload under which the
> issue occurred. As I remember, you mentioned several MySQL
> databases and so on. Could you describe in more detail which
> applications were running, and how, before the issue occurred?
> 
> Thanks,
> Vyacheslav Dubeyko.


* Re: NILFS: bad btree node
       [not found]             ` <14BA4286-BF21-4BD3-8E41-2F8F9512D801-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2012-12-20 10:41               ` Vyacheslav Dubeyko
  2012-12-22 14:12               ` Seiji Kihara
@ 2012-12-27 10:43               ` Vyacheslav Dubeyko
  2 siblings, 0 replies; 17+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-27 10:43 UTC
  To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Thu, 2012-12-20 at 18:16 +0800, 张 磊 wrote:
> 1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64 

Why are you using the 2.6.32 Linux kernel? Could you try one of the latest
vanilla kernels (for example, 3.7.1)?

To be honest, I tried to reproduce the issue on version 3.6.0, but
without any success. I also know that this issue was reported on the 3.6.8
kernel, but even there it does not reproduce reliably.

> 2. nilfs-utils version: nilfs-utils-2.1.4
> 3. "mount" output:
> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
> 
> 4. "df -h" output:
> /dev/sdb2 9.6T 5.9T 3.2T 66% /data0 

Do you use any RAID technology?

By the way, what HDD hardware do you use? What vendor?

With the best regards,
Vyacheslav Dubeyko.


Thread overview: 17+ messages
2012-12-20  2:46 NILFS: bad btree node 张 磊
2012-12-20  6:08   ` Vyacheslav Dubeyko
2012-12-20  9:08     ` 张 磊
2012-12-20  9:38         ` Vyacheslav Dubeyko
2012-12-20 10:16           ` 张 磊
2012-12-20 10:41               ` Vyacheslav Dubeyko
2012-12-20 11:02                 ` 张 磊
2012-12-25  6:02                     ` Vyacheslav Dubeyko
2012-12-25  7:10                       ` Elmer Zhang
2012-12-22 14:12               ` Seiji Kihara
2012-12-24  3:04                   ` 张 磊
2012-12-27 10:43               ` Vyacheslav Dubeyko
  -- strict thread matches above, loose matches on Subject: below --
2012-05-25 14:30 Kenneth Langga
2012-05-25 18:06   ` Reinoud Zandijk
2012-05-25 18:15       ` Kenneth Langga
2012-05-26 14:49           ` Christian Smith
2012-05-26 16:43               ` Kenneth Langga
