* NILFS: bad btree node
@ 2012-05-25 14:30 Kenneth Langga
[not found] ` <CAHmELnWvFNdiePs=mQJ=nqfsxJ_49zxawa9jncE-RJ2-omYHOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Kenneth Langga @ 2012-05-25 14:30 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
I got this error from kernel.log:
NILFS: bad btree node (blocknr=111560943): level = 242, flags = 0x3f,
nchildren = 23369
NILFS error (device sdc2): nilfs_bmap_lookup_contig: broken bmap
(inode number=19696)
What is the correct course of action for this type of error? And what
would have caused this?
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: NILFS: bad btree node
[not found] ` <CAHmELnWvFNdiePs=mQJ=nqfsxJ_49zxawa9jncE-RJ2-omYHOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-05-25 18:06 ` Reinoud Zandijk
[not found] ` <20120525180649.GA1236-bVHBekiX4bNgoMqBc1r0ESegHCQxtGRMHZ5vskTnxNA@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Reinoud Zandijk @ 2012-05-25 18:06 UTC (permalink / raw)
To: Kenneth Langga; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Kenneth.
On Fri, May 25, 2012 at 10:30:40PM +0800, Kenneth Langga wrote:
> NILFS: bad btree node (blocknr=111560943): level = 242, flags = 0x3f,
> nchildren = 23369
> NILFS error (device sdc2): nilfs_bmap_lookup_contig: broken bmap
> (inode number=19696)
>
> What is the correct course of action for this type of error? And what
> would have caused this?
What struck me is the very high level and the absurd number of
children. That can't be good. AFAIR NILFS only has, say, up to 3 (or 4?) levels
in its B-tree. It *could* be failing in rebalancing or, more likely, pointing to
garbage?
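The plausibility argument above can be sketched in a few lines of Python. This is only an illustration: the block size, the node overhead, and the valid level range are assumptions chosen for the example and should be checked against the nilfs2 source for the kernel in use. The helper names `parse_bad_node` and `implausible_fields` are hypothetical.

```python
import re

# Assumed limits, for illustration only; verify against the nilfs2 source.
BLOCK_SIZE = 4096               # bytes per block (assumed)
NODE_OVERHEAD = 16              # node header plus padding in bytes (assumed)
ENTRY_SIZE = 16                 # one 64-bit key plus one 64-bit pointer
LEVEL_MIN, LEVEL_MAX = 1, 14    # assumed valid range: LEVEL_MIN <= level < LEVEL_MAX

BAD_NODE = re.compile(
    r"bad btree node \(blocknr=(\d+)\): "
    r"level = (\d+), flags = (0x[0-9a-fA-F]+),\s+nchildren = (\d+)"
)


def parse_bad_node(message):
    """Extract the node fields from a 'bad btree node' message, or None."""
    match = BAD_NODE.search(message)
    if match is None:
        return None
    blocknr, level, flags, nchildren = match.groups()
    return {
        "blocknr": int(blocknr),
        "level": int(level),
        "flags": int(flags, 16),
        "nchildren": int(nchildren),
    }


def implausible_fields(node):
    """Return the names of fields that fall outside the assumed limits."""
    max_children = (BLOCK_SIZE - NODE_OVERHEAD) // ENTRY_SIZE
    problems = []
    if not LEVEL_MIN <= node["level"] < LEVEL_MAX:
        problems.append("level")
    if node["nchildren"] > max_children:
        problems.append("nchildren")
    return problems


reported = ("NILFS: bad btree node (blocknr=111560943): level = 242, "
            "flags = 0x3f,\nnchildren = 23369")
node = parse_bad_node(reported)
print(node, implausible_fields(node))
```

Under these assumptions a node block holds at most a few hundred children, so both reported values are far outside anything a healthy node could contain, which fits the "pointing to garbage" reading.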
Cheers,
Reinoud
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: NILFS: bad btree node
[not found] ` <20120525180649.GA1236-bVHBekiX4bNgoMqBc1r0ESegHCQxtGRMHZ5vskTnxNA@public.gmane.org>
@ 2012-05-25 18:15 ` Kenneth Langga
[not found] ` <CAHmELnVyRNGn1gda0Sw53YCSOAYMm5JUonebi-9NxaFBP7Uidw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Kenneth Langga @ 2012-05-25 18:15 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
It's a 3TB hard disk. Could that be the reason?
Right now, it's mounted read-only. Is it safe to make it read/write
again? And if I run nilfs-clean on it, might the error go
away?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: NILFS: bad btree node
[not found] ` <CAHmELnVyRNGn1gda0Sw53YCSOAYMm5JUonebi-9NxaFBP7Uidw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-05-26 14:49 ` Christian Smith
[not found] ` <20120526144932.GG18110-Ng8wz+J301SNY5Lh21HnMTHS2PGA244I9dF7HbQ/qKg@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Christian Smith @ 2012-05-26 14:49 UTC (permalink / raw)
To: Kenneth Langga; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Sat, May 26, 2012 at 02:15:41AM +0800, Kenneth Langga wrote:
> It's a 3TB hard disk. Could that be the reason?
>
> Right now, it's mounted read-only. Is it safe to make it read/write
> again? And if I run nilfs-clean on it, might the error go
> away?
>
You should be able to remount read/write, as you'll still have your
old snapshots or checkpoints to mount from instead if it all goes
wrong.
In my experience, though, once the cleaner fails to
clean segments due to logical errors, it's game over and a
backup/mkfs/restore is needed. But then, I mostly run NILFS on
small, slow SSDs, so that's no great hardship. We desperately need
an fsck to handle scenarios like that.
I can't see the size of the disk being a problem. All the data
pointers are 64-bit, so should comfortably handle 3TB.
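As a rough check of the 64-bit argument, the arithmetic can be written out. This sketch assumes a 4096-byte block size and a decimal 3 TB disk; neither value is stated for this volume, and `blocks_for` is a hypothetical helper.

```python
def blocks_for(disk_bytes, block_size):
    """Number of blocks needed to cover disk_bytes (rounded up)."""
    return -(-disk_bytes // block_size)


BLOCK_SIZE = 4096            # assumed block size in bytes
DISK_BYTES = 3 * 10**12      # 3 TB, decimal (assumed)

blocks_needed = blocks_for(DISK_BYTES, BLOCK_SIZE)
addressable = 2**64          # distinct values of a 64-bit block pointer

print("blocks needed:", blocks_needed)
print("bits needed:  ", blocks_needed.bit_length())
print("headroom:     ", addressable // blocks_needed, "times the disk")
```

With these assumptions the disk needs only about 30 bits of block address, so the pointer width is nowhere near a limit.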
In short, try read-write, but be prepared to reformat.
Also, if you're using a 3TB disk for NILFS to store media
files, I'd perhaps suggest against it, if only for the reason
that backup/restore on that much data will take an age. I
currently stick to nilfs for my root filesystems, leaving
big and/or personal data on more stable, less cutting edge
filesystems.
Christian
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: NILFS: bad btree node
[not found] ` <20120526144932.GG18110-Ng8wz+J301SNY5Lh21HnMTHS2PGA244I9dF7HbQ/qKg@public.gmane.org>
@ 2012-05-26 16:43 ` Kenneth Langga
0 siblings, 0 replies; 17+ messages in thread
From: Kenneth Langga @ 2012-05-26 16:43 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
I see, I'll follow what you said. Thanks. Btw, what could be the
source of the error so that I may avoid it in the future? And would
deleting the offending file (if the error is tied to one) also remove
the error?
^ permalink raw reply [flat|nested] 17+ messages in thread
* NILFS: bad btree node
@ 2012-12-20 2:46 张 磊
[not found] ` <86B5C141-ACFA-4541-999F-E17E09F22476-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: 张 磊 @ 2012-12-20 2:46 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hello.
My nilfs suddenly became read-only. I saw these logs in /var/log/messages:
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 19 11:20:05 localhost kernel:
……………………………………………………
How can I fix this? There is 6TiB of data on my disk, and I don't want to reformat it.
I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
Elmer Zhang
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: NILFS: bad btree node
[not found] ` <86B5C141-ACFA-4541-999F-E17E09F22476-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2012-12-20 6:08 ` Vyacheslav Dubeyko
2012-12-20 9:08 ` 张 磊
0 siblings, 1 reply; 17+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-20 6:08 UTC (permalink / raw)
To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
> Hello.
> My nilfs suddenly became read-only. I saw these logs in /var/log/messages:
>
> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 19 11:20:05 localhost kernel:
> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 19 11:20:05 localhost kernel:
> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 19 11:20:05 localhost kernel:
> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 19 11:20:05 localhost kernel:
> ……………………………………………………
>
> How can I fix this? There is 6TiB of data on my disk, and I don't want to reformat it.
> I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>
Yes, this issue was reported earlier. As I understand it, you can simply
remount your filesystem in read-write mode and continue using your
NILFS2 filesystem.
If you encounter any trouble with remounting, please report
it.
With the best regards,
Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: NILFS: bad btree node
2012-12-20 6:08 ` Vyacheslav Dubeyko
@ 2012-12-20 9:08 ` 张 磊
[not found] ` <3455B0CD-EF89-4227-90E1-FC6B20F5F8EB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: 张 磊 @ 2012-12-20 9:08 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 20 16:03:55 localhost kernel:
Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
Dec 20 16:03:55 localhost kernel:
Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
Dec 20 16:12:08 localhost kernel:
Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
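The `err=-5` in the truncate warning above is a negated errno value, as kernel code conventionally reports errors. A small sketch for decoding it follows; `describe_kernel_err` is a hypothetical helper, and the message text returned by `os.strerror` depends on the platform.

```python
import errno
import os


def describe_kernel_err(err):
    """Map a negative kernel-style error code to its errno name and message."""
    code = abs(err)
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


name, text = describe_kernel_err(-5)
print(name, "-", text)
```

On Linux, code 5 is `EIO`, a generic I/O error, which is consistent with the truncate failing because the block map of inode 321775 could not be read.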
I tried a third remount, but it failed. The server went down and restarted.
Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
Dec 20 16:12:42 localhost kernel:
I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: NILFS: bad btree node
[not found] ` <3455B0CD-EF89-4227-90E1-FC6B20F5F8EB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2012-12-20 9:38 ` Vyacheslav Dubeyko
2012-12-20 10:16 ` 张 磊
0 siblings, 1 reply; 17+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-20 9:38 UTC (permalink / raw)
To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
> Hi,
>
> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>
> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 20 16:03:55 localhost kernel:
> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
> Dec 20 16:03:55 localhost kernel:
> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>
> I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
>
> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
> Dec 20 16:12:08 localhost kernel:
> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>
> I tried a third remount, but it failed. The server went down and restarted.
>
> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
> Dec 20 16:12:42 localhost kernel:
>
Yes, that is bad. In the earlier reports, a remount solved the problem.
As a result, is your NILFS2 volume now mounted read-only?
Could you share more details about your environment? I need them to
understand the situation and to try to reproduce it. I need to know:
1. Linux kernel version.
2. nilfs-utils version.
3. "mount" output.
4. "df -h" output.
5. "lscp" output.
6. "lssu" output.
7. "nilfs-tune -l" output (superblock content)
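Most of the items in the list above can be gathered in one pass. The following is a minimal sketch, assuming the nilfs-utils tools are on the PATH and accept the device as an argument; `diagnostic_commands` and `collect` are hypothetical helper names, and the nilfs-utils version (item 2) is left to the package manager.

```python
import subprocess


def diagnostic_commands(device):
    """Build the argv lists for the requested diagnostics."""
    return [
        ["uname", "-r"],                # 1. kernel version
        ["mount"],                      # 3. mount output
        ["df", "-h"],                   # 4. disk usage
        ["lscp", device],               # 5. checkpoint list
        ["lssu", device],               # 6. segment usage
        ["nilfs-tune", "-l", device],   # 7. superblock content
    ]


def collect(device):
    """Run each command and return one text report; failures are recorded."""
    report = []
    for argv in diagnostic_commands(device):
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            output = result.stdout + result.stderr
        except OSError as exc:
            output = f"could not run: {exc}\n"
        report.append(f"$ {' '.join(argv)}\n{output}")
    return "\n".join(report)
```

Calling `collect("/dev/sdb2")` as root and saving the returned text to a file gives a single attachment for the list; the `lssu` part can be large on a volume with many segments.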
> I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>
The latest version of nilfs-utils is 2.1.4. Currently, fsck.nilfs2 is at an early
stage of development. The v4 is the version of the fsck.nilfs2 patchset. You can
try fsck.nilfs2 after applying this patchset to the source code of
nilfs-utils 2.1.4. But fsck.nilfs2 can check only superblocks
and segment summary headers, and it can't perform a complete recovery. So I think
that it will be useless for you.
With the best regards,
Vyacheslav Dubeyko.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: NILFS: bad btree node
2012-12-20 9:38 ` Vyacheslav Dubeyko
@ 2012-12-20 10:16 ` 张 磊
[not found] ` <14BA4286-BF21-4BD3-8E41-2F8F9512D801-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: 张 磊 @ 2012-12-20 10:16 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64
2. nilfs-utils version: nilfs-utils-2.1.4
3. "mount" output:
/dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
4. "df -h" output:
/dev/sdb2 9.6T 5.9T 3.2T 66% /data0
5. "lscp" output:
CNO DATE TIME MODE FLG NBLKINC ICNT
2 2012-12-03 14:03:01 ss - 14 3
580481 2012-12-20 16:11:25 cp - 293 697667
580482 2012-12-20 16:11:25 cp - 130 697666
580483 2012-12-20 16:11:25 cp - 225 697664
580484 2012-12-20 16:11:25 cp - 143 697663
580485 2012-12-20 16:11:26 cp - 311 697659
580486 2012-12-20 16:11:27 cp - 328 697657
580487 2012-12-20 16:11:27 cp - 263 697655
580488 2012-12-20 16:11:27 cp - 118 697653
580489 2012-12-20 16:11:28 cp - 230 697651
580490 2012-12-20 16:11:28 cp - 272 697649
580491 2012-12-20 16:11:28 cp - 148 697648
580492 2012-12-20 16:11:29 cp - 139 697647
580493 2012-12-20 16:11:29 cp - 273 697645
580494 2012-12-20 16:11:29 cp - 147 697644
580495 2012-12-20 16:11:30 cp - 271 697641
580496 2012-12-20 16:11:31 cp - 526 697636
580497 2012-12-20 16:11:34 cp - 1684 697625
580498 2012-12-20 16:11:37 cp - 983 697609
580499 2012-12-20 16:11:38 cp - 421 697605
580500 2012-12-20 16:11:40 cp - 1019 697594
580501 2012-12-20 16:11:40 cp - 143 697593
580502 2012-12-20 16:11:41 cp - 1536 697592
580503 2012-12-20 16:11:41 cp - 373 697590
580504 2012-12-20 16:11:42 cp - 312 697587
580505 2012-12-20 16:11:42 cp - 102 697586
580506 2012-12-20 16:11:43 cp - 274 697584
580507 2012-12-20 16:11:43 cp - 270 697582
580508 2012-12-20 16:11:43 cp - 118 697581
580509 2012-12-20 16:11:43 cp - 133 697580
580510 2012-12-20 16:11:44 cp - 321 697578
580511 2012-12-20 16:11:44 cp - 245 697576
580512 2012-12-20 16:11:45 cp - 394 697573
580513 2012-12-20 16:11:45 cp - 121 697572
580514 2012-12-20 16:11:45 cp - 245 697569
580515 2012-12-20 16:11:52 cp - 2705 697543
580516 2012-12-20 16:11:55 cp - 2590 697504
580517 2012-12-20 16:11:59 cp - 2418 697453
580518 2012-12-20 16:12:00 cp - 866 697436
580519 2012-12-20 16:12:01 cp - 864 697420
580520 2012-12-20 16:12:05 cp - 1765 697357
580521 2012-12-20 16:12:05 cp - 120 697356
580522 2012-12-20 16:12:06 cp - 820 697332
580523 2012-12-20 16:12:09 cp - 1642 697174
580524 2012-12-20 16:12:09 cp - 89 697173
580525 2012-12-20 16:12:10 cp - 56 697173
580526 2012-12-20 16:12:42 cp - 763 697173
6. "lssu" output:
It's too large; please download it: http://d.pr/f/vnoR
7. "nilfs-tune -l" output (superblock content):
nilfs-tune 2.1.4
Filesystem volume name: (none)
Filesystem UUID: dcfb7152-a342-48d0-a712-212a3062395e
Filesystem magic number: 0x3434
Filesystem revision #: 2.0
Filesystem features: (none)
Filesystem state: invalid or mounted,error
Filesystem OS type: Linux
Block size: 4096
Filesystem created: Mon Dec 3 13:56:51 2012
Last mount time: Thu Dec 20 17:44:03 2012
Last write time: Thu Dec 20 17:44:03 2012
Mount count: 13
Maximum mount count: 50
Reserve blocks uid: 0 (user root)
Reserve blocks gid: 0 (group root)
First inode: 11
Inode size: 128
DAT entry size: 32
Checkpoint size: 192
Segment usage size: 16
Number of segments: 1246464
Device size: 10456104173568
First data block: 1
# of blocks per segment: 2048
Reserved segments %: 5
Last checkpoint #: 580526
Last block address: 1040286376
Last sequence #: 1753809
Free blocks count: 973875200
Commit interval: 60
# of blks to create seg: 0
CRC seed: 0x3adfb6c3
CRC check sum: 0x8468fbbf
CRC check data size: 0x00000118
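The superblock figures above can be cross-checked against each other and against the block number from the error message. This is a sketch only: it assumes that a segment number is simply the block number divided by the blocks-per-segment value, and `segment_of` is a hypothetical helper.

```python
# Values copied from the nilfs-tune output and the kernel log in this thread.
BLOCK_SIZE = 4096
BLOCKS_PER_SEGMENT = 2048
NUM_SEGMENTS = 1246464
DEVICE_SIZE = 10456104173568
BAD_BLOCKNR = 710153406


def segment_of(blocknr, blocks_per_segment=BLOCKS_PER_SEGMENT):
    """Return (segment number, block offset inside that segment)."""
    return divmod(blocknr, blocks_per_segment)


covered = NUM_SEGMENTS * BLOCKS_PER_SEGMENT * BLOCK_SIZE   # bytes in whole segments
slack = DEVICE_SIZE - covered                              # bytes left at the end
segnum, offset = segment_of(BAD_BLOCKNR)

print("bytes covered by segments:", covered)
print("unused tail in bytes:     ", slack)
print("bad block is in segment", segnum, "at offset", offset)
```

The segments account for all but 6 MiB of the device, less than one 8 MiB segment, so the geometry is self-consistent. Under the stated assumption the bad block falls in segment 346754, which is the entry to look for in the lssu output.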
I found this in /var/log/messages; perhaps it is related to the bad btree node:
Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
Dec 18 15:55:02 localhost kernel: Call Trace:
Dec 18 15:55:02 localhost kernel: <IRQ> [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
Dec 18 15:55:02 localhost kernel: <EOI> [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:
> On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
>> Hi,
>>
>> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>>
>> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
>> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>> Dec 20 16:03:55 localhost kernel:
>> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>> Dec 20 16:03:55 localhost kernel:
>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>>
>> I remounted the filesystem again and tried to delete the bad files, but the deletion failed.
>>
>> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
>> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
>> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
>> Dec 20 16:12:08 localhost kernel:
>> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
>> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>>
>> I tried a third remount, but it failed. The server went down and was restarted.
>>
>> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
>> Dec 20 16:12:42 localhost kernel:
>>
>
> Yes, that is bad. Remounting solved the problem in earlier reports.
>
> As a result, is your NILFS2 volume now mounted read-only?
>
> Could you share more details about your environment? I need them to
> understand the situation and to try to reproduce it. I need to know:
> 1. Linux kernel version.
> 2. nilfs-utils version.
> 3. "mount" output.
> 4. "df -h" output.
> 5. "lscp" output.
> 6. "lssu" output.
> 7. "nilfs-tune -l" output (superblock content)
>
>> I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>>
>
> The latest version of nilfs-utils is 2.1.4. Currently, fsck.nilfs2 is at an
> early stage of development. The "v4" is the version of the fsck.nilfs2
> patchset. You can try fsck.nilfs2 after applying this patchset to the
> nilfs-utils 2.1.4 source code. But fsck.nilfs2 can check only superblocks
> and segment summary headers, and it can't recover the volume completely.
> So I think that it will be useless for you.
>
> With the best regards,
> Vyacheslav Dubeyko.
>
>> On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>>
>>> Hi,
>>>
>>> On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
>>>> Hello.
>>>> My nilfs suddenly became read-only. I saw these logs in /var/log/messages:
>>>>
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 19 11:20:05 localhost kernel:
>>>> ……………………………………………………
>>>>
>>>> How can I fix this? There is 6 TiB of data on my disk, and I don't want to format the disk.
>>>> I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>>>>
>>>
>>> Yes, this issue was reported earlier. As I understand it, you can simply
>>> remount your filesystem in read-write mode and continue using your
>>> NILFS2 filesystem.
>>>
>>> If you encounter any trouble with remounting, please report
>>> it.
>>>
>>> With the best regards,
>>> Vyacheslav Dubeyko.
>>>
>>>
>>>> Elmer Zhang
>>
>
>
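The "bad btree node" report repeats many times in the logs quoted above, always for the same block and inode. A minimal sketch of how such lines could be collapsed into distinct reports is shown here. The regular expression is based only on the log lines quoted in this thread, and the function name summarize_bad_nodes is a hypothetical helper, not part of nilfs-utils.

```python
import re
from collections import Counter

# Pattern derived from the "bad btree node" lines quoted in this thread.
BAD_NODE = re.compile(
    r"NILFS: bad btree node \(blocknr=(?P<blocknr>\d+)\): "
    r"level = (?P<level>\d+), flags = (?P<flags>0x[0-9a-fA-F]+), "
    r"nchildren = (?P<nchildren>\d+)"
)


def summarize_bad_nodes(lines):
    """Count reports per distinct (blocknr, level, flags, nchildren)."""
    counts = Counter()
    for line in lines:
        m = BAD_NODE.search(line)
        if m:
            key = (
                int(m["blocknr"]),
                int(m["level"]),
                int(m["flags"], 16),
                int(m["nchildren"]),
            )
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: the path is the one mentioned in this thread.
    with open("/var/log/messages", errors="replace") as log:
        for key, n in summarize_bad_nodes(log).items():
            print(n, "report(s) for blocknr=%d level=%d flags=%#x nchildren=%d" % key)
```

If every report collapses to a single tuple, as it appears to here, that would suggest one damaged block rather than widespread corruption, although the logs alone cannot prove it.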
* Re: NILFS: bad btree node
[not found] ` <14BA4286-BF21-4BD3-8E41-2F8F9512D801-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2012-12-20 10:41 ` Vyacheslav Dubeyko
2012-12-20 11:02 ` 张 磊
2012-12-22 14:12 ` Seiji Kihara
2012-12-27 10:43 ` Vyacheslav Dubeyko
2 siblings, 1 reply; 17+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-20 10:41 UTC (permalink / raw)
To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Thu, 2012-12-20 at 18:16 +0800, 张 磊 wrote:
Thank you for the info.
[snip]
> 3. "mount" output:
> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
>
As far as I can see, your NILFS2 volume is mounted read-write. Am I
correct?
[snip]
>
> I found this in /var/log/messages; perhaps it is related to the bad btree node:
>
> Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
> Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
> Dec 18 15:55:02 localhost kernel: Call Trace:
> Dec 18 15:55:02 localhost kernel: <IRQ> [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
> Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
> Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
> Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
> Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
> Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
> Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
> Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
> Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
> Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
> Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
> Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
> Dec 18 15:55:02 localhost kernel: <EOI> [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
> Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
> Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
> Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
> Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
> Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
> Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
>
Is this the full backtrace? Or do you have any additional info in your syslog?
With the best regards,
Vyacheslav Dubeyko.
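To make the quoted backtrace easier to read, here is a small sketch that groups the frames by kernel module, so that the bnx2 and nilfs2 frames stand out from the rest. The regular expression is an assumption based only on the trace lines quoted above, and the "(core kernel)" label for frames without a [module] tag is my own naming.

```python
import re
from collections import defaultdict

# Matches frames such as:
#   [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

CORE = "(core kernel)"  # hypothetical label for frames with no [module] tag


def frames_by_module(lines):
    """Return {module: [symbol, ...]} in the order frames appear."""
    groups = defaultdict(list)
    for line in lines:
        m = FRAME.search(line)
        if m:
            groups[m["module"] or CORE].append(m["symbol"])
    return dict(groups)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe the saved trace into this script.
    for module, symbols in frames_by_module(sys.stdin).items():
        print(module, len(symbols), "frame(s):", ", ".join(symbols))
```

Grouping this way only summarizes what the trace already contains; it does not by itself show whether the allocation failure and the bad btree node are related.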
* Re: NILFS: bad btree node
2012-12-20 10:41 ` Vyacheslav Dubeyko
@ 2012-12-20 11:02 ` 张 磊
[not found] ` <44056E9A-3487-4E8A-A56A-5B9228FC7895-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: 张 磊 @ 2012-12-20 11:02 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Yes, I mounted NILFS2 as read-write. It was remounted read-only by the kernel when the filesystem found the bad btree node.
That's the full backtrace. I will keep testing and report more information once I find any.
On 2012-12-20, at 18:41, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:
> On Thu, 2012-12-20 at 18:16 +0800, 张 磊 wrote:
>
> Thank you for the info.
>
> [snip]
>> 3. "mount" output:
>> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
>>
>
> As far as I can see, your NILFS2 volume is mounted read-write. Am I
> correct?
>
> [snip]
>
>>
>> I found this in /var/log/messages; perhaps it is related to the bad btree node:
>>
>> Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
>> Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
>> Dec 18 15:55:02 localhost kernel: Call Trace:
>> Dec 18 15:55:02 localhost kernel: <IRQ> [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
>> Dec 18 15:55:02 localhost kernel: <EOI> [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
>>
>
> Is this the full backtrace? Or do you have any additional info in your syslog?
>
> With the best regards,
> Vyacheslav Dubeyko.
>
>
* Re: NILFS: bad btree node
[not found] ` <14BA4286-BF21-4BD3-8E41-2F8F9512D801-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2012-12-20 10:41 ` Vyacheslav Dubeyko
@ 2012-12-22 14:12 ` Seiji Kihara
[not found] ` <50D5BFD6.1080502-sG5X7nlA6pw@public.gmane.org>
2012-12-27 10:43 ` Vyacheslav Dubeyko
2 siblings, 1 reply; 17+ messages in thread
From: Seiji Kihara @ 2012-12-22 14:12 UTC (permalink / raw)
To: 张 磊; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hello,
(2012/12/20 19:16), 张 磊 wrote:
> 1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64
If you use the nilfs2 kernel module for RHEL 6 clones,
the output of 'rpm -q kmod-nilfs2' will help.
http://www.nilfs.org/en/pkg_centos.html
https://github.com/nilfs-dev/nilfs2-kmod-centos6
Regards,
Seiji
> 2. nilfs-utils version: nilfs-utils-2.1.4
> 3. "mount" output:
> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
>
> 4. "df -h" output:
> /dev/sdb2 9.6T 5.9T 3.2T 66% /data0
>
> 5. "lscp" output:
> CNO DATE TIME MODE FLG NBLKINC ICNT
> 2 2012-12-03 14:03:01 ss - 14 3
> 580481 2012-12-20 16:11:25 cp - 293 697667
> 580482 2012-12-20 16:11:25 cp - 130 697666
> 580483 2012-12-20 16:11:25 cp - 225 697664
> 580484 2012-12-20 16:11:25 cp - 143 697663
> 580485 2012-12-20 16:11:26 cp - 311 697659
> 580486 2012-12-20 16:11:27 cp - 328 697657
> 580487 2012-12-20 16:11:27 cp - 263 697655
> 580488 2012-12-20 16:11:27 cp - 118 697653
> 580489 2012-12-20 16:11:28 cp - 230 697651
> 580490 2012-12-20 16:11:28 cp - 272 697649
> 580491 2012-12-20 16:11:28 cp - 148 697648
> 580492 2012-12-20 16:11:29 cp - 139 697647
> 580493 2012-12-20 16:11:29 cp - 273 697645
> 580494 2012-12-20 16:11:29 cp - 147 697644
> 580495 2012-12-20 16:11:30 cp - 271 697641
> 580496 2012-12-20 16:11:31 cp - 526 697636
> 580497 2012-12-20 16:11:34 cp - 1684 697625
> 580498 2012-12-20 16:11:37 cp - 983 697609
> 580499 2012-12-20 16:11:38 cp - 421 697605
> 580500 2012-12-20 16:11:40 cp - 1019 697594
> 580501 2012-12-20 16:11:40 cp - 143 697593
> 580502 2012-12-20 16:11:41 cp - 1536 697592
> 580503 2012-12-20 16:11:41 cp - 373 697590
> 580504 2012-12-20 16:11:42 cp - 312 697587
> 580505 2012-12-20 16:11:42 cp - 102 697586
> 580506 2012-12-20 16:11:43 cp - 274 697584
> 580507 2012-12-20 16:11:43 cp - 270 697582
> 580508 2012-12-20 16:11:43 cp - 118 697581
> 580509 2012-12-20 16:11:43 cp - 133 697580
> 580510 2012-12-20 16:11:44 cp - 321 697578
> 580511 2012-12-20 16:11:44 cp - 245 697576
> 580512 2012-12-20 16:11:45 cp - 394 697573
> 580513 2012-12-20 16:11:45 cp - 121 697572
> 580514 2012-12-20 16:11:45 cp - 245 697569
> 580515 2012-12-20 16:11:52 cp - 2705 697543
> 580516 2012-12-20 16:11:55 cp - 2590 697504
> 580517 2012-12-20 16:11:59 cp - 2418 697453
> 580518 2012-12-20 16:12:00 cp - 866 697436
> 580519 2012-12-20 16:12:01 cp - 864 697420
> 580520 2012-12-20 16:12:05 cp - 1765 697357
> 580521 2012-12-20 16:12:05 cp - 120 697356
> 580522 2012-12-20 16:12:06 cp - 820 697332
> 580523 2012-12-20 16:12:09 cp - 1642 697174
> 580524 2012-12-20 16:12:09 cp - 89 697173
> 580525 2012-12-20 16:12:10 cp - 56 697173
> 580526 2012-12-20 16:12:42 cp - 763 697173
>
> 6. "lssu" output:
> it's too large, please download it: http://d.pr/f/vnoR
>
> 7. "nilfs-tune -l" output (superblock content):
>
> nilfs-tune 2.1.4
> Filesystem volume name: (none)
> Filesystem UUID: dcfb7152-a342-48d0-a712-212a3062395e
> Filesystem magic number: 0x3434
> Filesystem revision #: 2.0
> Filesystem features: (none)
> Filesystem state: invalid or mounted,error
> Filesystem OS type: Linux
> Block size: 4096
> Filesystem created: Mon Dec 3 13:56:51 2012
> Last mount time: Thu Dec 20 17:44:03 2012
> Last write time: Thu Dec 20 17:44:03 2012
> Mount count: 13
> Maximum mount count: 50
> Reserve blocks uid: 0 (user root)
> Reserve blocks gid: 0 (group root)
> First inode: 11
> Inode size: 128
> DAT entry size: 32
> Checkpoint size: 192
> Segment usage size: 16
> Number of segments: 1246464
> Device size: 10456104173568
> First data block: 1
> # of blocks per segment: 2048
> Reserved segments %: 5
> Last checkpoint #: 580526
> Last block address: 1040286376
> Last sequence #: 1753809
> Free blocks count: 973875200
> Commit interval: 60
> # of blks to create seg: 0
> CRC seed: 0x3adfb6c3
> CRC check sum: 0x8468fbbf
> CRC check data size: 0x00000118
>
>
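The superblock values quoted above allow a rough plausibility check of the reported node. The sketch below is only arithmetic on those numbers. It assumes that each child of a btree node needs an 8-byte key plus an 8-byte pointer, and that segment n starts at block n * (blocks per segment); both are my assumptions about the on-disk layout and are not stated anywhere in this thread.

```python
# Values copied from the "nilfs-tune -l" output quoted above.
BLOCK_SIZE = 4096              # "Block size"
BLOCKS_PER_SEGMENT = 2048      # "# of blocks per segment"
NUM_SEGMENTS = 1246464         # "Number of segments"
DEVICE_SIZE = 10456104173568   # "Device size" in bytes

# Assumption: one child entry = 8-byte key + 8-byte pointer.
ENTRY_SIZE = 16


def max_children_upper_bound(block_size=BLOCK_SIZE, entry_size=ENTRY_SIZE):
    """Upper bound on children per node, ignoring any node header."""
    return block_size // entry_size


def segment_of(blocknr, blocks_per_segment=BLOCKS_PER_SEGMENT):
    """Return (segment number, block offset inside that segment).

    Assumes segment n begins at block n * blocks_per_segment.
    """
    return divmod(blocknr, blocks_per_segment)


if __name__ == "__main__":
    total_blocks = NUM_SEGMENTS * BLOCKS_PER_SEGMENT
    print("blocks covered by segments:", total_blocks)
    print("bytes covered by segments: ", total_blocks * BLOCK_SIZE)
    print("device size in bytes:      ", DEVICE_SIZE)
    print("segment, offset of bad blk:", segment_of(710153406))
    print("children upper bound:      ", max_children_upper_bound())
    print("nchildren=25088 plausible? ", 25088 <= max_children_upper_bound())
```

Under these assumptions a 4096-byte node cannot hold more than 256 children, so nchildren = 25088 cannot describe a valid node of this block size. The segment number computed for the bad block could then be looked up in the lssu output.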
> I found this in /var/log/messages; perhaps it is related to the bad btree node:
>
> Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
> Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
> Dec 18 15:55:02 localhost kernel: Call Trace:
> Dec 18 15:55:02 localhost kernel: <IRQ> [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
> Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
> Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
> Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
> Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
> Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
> Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
> Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
> Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
> Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
> Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
> Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
> Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
> Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
> Dec 18 15:55:02 localhost kernel: <EOI> [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
> Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
> Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
> Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
> Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
> Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
> Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
> Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
> Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
> Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
> Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
> Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
>
>
>
> On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>
>> On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
>>> Hi,
>>>
>>> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>>>
>>> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
>>> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>> Dec 20 16:03:55 localhost kernel:
>>> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>> Dec 20 16:03:55 localhost kernel:
>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>>>
>>> I remounted the filesystem again and tried to delete the bad files, but the delete failed.
>>>
>>> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
>>> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
>>> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
>>> Dec 20 16:12:08 localhost kernel:
>>> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
>>> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>>>
>>> I tried a third remount, but it failed. The server went down and was restarted.
>>>
>>> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
>>> Dec 20 16:12:42 localhost kernel:
>>>
>> Yes, that is bad. In the earlier reports a remount resolved the trouble.
>>
>> So, is your NILFS2 volume currently mounted read-only?
>>
>> Could you share more details about your environment? I need them to
>> understand the situation and to try to reproduce it. I need to know:
>> 1. Linux kernel version.
>> 2. nilfs-utils version.
>> 3. "mount" output.
>> 4. "df -h" output.
>> 5. "lscp" output.
>> 6. "lssu" output.
>> 7. "nilfs-tune -l" output (superblock content)
>>
>>> I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>>>
>> The latest version of nilfs-utils is 2.1.4. Currently, fsck.nilfs2 is at an
>> early stage of development. The "v4" is the version of the fsck.nilfs2
>> patchset. You can try fsck.nilfs2 after applying this patchset to the
>> source code of nilfs-utils 2.1.4. But fsck.nilfs2 can check only superblocks
>> and segment summary headers and can't do a complete recovery. So, I think
>> that it will be useless for you.
>>
>> With the best regards,
>> Vyacheslav Dubeyko.
>>
>>> On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
>>>>> Hello.
>>>>> My nilfs suddenly became read-only. I saw these logs in /var/log/messages:
>>>>>
>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>> Dec 19 11:20:05 localhost kernel:
>>>>> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>> Dec 19 11:20:05 localhost kernel:
>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>> Dec 19 11:20:05 localhost kernel:
>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>> Dec 19 11:20:05 localhost kernel:
>>>>> ……………………………………………………
>>>>>
>>>>> How can I fix this? There is 6 TiB of data on my disk, and I don't want to format the disk.
>>>>> I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>>>>>
>>>> Yes, this issue was reported earlier. As I understand it, you can simply
>>>> remount your filesystem in read-write mode and continue using your
>>>> NILFS2 filesystem.
>>>>
>>>> If you encounter any trouble with remounting, please report it.
>>>>
>>>> With the best regards,
>>>> Vyacheslav Dubeyko.
>>>>
>>>>
>>>>> Elmer Zhang
--
Seiji Kihara
* Re: NILFS: bad btree node
[not found] ` <50D5BFD6.1080502-sG5X7nlA6pw@public.gmane.org>
@ 2012-12-24 3:04 ` 张 磊
0 siblings, 0 replies; 17+ messages in thread
From: 张 磊 @ 2012-12-24 3:04 UTC (permalink / raw)
To: Seiji Kihara; +Cc: Vyacheslav Dubeyko, linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi, I use kmod-nilfs2-0.4.3-1.el6.x86_64
On 2012-12-22, at 22:12, Seiji Kihara <kihara-sG5X7nlA6pw@public.gmane.org> wrote:
> Hello,
>
> (2012/12/20 19:16), 张 磊 wrote:
>> 1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64
>
> If you use nilfs2 kernel module for RHEL 6 clones,
> 'rpm -q kmod-nilfs2' will help.
>
> http://www.nilfs.org/en/pkg_centos.html
> https://github.com/nilfs-dev/nilfs2-kmod-centos6
>
> Regards,
>
> Seiji
>
>> 2. nilfs-utils version: nilfs-utils-2.1.4
>> 3. "mount" output:
>> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
>>
>> 4. "df -h" output:
>> /dev/sdb2 9.6T 5.9T 3.2T 66% /data0
>>
>> 5. "lscp" output:
>> CNO DATE TIME MODE FLG NBLKINC ICNT
>> 2 2012-12-03 14:03:01 ss - 14 3
>> 580481 2012-12-20 16:11:25 cp - 293 697667
>> 580482 2012-12-20 16:11:25 cp - 130 697666
>> 580483 2012-12-20 16:11:25 cp - 225 697664
>> 580484 2012-12-20 16:11:25 cp - 143 697663
>> 580485 2012-12-20 16:11:26 cp - 311 697659
>> 580486 2012-12-20 16:11:27 cp - 328 697657
>> 580487 2012-12-20 16:11:27 cp - 263 697655
>> 580488 2012-12-20 16:11:27 cp - 118 697653
>> 580489 2012-12-20 16:11:28 cp - 230 697651
>> 580490 2012-12-20 16:11:28 cp - 272 697649
>> 580491 2012-12-20 16:11:28 cp - 148 697648
>> 580492 2012-12-20 16:11:29 cp - 139 697647
>> 580493 2012-12-20 16:11:29 cp - 273 697645
>> 580494 2012-12-20 16:11:29 cp - 147 697644
>> 580495 2012-12-20 16:11:30 cp - 271 697641
>> 580496 2012-12-20 16:11:31 cp - 526 697636
>> 580497 2012-12-20 16:11:34 cp - 1684 697625
>> 580498 2012-12-20 16:11:37 cp - 983 697609
>> 580499 2012-12-20 16:11:38 cp - 421 697605
>> 580500 2012-12-20 16:11:40 cp - 1019 697594
>> 580501 2012-12-20 16:11:40 cp - 143 697593
>> 580502 2012-12-20 16:11:41 cp - 1536 697592
>> 580503 2012-12-20 16:11:41 cp - 373 697590
>> 580504 2012-12-20 16:11:42 cp - 312 697587
>> 580505 2012-12-20 16:11:42 cp - 102 697586
>> 580506 2012-12-20 16:11:43 cp - 274 697584
>> 580507 2012-12-20 16:11:43 cp - 270 697582
>> 580508 2012-12-20 16:11:43 cp - 118 697581
>> 580509 2012-12-20 16:11:43 cp - 133 697580
>> 580510 2012-12-20 16:11:44 cp - 321 697578
>> 580511 2012-12-20 16:11:44 cp - 245 697576
>> 580512 2012-12-20 16:11:45 cp - 394 697573
>> 580513 2012-12-20 16:11:45 cp - 121 697572
>> 580514 2012-12-20 16:11:45 cp - 245 697569
>> 580515 2012-12-20 16:11:52 cp - 2705 697543
>> 580516 2012-12-20 16:11:55 cp - 2590 697504
>> 580517 2012-12-20 16:11:59 cp - 2418 697453
>> 580518 2012-12-20 16:12:00 cp - 866 697436
>> 580519 2012-12-20 16:12:01 cp - 864 697420
>> 580520 2012-12-20 16:12:05 cp - 1765 697357
>> 580521 2012-12-20 16:12:05 cp - 120 697356
>> 580522 2012-12-20 16:12:06 cp - 820 697332
>> 580523 2012-12-20 16:12:09 cp - 1642 697174
>> 580524 2012-12-20 16:12:09 cp - 89 697173
>> 580525 2012-12-20 16:12:10 cp - 56 697173
>> 580526 2012-12-20 16:12:42 cp - 763 697173
>>
>> 6. "lssu" output:
>> it's too large, please download it: http://d.pr/f/vnoR
>>
>> 7. "nilfs-tune -l" output (superblock content):
>>
>> nilfs-tune 2.1.4
>> Filesystem volume name: (none)
>> Filesystem UUID: dcfb7152-a342-48d0-a712-212a3062395e
>> Filesystem magic number: 0x3434
>> Filesystem revision #: 2.0
>> Filesystem features: (none)
>> Filesystem state: invalid or mounted,error
>> Filesystem OS type: Linux
>> Block size: 4096
>> Filesystem created: Mon Dec 3 13:56:51 2012
>> Last mount time: Thu Dec 20 17:44:03 2012
>> Last write time: Thu Dec 20 17:44:03 2012
>> Mount count: 13
>> Maximum mount count: 50
>> Reserve blocks uid: 0 (user root)
>> Reserve blocks gid: 0 (group root)
>> First inode: 11
>> Inode size: 128
>> DAT entry size: 32
>> Checkpoint size: 192
>> Segment usage size: 16
>> Number of segments: 1246464
>> Device size: 10456104173568
>> First data block: 1
>> # of blocks per segment: 2048
>> Reserved segments %: 5
>> Last checkpoint #: 580526
>> Last block address: 1040286376
>> Last sequence #: 1753809
>> Free blocks count: 973875200
>> Commit interval: 60
>> # of blks to create seg: 0
>> CRC seed: 0x3adfb6c3
>> CRC check sum: 0x8468fbbf
>> CRC check data size: 0x00000118
>>
>>
>> I found this in /var/log/messages; perhaps it is related to the bad btree node:
>>
>> Dec 18 15:55:02 localhost kernel: rsync: page allocation failure. order:1, mode:0x20
>> Dec 18 15:55:02 localhost kernel: Pid: 13678, comm: rsync Not tainted 2.6.32-220.13.1.el6.x86_64 #1
>> Dec 18 15:55:02 localhost kernel: Call Trace:
>> Dec 18 15:55:02 localhost kernel: <IRQ> [<ffffffff8112405f>] ? __alloc_pages_nodemask+0x77f/0x940
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e002>] ? kmem_getpages+0x62/0x170
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115ec1a>] ? fallback_alloc+0x1ba/0x270
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e66f>] ? cache_grow+0x2cf/0x320
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115e999>] ? ____cache_alloc_node+0x99/0x160
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8115f77b>] ? kmem_cache_alloc+0x11b/0x190
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8141f998>] ? sk_prot_alloc+0x48/0x1c0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8141fc22>] ? sk_clone+0x22/0x2e0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8146cee6>] ? inet_csk_clone+0x16/0xd0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81485dd3>] ? tcp_create_openreq_child+0x23/0x450
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814837bd>] ? tcp_v4_syn_recv_sock+0x4d/0x2a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81485b91>] ? tcp_check_req+0x201/0x420
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8147b646>] ? tcp_rcv_state_process+0x116/0xa30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8126a859>] ? cpumask_next_and+0x29/0x50
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814831db>] ? tcp_v4_do_rcv+0x35b/0x430
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dea69>] ? bnx2_start_xmit+0x239/0x7d0 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81484951>] ? tcp_v4_rcv+0x4e1/0x860
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814626bd>] ? ip_local_deliver_finish+0xdd/0x2d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81462948>] ? ip_local_deliver+0x98/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81461e0d>] ? ip_rcv_finish+0x12d/0x440
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81462395>] ? ip_rcv+0x275/0x350
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8104d74e>] ? update_group_power+0xae/0x110
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142c34b>] ? __netif_receive_skb+0x49b/0x6f0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e408>] ? netif_receive_skb+0x58/0x60
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8142e510>] ? napi_skb_finish+0x50/0x70
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81430b99>] ? napi_gro_receive+0x39/0x50
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00dfd4f>] ? bnx2_poll_work+0xd4f/0x1270 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8105ea43>] ? rebalance_domains+0xa3/0x5b0
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa00e02ad>] ? bnx2_poll_msix+0x3d/0xc0 [bnx2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81430cb3>] ? net_rx_action+0x103/0x2f0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81072191>] ? __do_softirq+0xc1/0x1d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810d9640>] ? handle_IRQ_event+0x60/0x170
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810721ea>] ? __do_softirq+0x11a/0x1d0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81071f75>] ? irq_exit+0x85/0x90
>> Dec 18 15:55:02 localhost kernel: [<ffffffff814f5215>] ? do_IRQ+0x75/0xf0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
>> Dec 18 15:55:02 localhost kernel: <EOI> [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa02105d7>] ? nilfs_mark_inode_dirty+0x37/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa02106aa>] ? nilfs_dirty_inode+0x6a/0xa0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811a00bb>] ? __mark_inode_dirty+0x3b/0x160
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811ab185>] ? generic_write_end+0x65/0xa0
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa0210940>] ? nilfs_get_block+0x0/0x1d0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f860>] ? nilfs_write_end+0x70/0xb0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffffa020f230>] ? nilfs_write_begin+0x80/0xb0 [nilfs2]
>> Dec 18 15:55:02 localhost kernel: [<ffffffff811115c4>] ? generic_file_buffered_write+0x174/0x2a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff810707c7>] ? current_fs_time+0x27/0x30
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81112eb0>] ? __generic_file_aio_write+0x250/0x480
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8111314f>] ? generic_file_aio_write+0x6f/0xe0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8117651a>] ? do_sync_write+0xfa/0x140
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81090c30>] ? autoremove_wake_function+0x0/0x40
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8109b849>] ? ktime_get_ts+0xa9/0xe0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8120c546>] ? security_file_permission+0x16/0x20
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81176818>] ? vfs_write+0xb8/0x1a0
>> Dec 18 15:55:02 localhost kernel: [<ffffffff81177221>] ? sys_write+0x51/0x90
>> Dec 18 15:55:02 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
>>
>>
>>
>> On 2012-12-20, at 17:38, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>>
>>> On Thu, 2012-12-20 at 17:08 +0800, 张 磊 wrote:
>>>> Hi,
>>>>
>>>> I remounted the filesystem and started the MySQL instances. The filesystem became read-only again.
>>>>
>>>> Dec 20 16:03:31 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>>> Dec 20 16:03:31 localhost kernel: NILFS warning: mounting fs with errors
>>>> Dec 20 16:03:31 localhost nilfs_cleanerd[29120]: start
>>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 20 16:03:55 localhost kernel:
>>>> Dec 20 16:03:55 localhost kernel: Remounting filesystem read-only
>>>> Dec 20 16:03:55 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:03:55 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>> Dec 20 16:03:55 localhost kernel:
>>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: cannot clean segments: Read-only file system
>>>> Dec 20 16:03:57 localhost nilfs_cleanerd[29120]: shutdown
>>>>
>>>> I remounted the filesystem again and tried to delete the bad files, but the delete failed.
>>>>
>>>> Dec 20 16:04:02 localhost kernel: segctord starting. Construction interval = 60 seconds, CP frequency < 30 seconds
>>>> Dec 20 16:04:02 localhost kernel: NILFS warning: mounting fs with errors
>>>> Dec 20 16:04:02 localhost nilfs_cleanerd[30054]: start
>>>> Dec 20 16:12:08 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>> Dec 20 16:12:08 localhost kernel: NILFS error (device sdb2): nilfs_bmap_last_key: broken bmap (inode number=321775)
>>>> Dec 20 16:12:08 localhost kernel:
>>>> Dec 20 16:12:08 localhost kernel: Remounting filesystem read-only
>>>> Dec 20 16:12:08 localhost kernel: NILFS warning (device sdb2): nilfs_truncate_bmap: failed to truncate bmap (ino=321775, err=-5)
>>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: cannot clean segments: Read-only file system
>>>> Dec 20 16:12:08 localhost nilfs_cleanerd[30054]: shutdown
>>>>
>>>> I tried a third remount, but it failed. The server went down and was restarted.
>>>>
>>>> Dec 20 16:12:42 localhost kernel: NILFS warning (device sdb2): nilfs_detach_log_writer: Hit dirty file after stopped log writer
>>>> Dec 20 16:12:42 localhost kernel:
>>>>
>>> Yes, that is bad. In the earlier reports a remount resolved the trouble.
>>>
>>> So, is your NILFS2 volume currently mounted read-only?
>>>
>>> Could you share more details about your environment? I need them to
>>> understand the situation and to try to reproduce it. I need to know:
>>> 1. Linux kernel version.
>>> 2. nilfs-utils version.
>>> 3. "mount" output.
>>> 4. "df -h" output.
>>> 5. "lscp" output.
>>> 6. "lssu" output.
>>> 7. "nilfs-tune -l" output (superblock content)
>>>
>>>> I found that fsck.nilfs2 was added into nilfs-utils v4. Can I try it? Where can I download nilfs-utils v4?
>>>>
>>> The latest version of nilfs-utils is 2.1.4. Currently, fsck.nilfs2 is at an
>>> early stage of development. The "v4" is the version of the fsck.nilfs2
>>> patchset. You can try fsck.nilfs2 after applying this patchset to the
>>> source code of nilfs-utils 2.1.4. But fsck.nilfs2 can check only superblocks
>>> and segment summary headers and can't do a complete recovery. So, I think
>>> that it will be useless for you.
>>>
>>> With the best regards,
>>> Vyacheslav Dubeyko.
>>>
>>>> On 2012-12-20, at 14:08, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Thu, 2012-12-20 at 10:46 +0800, 张 磊 wrote:
>>>>>> Hello.
>>>>>> My nilfs suddenly became read-only. I saw these logs in /var/log/messages:
>>>>>>
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: Remounting filesystem read-only
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
>>>>>> Dec 19 11:20:05 localhost kernel: NILFS error (device sdb2): nilfs_bmap_lookup_contig: broken bmap (inode number=321775)
>>>>>> Dec 19 11:20:05 localhost kernel:
>>>>>> ……………………………………………………
>>>>>>
>>>>>> How can I fix this? There is 6 TiB of data on my disk, and I don't want to format the disk.
>>>>>> I found that a lot of people have encountered the same problem. Is this a bug in nilfs? How can I avoid this problem? When it happened, I was running multiple MySQL instances and rsync, and nilfs_cleanerd was cleaning segments.
>>>>>>
>>>>> Yes, this issue was reported earlier. As I understand it, you can simply
>>>>> remount your filesystem in read-write mode and continue using your
>>>>> NILFS2 filesystem.
>>>>>
>>>>> If you encounter any trouble with remounting, please report it.
>>>>>
>>>>> With the best regards,
>>>>> Vyacheslav Dubeyko.
>>>>>
>>>>>
>>>>>> Elmer Zhang
>
> --
> Seiji Kihara
>
* Re: NILFS: bad btree node
[not found] ` <44056E9A-3487-4E8A-A56A-5B9228FC7895-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2012-12-25 6:02 ` Vyacheslav Dubeyko
2012-12-25 7:10 ` Elmer Zhang
0 siblings, 1 reply; 17+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-25 6:02 UTC (permalink / raw)
To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
On Thu, 2012-12-20 at 19:02 +0800, 张 磊 wrote:
> Yes, I mounted NILFS2 read-write. The kernel remounted it read-only when the filesystem found the bad btree node.
>
> That's the full backtrace. I will keep on testing and report more information once I find any.
>
I am trying to reproduce the issue, but so far without any success. My
guess is that it is a synchronization issue between the GC and the main
driver logic, but I don't have any evidence for it yet. Probably I am
failing to reproduce some peculiarity of your environment.
So, I think I need to understand more deeply the workload under which the
issue occurred. As I remember, you mentioned several MySQL databases and
so on. Could you describe in more detail which applications were running
and what they were doing before the issue occurred?
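When collecting this information, it may help to pull the distinct "bad btree node" reports out of /var/log/messages and check how plausible the reported values are. The following Python sketch is only an illustration and is not part of NILFS2 or nilfs-utils; the names BAD_NODE_RE, find_bad_nodes and max_children_upper_bound are hypothetical. The bound assumes that every child entry costs one 8-byte key plus one 8-byte pointer and ignores the node header, so it is a loose upper limit, not the exact on-disk limit.

```python
import re

# Matches kernel lines such as:
#   NILFS: bad btree node (blocknr=710153406): level = 0, flags = 0x2, nchildren = 25088
BAD_NODE_RE = re.compile(
    r"NILFS: bad btree node \(blocknr=(\d+)\): "
    r"level = (\d+), flags = (0x[0-9a-fA-F]+), nchildren = (\d+)"
)


def find_bad_nodes(lines):
    """Return the distinct (blocknr, level, flags, nchildren) reports, in order."""
    seen = []
    for line in lines:
        m = BAD_NODE_RE.search(line)
        if m is None:
            continue
        record = (int(m.group(1)), int(m.group(2)),
                  int(m.group(3), 16), int(m.group(4)))
        if record not in seen:
            seen.append(record)
    return seen


def max_children_upper_bound(block_size, entry_size=16):
    """Loose upper bound on children per node.

    Assumption: one 8-byte key plus one 8-byte pointer per child,
    node header ignored.
    """
    return block_size // entry_size


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        for blocknr, level, flags, nchildren in find_bad_nodes(log):
            limit = max_children_upper_bound(4096)  # block size from nilfs-tune -l
            verdict = "implausible" if nchildren > limit else "plausible"
            print(blocknr, level, hex(flags), nchildren, verdict)
```

With the 4096-byte block size reported by nilfs-tune, this bound is 256 children, so the reported nchildren = 25088 cannot describe a valid node under that assumption; the block most likely holds something other than a btree node.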
Thanks,
Vyacheslav Dubeyko.
* Re: NILFS: bad btree node
2012-12-25 6:02 ` Vyacheslav Dubeyko
@ 2012-12-25 7:10 ` Elmer Zhang
0 siblings, 0 replies; 17+ messages in thread
From: Elmer Zhang @ 2012-12-25 7:10 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
I am trying to use NILFS2 to make cold backups of MySQL. I run many MySQL slave servers on the machine and store their data on a NILFS2 filesystem. Most of the tables use the MyISAM engine; a few use InnoDB.
When the issue occurred, some MySQL instances were running, and I was copying data into a MySQL data directory with rsync. The utilization (%util) of the NILFS2 partition was almost 100%.
In addition to this problem, I have also encountered some others. Some MyISAM tables are suddenly marked as crashed, and then the slave's sql_thread stops. But I do not need to repair the table: I just wait a bit and restart the sql_thread, and replication continues. So I guess that it may be a problem with the file system. Below is the MySQL error log for this:
121225 14:38:03 [ERROR] Slave SQL: Error 'Table 'consume_log_2a' is marked as crashed and should be repaired' on query. Default database: 'app_wsgrr'. Query: 'DELETE FROM consume_log_2a WHERE log_time<=1353750526 AND coin_type !=2 AND coin_type!=12 AND coin_type !=13', Error_code: 1194 # table crashed
121225 14:38:03 [Warning] Slave: Table 'consume_log_2a' is marked as crashed and should be repaired Error_code: 1194
121225 14:38:03 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'yf75-bin.000296' position 680291890
121225 14:58:54 [Note] Slave I/O thread exiting, read up to log 'yf75-bin.000298', position 802588261 # restart the sql_thread without repairing table
121225 14:58:55 [Note] Slave I/O thread: connected to master 'replica@10.75.7.75:6011',replication started in log 'yf75-bin.000298' at position 802588261
Version of MySQL Server: Percona Server 5.5.23
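To see how often this happens and which tables are affected, the error log can be scanned for the "marked as crashed" message. The Python sketch below is only an illustration based on the log lines above; CRASHED_RE and crashed_tables are hypothetical names, and the log path in the usage part is a placeholder, not a path from my setup.

```python
import re
from collections import Counter

# Matches both the [ERROR] and the [Warning] form of the message, e.g.
#   ... Table 'consume_log_2a' is marked as crashed and should be repaired ... Error_code: 1194
CRASHED_RE = re.compile(
    r"Table '([^']+)' is marked as crashed.*Error_code: (\d+)"
)


def crashed_tables(lines):
    """Count 'marked as crashed' reports (Error_code 1194) per table name."""
    counts = Counter()
    for line in lines:
        m = CRASHED_RE.search(line)
        if m is not None and m.group(2) == "1194":
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Placeholder path: replace with the error log of one MySQL instance.
    with open("mysql-error.log", errors="replace") as log:
        for table, hits in crashed_tables(log).most_common():
            print(table, hits)
```

Comparing the timestamps of these reports with the NILFS messages in /var/log/messages would show whether the two problems occur at the same time.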
sdb2 is the NILFS2 partition. Below is the output of "iostat -xm -p sdb 1" over the last few seconds.
Linux 2.6.32-220.13.1.el6.x86_64 (yf237) 12/25/2012 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
1.24 0.00 1.91 4.02 0.00 92.83
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 3285.75 42.27 259.02 151.13 15.14 18.46 167.73 5.54 13.52 0.81 33.26
sdb1 0.01 42.26 0.37 1.98 0.04 0.17 183.98 0.06 26.61 2.38 0.56
sdb2 3285.74 0.00 258.65 148.65 15.10 18.28 167.85 5.48 13.46 0.82 33.25
avg-cpu: %user %nice %system %iowait %steal %idle
1.23 0.00 2.22 16.54 0.00 80.00
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 251.00 0.00 736.00 81.00 4.62 8.87 33.81 19.26 23.74 1.22 99.90
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb2 251.00 0.00 736.00 79.00 4.62 8.87 33.89 19.27 23.79 1.23 99.90
avg-cpu: %user %nice %system %iowait %steal %idle
4.80 0.00 1.90 20.99 0.00 72.31
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 300.00 450.00 900.00 87.00 10.64 9.50 41.80 19.50 19.55 1.01 99.90
sdb1 0.00 450.00 1.00 16.00 0.12 1.82 234.35 0.32 18.88 6.76 11.50
sdb2 300.00 0.00 899.00 69.00 10.52 7.68 38.50 19.18 19.60 1.03 99.90
avg-cpu: %user %nice %system %iowait %steal %idle
2.12 0.00 2.25 21.60 0.00 74.03
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 449.00 0.00 625.00 81.00 5.00 9.18 41.12 15.32 21.36 1.41 99.40
sdb1 0.00 0.00 1.00 0.00 0.12 0.00 256.00 0.02 19.00 19.00 1.90
sdb2 449.00 0.00 624.00 79.00 4.88 9.18 40.93 15.30 21.42 1.41 99.40
avg-cpu: %user %nice %system %iowait %steal %idle
0.51 0.00 1.52 15.08 0.00 82.89
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 9243.00 0.00 720.00 83.00 23.93 8.20 81.94 19.40 21.53 1.25 100.00
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb2 9341.00 0.00 722.00 79.00 24.18 8.20 82.79 19.39 21.65 1.25 99.90
avg-cpu: %user %nice %system %iowait %steal %idle
1.23 0.00 6.67 11.36 0.00 80.74
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 10607.00 25.00 687.00 718.00 59.84 86.66 213.54 26.66 20.83 0.66 92.20
sdb1 0.00 24.00 0.00 9.00 0.00 0.13 29.33 0.00 0.11 0.11 0.10
sdb2 10509.00 1.00 685.00 705.00 59.59 86.53 215.29 26.66 21.02 0.66 92.30
avg-cpu: %user %nice %system %iowait %steal %idle
0.86 0.00 1.84 16.67 0.00 80.64
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 127.00 0.00 458.00 139.00 3.05 13.29 56.07 9.15 15.29 1.66 99.30
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb2 127.00 0.00 458.00 137.00 3.05 13.29 56.26 9.15 15.35 1.67 99.20
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.38 13.05 0.00 85.07
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 111.00 229.00 523.00 151.00 3.28 10.09 40.61 12.50 18.36 1.47 99.00
sdb1 0.00 229.00 2.00 54.00 0.25 1.11 49.57 0.33 5.82 1.14 6.40
sdb2 111.00 0.00 521.00 89.00 3.03 8.98 40.31 12.17 19.75 1.62 99.00
avg-cpu: %user %nice %system %iowait %steal %idle
0.88 0.00 1.26 14.99 0.00 82.87
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 171.00 0.00 539.00 86.00 3.29 7.48 35.28 10.23 16.33 1.58 98.80
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb2 171.00 0.00 540.00 78.00 3.29 7.48 35.69 10.23 16.55 1.60 98.80
avg-cpu: %user %nice %system %iowait %steal %idle
0.76 0.00 1.90 17.62 0.00 79.72
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 156.00 0.00 536.00 119.00 3.31 11.17 45.26 12.63 18.86 1.51 99.20
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb2 156.00 0.00 535.00 114.00 3.30 11.17 45.67 12.63 19.00 1.53 99.20
avg-cpu: %user %nice %system %iowait %steal %idle
0.49 0.00 1.10 14.81 0.00 83.60
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 169.00 23.00 552.00 94.00 3.36 9.02 39.24 15.50 24.51 1.54 99.80
sdb1 0.00 23.00 0.00 6.00 0.00 0.11 38.67 0.00 0.00 0.00 0.00
sdb2 169.00 0.00 552.00 82.00 3.36 8.90 39.62 15.50 24.97 1.57 99.80
avg-cpu: %user %nice %system %iowait %steal %idle
0.75 0.00 1.25 14.91 0.00 83.08
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 140.00 0.00 589.00 104.00 3.61 9.41 38.50 12.04 17.35 1.43 99.20
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb2 140.00 0.00 589.00 103.00 3.61 9.41 38.55 12.04 17.38 1.43 99.20
avg-cpu: %user %nice %system %iowait %steal %idle
0.63 0.00 1.14 14.68 0.00 83.54
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 174.00 323.00 599.00 112.00 3.97 10.52 41.73 12.44 17.52 1.39 98.60
sdb1 0.00 323.00 0.00 11.00 0.00 1.30 242.91 0.11 9.82 1.00 1.10
sdb2 174.00 0.00 601.00 93.00 3.98 9.21 38.93 12.33 17.84 1.42 98.60
avg-cpu: %user %nice %system %iowait %steal %idle
0.49 0.00 1.11 15.23 0.00 83.17
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 95.00 0.00 750.00 65.00 3.91 6.57 26.34 20.28 24.46 1.22 99.50
sdb1 0.00 0.00 1.00 0.00 0.12 0.00 256.00 0.07 71.00 71.00 7.10
sdb2 95.00 0.00 747.00 60.00 3.78 6.57 26.26 20.20 24.56 1.23 99.50
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.37 13.04 0.00 85.09
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 159.00 0.00 557.00 96.00 3.23 8.82 37.81 16.73 26.08 1.52 99.40
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb2 159.00 0.00 557.00 92.00 3.23 8.82 38.04 16.73 26.24 1.53 99.40
avg-cpu: %user %nice %system %iowait %steal %idle
0.51 0.00 1.16 14.51 0.00 83.83
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 139.00 0.00 637.00 79.00 3.73 7.44 31.96 14.28 20.11 1.39 99.80
sdb1 0.00 0.00 2.00 0.00 0.25 0.00 256.00 0.04 22.00 22.00 4.40
sdb2 139.00 0.00 635.00 67.00 3.48 7.44 31.86 14.24 20.45 1.42 99.80
avg-cpu: %user %nice %system %iowait %steal %idle
0.98 0.00 1.60 15.83 0.00 81.60
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 198.00 15.00 566.00 109.00 3.79 10.37 42.96 14.10 20.60 1.48 99.60
sdb1 0.00 15.00 1.00 6.00 0.12 0.08 60.57 0.02 2.71 2.71 1.90
sdb2 198.00 0.00 566.00 97.00 3.67 10.29 43.11 14.08 20.97 1.50 99.60
avg-cpu: %user %nice %system %iowait %steal %idle
1.11 0.00 1.35 17.32 0.00 80.22
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdb 168.00 257.00 613.00 99.00 3.60 9.41 37.43 15.08 21.11 1.39 99.10
sdb1 0.00 257.00 0.00 10.00 0.00 1.04 213.60 0.26 26.00 2.60 2.60
sdb2 168.00 0.00 612.00 86.00 3.59 8.37 35.11 14.82 21.13 1.42 99.10
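The samples above can be summarized per device with a small script. This Python sketch is only an illustration; device_stats is a hypothetical name, and the column names are taken from the "Device:" header line printed by this iostat version. Note that the first iostat report shows averages since boot, so it is skipped in the usage part.

```python
import sys


def device_stats(lines, device):
    """Return one (await_ms, util_percent) tuple per iostat report for device."""
    columns = None
    samples = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            # Header row: remember the column names that follow "Device:".
            columns = fields[1:]
        elif fields[0] == device and columns and len(fields) == len(columns) + 1:
            row = dict(zip(columns, (float(v) for v in fields[1:])))
            samples.append((row["await"], row["%util"]))
    return samples


if __name__ == "__main__":
    # Usage: iostat -xm -p sdb 1 | python iostat_summary.py
    # (the script file name is arbitrary)
    samples = device_stats(sys.stdin, "sdb2")[1:]  # drop the since-boot report
    if samples:
        mean_await = sum(s[0] for s in samples) / len(samples)
        mean_util = sum(s[1] for s in samples) / len(samples)
        print("reports:", len(samples))
        print("mean await (ms): %.2f" % mean_await)
        print("mean %%util: %.2f" % mean_util)
```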
Below is a snapshot of iotop:
Total DISK READ: 3.26 M/s | Total DISK WRITE: 66.36 M/s
  TID PRIO USER       DISK READ    DISK WRITE  SWAPIN     IO> COMMAND
32291 be/4 my6013   1854.71 K/s     77.93 K/s  0.00 % 84.47 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
32038 be/4 my6005    472.77 K/s    233.79 K/s  0.00 % 38.31 % mysqld --defaults-file=/data0/mysql6005/my6005.cnf
27470 be/4 my6015    316.91 K/s      0.00 B/s  0.00 % 26.11 % mysqld --defaults-file=/data0/mysql6015/my6015.cnf
14478 be/4 my6010    124.69 K/s    223.40 K/s  0.00 % 19.95 % mysqld --defaults-file=/data0/mysql6010/my6010.cnf
32131 be/4 my6007    363.67 K/s    264.96 K/s  0.00 % 16.28 % mysqld --defaults-file=/data0/mysql6007/my6007.cnf
11578 be/4 my6018     31.17 K/s    353.28 K/s  0.00 % 14.17 % mysqld --defaults-file=/data0/mysql6018/my6018.cnf
27469 be/4 my6015      5.20 K/s     15.59 K/s  0.00 % 12.47 % mysqld --defaults-file=/data0/mysql6015/my6015.cnf
25104 be/4 my6009     15.59 K/s    161.05 K/s  0.00 %  9.47 % mysqld --defaults-file=/data0/mysql6009/my6009.cnf
 7144 be/4 root       41.56 K/s      5.82 M/s  0.00 %  8.41 % [segctord]
11498 be/4 my6018     46.76 K/s     67.54 K/s  0.00 %  7.50 % mysqld --defaults-file=/data0/mysql6018/my6018.cnf
 1307 be/4 my6016      5.20 K/s    140.27 K/s  0.00 %  4.83 % mysqld --defaults-file=/data0/mysql6016/my6016.cnf
11481 be/4 my6018     20.78 K/s      0.00 B/s  0.00 %  3.89 % mysqld --defaults-file=/data0/mysql6018/my6018.cnf
13831 be/4 my6003      5.20 K/s    181.83 K/s  0.00 %  0.77 % mysqld --defaults-file=/data0/mysql6003/my6003.cnf
  973 be/4 root        0.00 B/s     46.76 K/s  0.00 %  0.03 % [kjournald]
  972 be/4 root        0.00 B/s      0.00 B/s  0.00 %  0.02 % [kjournald]
18568 be/4 my6016      0.00 B/s     93.51 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6016/my6016.cnf
18569 be/4 my6016      0.00 B/s    207.81 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6016/my6016.cnf
14477 be/4 my6010      0.00 B/s    109.10 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6010/my6010.cnf
32130 be/4 my6007      0.00 B/s     51.95 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6007/my6007.cnf
12656 be/4 www         0.00 B/s   1449.48 K/s  0.00 %  0.00 % rsync --daemon
25103 be/4 my6009      0.00 B/s     31.17 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6009/my6009.cnf
  962 be/4 my6013      0.00 B/s    353.28 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
  963 be/4 my6013      0.00 B/s    327.30 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
  964 be/4 my6013      0.00 B/s    135.08 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
  965 be/4 my6013      0.00 B/s    290.94 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
 7145 be/4 root        0.00 B/s    446.79 K/s  0.00 %  0.00 % nilfs_cleanerd -n /dev/sdb2 /data0/
13830 be/4 my6003      0.00 B/s     62.34 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6003/my6003.cnf
27723 be/4 my6015      0.00 B/s    244.18 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6015/my6015.cnf
27722 be/4 my6015      0.00 B/s    150.66 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6015/my6015.cnf
11577 be/4 my6018      0.00 B/s    124.69 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6018/my6018.cnf
32193 be/4 my6011      0.00 B/s     98.71 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6011/my6011.cnf
32240 be/4 my6012      0.00 B/s     93.51 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6012/my6012.cnf
11803 be/4 my6002      0.00 B/s     15.59 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6002/my6002.cnf
11804 be/4 my6002      0.00 B/s     25.98 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6002/my6002.cnf
32290 be/4 my6013      0.00 B/s    140.27 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
32352 be/4 my6014      0.00 B/s     25.98 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6014/my6014.cnf
32037 be/4 my6005      0.00 B/s    150.66 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6005/my6005.cnf
  984 be/4 my6013      0.00 B/s   1329.99 K/s  0.00 %  0.00 % mysqld --defaults-file=/data0/mysql6013/my6013.cnf
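Since the question is which workload sits on the disk alongside the GC, it may help to total the iotop write rates per user. Below is a minimal Python sketch, assuming the row layout shown in the snapshot above (TID, PRIO, USER, read value and unit, write value and unit, ...) and assuming iotop's K and M are multiples of 1024 bytes. The function name and the unit table are made up for this example.

```python
from collections import defaultdict

# Assumed conversion of iotop rate units to KiB/s (1 K = 1024 B, 1 M = 1024 K).
UNIT_TO_KIB = {"B/s": 1 / 1024, "K/s": 1.0, "M/s": 1024.0}


def write_kib_by_user(rows):
    """Sum the DISK WRITE column per user, in KiB/s."""
    totals = defaultdict(float)
    for row in rows:
        fields = row.split()
        user = fields[2]
        # fields[3:5] hold the read rate; fields[5:7] hold the write rate.
        write_value, write_unit = float(fields[5]), fields[6]
        totals[user] += write_value * UNIT_TO_KIB[write_unit]
    return dict(totals)


# Three rows copied from the snapshot above.
rows = [
    "7144 be/4 root 41.56 K/s 5.82 M/s 0.00 % 8.41 % [segctord]",
    "7145 be/4 root 0.00 B/s 446.79 K/s 0.00 % 0.00 % nilfs_cleanerd -n /dev/sdb2 /data0/",
    "12656 be/4 www 0.00 B/s 1449.48 K/s 0.00 % 0.00 % rsync --daemon",
]

totals = write_kib_by_user(rows)
for user, kib in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{user}: {kib:.2f} KiB/s written")
```

Grouping by user is just one convenient choice here, because each MySQL instance runs under its own account; grouping by command would separate segctord from nilfs_cleanerd instead.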
On 2012-12-25, at 14:02, Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> wrote:
> Hi,
>
> On Thu, 2012-12-20 at 19:02 +0800, 张 磊 wrote:
>> Yes, I mounted NILFS2 read-write. The kernel remounted it read-only when the filesystem found the bad btree node.
>>
>> That's the full backtrace. I will keep testing and report more information once I find any.
>>
>
> I am trying to reproduce the issue, but so far without success. My
> hypothesis is that it is a synchronization issue between the GC and
> the main driver logic, but I have no evidence for this yet. Probably I
> cannot reproduce some peculiarity of your environment.
>
> So I think I need a deeper understanding of the workload under which
> the issue occurred. As I remember, you mentioned several MySQL
> databases and so on. Could you describe in more detail which
> applications were running, and how they were used, before the issue
> occurred?
>
> Thanks,
> Vyacheslav Dubeyko.
>
>
>
* Re: NILFS: bad btree node
[not found] ` <14BA4286-BF21-4BD3-8E41-2F8F9512D801-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2012-12-20 10:41 ` Vyacheslav Dubeyko
2012-12-22 14:12 ` Seiji Kihara
@ 2012-12-27 10:43 ` Vyacheslav Dubeyko
2 siblings, 0 replies; 17+ messages in thread
From: Vyacheslav Dubeyko @ 2012-12-27 10:43 UTC (permalink / raw)
To: 张 磊; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Thu, 2012-12-20 at 18:16 +0800, 张 磊 wrote:
> 1. Linux kernel version: 2.6.32-220.13.1.el6.x86_64
Why are you using the 2.6.32 Linux kernel? Could you try one of the
latest vanilla kernels (for example, 3.7.1)?
To be honest, I tried to reproduce the issue on version 3.6.0, but
without success. I know that the same issue was also reported on kernel
version 3.6.8, but it cannot be reproduced reliably on that version
either.
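When comparing reports across machines, the upstream base version can be pulled out of a `uname -r` style string and compared numerically. This is a minimal Python sketch; the helper name is made up for illustration, and the caveat matters: a distribution kernel such as the el6 build quoted here carries backported patches, so the base version alone does not say whether a given NILFS2 fix is present.

```python
import re


def kernel_version(release):
    """Extract (major, minor, patch) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


# Version strings quoted in this thread.
reported = kernel_version("2.6.32-220.13.1.el6.x86_64")
suggested = kernel_version("3.7.1")

# Tuples compare element by element, so this is a numeric comparison.
print("older than suggested kernel:", reported < suggested)
```

On a live system the input string could come from `platform.release()` in the Python standard library.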
> 2. nilfs-utils version: nilfs-utils-2.1.4
> 3. "mount" output:
> /dev/sdb2 on /data0 type nilfs2 (rw,noatime,gcpid=22909)
>
> 4. "df -h" output:
> /dev/sdb2 9.6T 5.9T 3.2T 66% /data0
Do you use any RAID technology?
By the way, what HDD hardware do you use, and from which vendor?
With best regards,
Vyacheslav Dubeyko.
Thread overview: 17+ messages
2012-12-20 2:46 NILFS: bad btree node 张 磊
[not found] ` <86B5C141-ACFA-4541-999F-E17E09F22476-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2012-12-20 6:08 ` Vyacheslav Dubeyko
2012-12-20 9:08 ` 张 磊
[not found] ` <3455B0CD-EF89-4227-90E1-FC6B20F5F8EB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2012-12-20 9:38 ` Vyacheslav Dubeyko
2012-12-20 10:16 ` 张 磊
[not found] ` <14BA4286-BF21-4BD3-8E41-2F8F9512D801-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2012-12-20 10:41 ` Vyacheslav Dubeyko
2012-12-20 11:02 ` 张 磊
[not found] ` <44056E9A-3487-4E8A-A56A-5B9228FC7895-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2012-12-25 6:02 ` Vyacheslav Dubeyko
2012-12-25 7:10 ` Elmer Zhang
2012-12-22 14:12 ` Seiji Kihara
[not found] ` <50D5BFD6.1080502-sG5X7nlA6pw@public.gmane.org>
2012-12-24 3:04 ` 张 磊
2012-12-27 10:43 ` Vyacheslav Dubeyko
-- strict thread matches above, loose matches on Subject: below --
2012-05-25 14:30 Kenneth Langga
[not found] ` <CAHmELnWvFNdiePs=mQJ=nqfsxJ_49zxawa9jncE-RJ2-omYHOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-05-25 18:06 ` Reinoud Zandijk
[not found] ` <20120525180649.GA1236-bVHBekiX4bNgoMqBc1r0ESegHCQxtGRMHZ5vskTnxNA@public.gmane.org>
2012-05-25 18:15 ` Kenneth Langga
[not found] ` <CAHmELnVyRNGn1gda0Sw53YCSOAYMm5JUonebi-9NxaFBP7Uidw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-05-26 14:49 ` Christian Smith
[not found] ` <20120526144932.GG18110-Ng8wz+J301SNY5Lh21HnMTHS2PGA244I9dF7HbQ/qKg@public.gmane.org>
2012-05-26 16:43 ` Kenneth Langga