* snapshots contain the same rrd database
@ 2014-02-26 13:32 Piotr Szymaniak
2014-02-26 13:54 ` Vyacheslav Dubeyko
2014-02-27 2:45 ` Ryusuke Konishi
0 siblings, 2 replies; 6+ messages in thread
From: Piotr Szymaniak @ 2014-02-26 13:32 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi,
I got a system crash after some 160+ days uptime. After a hard reboot I
noticed my rrd database looks corrupted.
So I changed some recent checkpoints to snapshots, mounted them and...
all the rrd files are the same!
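For reference, the checkpoint-to-snapshot step described above can be sketched as a dry run. This is only an illustration: the helper name `print_snapshot_mount_cmds` and the `/tmp` base directory are assumptions, and the function prints the `chcp` and `mount` commands instead of running them, since the real ones need root and a NILFS2 device.

```shell
# Print (do not run) the commands that would turn each checkpoint into a
# snapshot and mount it read-only under a base directory.
# print_snapshot_mount_cmds is a hypothetical helper, not part of nilfs-utils.
print_snapshot_mount_cmds() {
    dev="$1"; base="$2"; shift 2
    for cno in "$@"; do
        # chcp ss: change the checkpoint's mode to "snapshot"
        printf 'chcp ss %s %s\n' "$dev" "$cno"
        printf 'mkdir -p %s/%s\n' "$base" "$cno"
        # a snapshot is mounted read-only with the cp=<number> option
        printf 'mount -t nilfs2 -r -o cp=%s %s %s/%s\n' "$cno" "$dev" "$base" "$cno"
    done
}

print_snapshot_mount_cmds /dev/sda3 /tmp 211219 212045
```

Reviewing the printed commands before running them by hand keeps the sketch safe on a machine without the volume attached.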
Here's some info about current state:
wloczykij ~ # lscp /dev/sda3 | grep ss
211211 2014-02-22 16:58:11 ss - 119 54904
211219 2014-02-22 18:18:28 ss - 124 54910
211811 2014-02-25 00:39:21 ss - 140 54922
211872 2014-02-25 09:47:16 ss - 160 54922
212008 2014-02-26 01:13:14 ss - 114 54929
212026 2014-02-26 03:22:45 ss - 28 54928
212042 2014-02-26 04:13:48 ss - 29 54928
212045 2014-02-26 04:24:00 ss - 29 54928
wloczykij ~ # mount | grep cp
/dev/sda3 on /tmp/211219 type nilfs2 (ro,cp=211219)
/dev/sda3 on /tmp/211211 type nilfs2 (ro,cp=211211)
/dev/sda3 on /tmp/212026 type nilfs2 (ro,cp=212026)
/dev/sda3 on /tmp/212045 type nilfs2 (ro,cp=212045)
wloczykij ~ # for sumrrd in 211219 211211 212026 212045; do md5sum /tmp/$sumrrd/var/www/grubelek.pl/termometr/temp0.rrd; done
71f60c620a493021bb5e1c32c555abe8 /tmp/211219/var/www/grubelek.pl/termometr/temp0.rrd
71f60c620a493021bb5e1c32c555abe8 /tmp/211211/var/www/grubelek.pl/termometr/temp0.rrd
71f60c620a493021bb5e1c32c555abe8 /tmp/212026/var/www/grubelek.pl/termometr/temp0.rrd
71f60c620a493021bb5e1c32c555abe8 /tmp/212045/var/www/grubelek.pl/termometr/temp0.rrd
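The check in the loop above can also be reduced to a single yes/no answer. A minimal sketch, assuming GNU `md5sum` and a made-up helper name `all_same_md5`; it only tells whether the copies are byte-identical, not why.

```shell
# Succeed (exit status 0) only if every file given has the same MD5 digest.
# all_same_md5 is a hypothetical helper name used for illustration.
all_same_md5() {
    # keep only the digest column, collapse duplicates, count what is left
    [ "$(md5sum -- "$@" | awk '{ print $1 }' | sort -u | wc -l)" -eq 1 ]
}

# Example use against the snapshot mounts listed above:
#   all_same_md5 /tmp/211219/var/www/grubelek.pl/termometr/temp0.rrd \
#                /tmp/212045/var/www/grubelek.pl/termometr/temp0.rrd \
#       && echo "snapshots hold identical copies"
```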
This is bad news! What should I do next? All the rrd dumps show the same
last-update timestamp:
<lastupdate>1376166602</lastupdate> <!-- 2013-08-10 22:30:02 CEST -->
(this looks like the previous boot, before the crash?)
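The `<lastupdate>` value is a Unix timestamp, so it can be double-checked from the shell. A small sketch, assuming GNU `date` (the `-d @SECONDS` form); it prints UTC, which is two hours behind the CEST time in the dump comment.

```shell
# Convert the <lastupdate> value from the rrd dump to a readable UTC time.
date -u -d @1376166602 '+%Y-%m-%d %H:%M:%S UTC'
# → 2013-08-10 20:30:02 UTC  (22:30:02 CEST)
```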
I just moved the rrd files to btrfs and made a subvolume snapshot; after about
an hour the rrd files differ:
wloczykij ~ # md5sum /home/services/termometr/temp0.rrd /home/snapshot-2014-02-26/services/termometr/temp0.rrd
2999dc7071d94e701d5246d79ccc488f /home/services/termometr/temp0.rrd
1621f31fb7c27f1f3c0b0d8f0f5ede9e /home/snapshot-2014-02-26/services/termometr/temp0.rrd
wloczykij ~ # nilfs-tune -l /dev/sda3
nilfs-tune 2.1.5
Filesystem volume name: (none)
Filesystem UUID: f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9
Filesystem magic number: 0x3434
Filesystem revision #: 2.0
Filesystem features: (none)
Filesystem state: invalid or mounted
Filesystem OS type: Linux
Block size: 4096
Filesystem created: Sat Aug 13 10:36:21 2011
Last mount time: Wed Feb 26 09:33:53 2014
Last write time: Wed Feb 26 14:15:29 2014
Mount count: 59
Maximum mount count: 50
Reserve blocks uid: 0 (user root)
Reserve blocks gid: 0 (group root)
First inode: 11
Inode size: 128
DAT entry size: 32
Checkpoint size: 192
Segment usage size: 16
Number of segments: 465
Device size: 3908042752
First data block: 1
# of blocks per segment: 2048
Reserved segments %: 5
Last checkpoint #: 212170
Last block address: 546866
Last sequence #: 35128
Free blocks count: 227328
Commit interval: 600
# of blks to create seg: 0
CRC seed: 0x1a1e847d
CRC check sum: 0x57f59c5c
CRC check data size: 0x00000118
wloczykij ~ # uname -sr
Linux 3.4.56
Piotr Szymaniak.
--
(...) they acted as if they had discovered the principles governing quantum
physics and then used them to design a new TV game show
- and then, worse still, concluded that this is all quantum physics
is good for...
-- Stephen King, "Dreamcatcher"
* Re: snapshots contain the same rrd database
2014-02-26 13:32 snapshots contain the same rrd database Piotr Szymaniak
@ 2014-02-26 13:54 ` Vyacheslav Dubeyko
2014-02-26 14:21 ` Piotr Szymaniak
2014-02-27 2:45 ` Ryusuke Konishi
1 sibling, 1 reply; 6+ messages in thread
From: Vyacheslav Dubeyko @ 2014-02-26 13:54 UTC (permalink / raw)
To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Piotr,
On Wed, 2014-02-26 at 14:32 +0100, Piotr Szymaniak wrote:
> Hi,
>
> I got a system crash after some 160+ days uptime. After a hard reboot I
> noticed my rrd database looks corrupted.
>
> So I changed some recent checkpoints to snapshots, mounted them and...
> all the rrd files are the same!
>
To be honest, I don't clearly understand:
(1) How did you hit the issue?
(2) Did you create the snapshots after the crash?
(3) Did you have any snapshots before the crash?
If you had a crash, there should be some error messages in the
system log. Do you have anything? Or did you lose all error messages
during the crash?
Anyway, I need a way to reproduce the issue in order to investigate it.
Of course, I am not going to wait 160 days to reproduce it. :)
One possible way is to share a small NILFS2 volume on which the issue
reproduces reliably. But, currently, I don't quite see how I can
reproduce the issue.
Thanks,
Vyacheslav Dubeyko.
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: snapshots contain the same rrd database
2014-02-26 13:54 ` Vyacheslav Dubeyko
@ 2014-02-26 14:21 ` Piotr Szymaniak
2014-02-26 14:39 ` Vyacheslav Dubeyko
0 siblings, 1 reply; 6+ messages in thread
From: Piotr Szymaniak @ 2014-02-26 14:21 UTC (permalink / raw)
To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Wed, Feb 26, 2014 at 05:54:21PM +0400, Vyacheslav Dubeyko wrote:
> Hi Piotr,
>
> On Wed, 2014-02-26 at 14:32 +0100, Piotr Szymaniak wrote:
> > Hi,
> >
> > I got a system crash after some 160+ days uptime. After a hard reboot I
> > noticed my rrd database looks corrupted.
> >
> > So I changed some recent checkpoints to snapshots, mounted them and...
> > all the rrd files are the same!
> >
>
> To be honest, I don't understand clearly:
> (1) How did you get the issue?
To me it looks like the file hasn't changed since the first boot, as if
it's not being written at all. Is there a way to check something like the
"file position" on disk in a specific snapshot?
RRDs are somewhat unusual databases. When created they have a fixed size,
say size A. Over time they gather data but always stay at that size A.
The size doesn't change. Maybe this is related?
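To illustrate the "size A" point: a file that is updated in place keeps its size while its content, and therefore its checksum, changes. The sketch below uses a temporary 4096-byte file of zeros as a stand-in for an RRD (an assumption, no rrdtool involved) and overwrites a few bytes without truncating. It does not reproduce the NILFS2 problem; it only shows that an unchanged size alone does not mean the file was never written.

```shell
f=$(mktemp)
# Create a fixed-size 4096-byte file of zeros, standing in for a fresh RRD.
dd if=/dev/zero of="$f" bs=4096 count=1 2>/dev/null
size_before=$(wc -c < "$f"); sum_before=$(md5sum < "$f")
# Overwrite 8 bytes at offset 100 without truncating, like an in-place update.
printf 'newdata!' | dd of="$f" bs=1 seek=100 conv=notrunc 2>/dev/null
size_after=$(wc -c < "$f"); sum_after=$(md5sum < "$f")
echo "size: $size_before -> $size_after"
if [ "$sum_before" != "$sum_after" ]; then echo "checksum changed"; fi
rm -f "$f"
```

So between two snapshots taken at different times, a healthy RRD would be expected to have the same size but different checksums.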
> (2) Did you create snapshots after crash?
Yes.
> (3) Had you some snapshots before crash?
No.
> If you had a crash then you should have some error messages in the
> system log. Have you something? Or did you lose all error messages
> during the crash?
The crash was related to a process running on a different filesystem. My
syslog has only garbage, so yes, it is lost.
> Anyway, I need to have the reproducing path for investigate the issue.
> Of course, I am not going to wait 160 days before achieving the issue
> reproducibility. :) One of the possible way is to share some small
> NILFS2 volume with good issue reproducibility. But, currently, I don't
> quite follow in what way I can reproduce the issue.
I suppose this could be related to the "size A" behaviour mentioned above.
I will try to work out a way to reproduce it.
Piotr Szymaniak.
--
There is a saying there... I don't remember it exactly, but it goes
more or less like this: "A man who senses the wind of change should build
not windbreaks, but windmills".
-- Stephen King, "The Dead Zone"
* Re: snapshots contain the same rrd database
2014-02-26 14:21 ` Piotr Szymaniak
@ 2014-02-26 14:39 ` Vyacheslav Dubeyko
0 siblings, 0 replies; 6+ messages in thread
From: Vyacheslav Dubeyko @ 2014-02-26 14:39 UTC (permalink / raw)
To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Wed, 2014-02-26 at 15:21 +0100, Piotr Szymaniak wrote:
> On Wed, Feb 26, 2014 at 05:54:21PM +0400, Vyacheslav Dubeyko wrote:
> > Hi Piotr,
> >
> > On Wed, 2014-02-26 at 14:32 +0100, Piotr Szymaniak wrote:
> > > Hi,
> > >
> > > I got a system crash after some 160+ days uptime. After a hard reboot I
> > > noticed my rrd database looks corrupted.
> > >
> > > So I changed some recent checkpoints to snapshots, mounted them and...
> > > all the rrd files are the same!
> > >
> >
> > To be honest, I don't understand clearly:
> > (1) How did you get the issue?
>
> To me it looks like the file hasn't changed since the first boot. Like
> it's not written at all? Is there a way to check something like "file
> position" on disk in specific snapshot?
>
> rrds are a bit weird databases. When created they are, ie. size A. And
> all the way in time they gather some data and are always in that size A.
> The size doesn't change. Maybe this is related?
So, as far as I can judge, you can reproduce the issue reliably, and you
suspect that the file isn't being written at all. How do the segctor and
nilfs_cleanerd threads behave as processes? Could you check that
they don't eat 100% of the CPU?
If segctor is unable to flush data, then it makes sense to use
"echo t > /proc/sysrq-trigger" to get info about the state of the
processes. This command usually writes its output to the system log.
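A rough way to do the CPU check is to filter `ps` output for the segment constructor and cleaner tasks. This is a sketch under assumptions: the helper name `busy_nilfs_tasks` and the 90% threshold are made up, the name pattern is a guess at how the tasks show up in `ps`, and the `pcpu` column is an average over the task's lifetime, so `top` gives a better picture of current load.

```shell
# Print nilfs-related tasks whose %CPU is at or above a threshold.
# Expects "pid pcpu comm" lines on stdin, as produced by `ps -eo pid,pcpu,comm`.
# busy_nilfs_tasks is a hypothetical helper name.
busy_nilfs_tasks() {
    threshold="${1:-90}"
    awk -v t="$threshold" \
        '$3 ~ /segctor|nilfs_clean/ && $2 + 0 >= t { print $1, $2, $3 }'
}

# Usage:
#   ps -eo pid,pcpu,comm | busy_nilfs_tasks 90
# If a task looks stuck, dump all task states to the kernel log (needs root
# and an enabled sysrq), then read them with dmesg:
#   echo t > /proc/sysrq-trigger
```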
Thanks,
Vyacheslav Dubeyko.
* Re: snapshots contain the same rrd database
2014-02-26 13:32 snapshots contain the same rrd database Piotr Szymaniak
2014-02-26 13:54 ` Vyacheslav Dubeyko
@ 2014-02-27 2:45 ` Ryusuke Konishi
[not found] ` <20140227.114547.220041242.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
1 sibling, 1 reply; 6+ messages in thread
From: Ryusuke Konishi @ 2014-02-27 2:45 UTC (permalink / raw)
To: Piotr Szymaniak; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
Hi Piotr,
On Wed, 26 Feb 2014 14:32:02 +0100, Piotr Szymaniak wrote:
> Hi,
>
> I got a system crash after some 160+ days uptime. After a hard reboot I
> noticed my rrd database looks corrupted.
>
> So I changed some recent checkpoints to snapshots, mounted them and...
> all the rrd files are the same!
>
> Here's some info about current state:
<snip>
> wloczykij ~ # uname -sr
> Linux 3.4.56
This version looks a bit old. The current head of linux-3.4.y is
v3.4.82.
The following important bug fixes are not included in this version:
$ git shortlog v3.4.56..v3.4.82 | grep nilfs
nilfs2: fix segctor bug that causes file system corruption
nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
Was it a distro kernel?
If you can, please try the latest version. I hope it will both help you
avoid this critical error and narrow down the cause of the problem.
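Whether a running kernel predates v3.4.82 can be checked with a version-aware sort. A minimal sketch, assuming GNU `sort -V`; the helper name `version_lt` is made up, and the check says nothing about patches a distribution may have backported.

```shell
# Succeed if version $1 is strictly older than version $2.
# version_lt is a hypothetical helper; it relies on GNU sort's -V option.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage (strips a local suffix such as "-gentoo" from the release string):
#   version_lt "$(uname -r | cut -d- -f1)" 3.4.82 \
#       && echo "kernel is older than v3.4.82"
```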
Regards,
Ryusuke Konishi
* Re: snapshots contain the same rrd database
[not found] ` <20140227.114547.220041242.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-02-27 10:58 ` Piotr Szymaniak
0 siblings, 0 replies; 6+ messages in thread
From: Piotr Szymaniak @ 2014-02-27 10:58 UTC (permalink / raw)
To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA
On Thu, Feb 27, 2014 at 11:45:47AM +0900, Ryusuke Konishi wrote:
> Hi Piotr,
> On Wed, 26 Feb 2014 14:32:02 +0100, Piotr Szymaniak wrote:
> > Hi,
> >
> > I got a system crash after some 160+ days uptime. After a hard reboot I
> > noticed my rrd database looks corrupted.
> >
> > So I changed some recent checkpoints to snapshots, mounted them and...
> > all the rrd files are the same!
> >
> > Here's some info about current state:
> <snip>
> > wloczykij ~ # uname -sr
> > Linux 3.4.56
>
> This version looks a bit old. The current head of linux-3.4.y is
> v3.4.82.
Updated.
> The following important bug fixes are not included in this version:
>
> $ git shortlog v3.4.56..v3.4.82 | grep nilfs
> nilfs2: fix segctor bug that causes file system corruption
> nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
> nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
>
> Was it a distro kernel?
No, it's a hand made Gentoo kernel.
Piotr Szymaniak.
--
Wait, wait. Someone was saying something to me, but I don't know who and
I don't know what.
-- Rafal Solecki