* silent random symbolic link corruption
@ 2009-01-31 20:13 David Arendt
From: David Arendt @ 2009-01-31 20:13 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg

Hi,

After using nilfs2 on data partitions for half a year without any
problems, I wanted to try it for the root partition. In doing so, I
discovered a silent, random symbolic link corruption problem.

Versions:

latest nilfs2 git module
kernel 2.6.28.2
tar 1.20

Steps to reproduce:

tar -xpf zz1.tar (where zz1.tar is a tar file containing many symbolic
links; in my case, a directory containing 2 root filesystems for remote
booting)

After extraction, some symbolic links are missing and 0-byte files
exist in their place.

I repeated the test 3 times on a freshly formatted nilfs2 partition, and
each time different links were missing.

I am currently compressing the big tar file with bzip2 and will extract
that archive to check whether timing issues are involved. I will report
back when this test is finished.

Could you please look into this?

Thanks in advance
Bye,
David Arendt
* Re: silent random symbolic link corruption
From: David Arendt @ 2009-01-31 20:48 UTC
To: NILFS Users mailing list

Hi,

The bzip2 test returned the same result.

Thanks in advance,
David Arendt
* Re: silent random symbolic link corruption
From: David Arendt @ 2009-01-31 23:21 UTC
To: NILFS Users mailing list

Hi,

I narrowed the problem down. I'm not sure whether it's a problem of tar
or of nilfs2.

Tar handles symbolic links this way:

during extraction: if the entry is a symbolic link with an absolute
target, create a 0-byte placeholder file and record the link and its
stat information

after extraction: for every such symbolic link, verify that the current
st_dev, st_ino and st_mtime are the same as when the 0-byte file was
created, and only then create the link

For some 0-byte files, st_ino differs between the first and the second
stat of the file. As I don't know the internal behavior of nilfs2, could
you please tell me whether this is normal behavior for nilfs2 or whether
something strange is going on? If it's normal behavior, maybe I should
file a bug against tar?

This patch for tar 1.21 solves the symlink problem, but I don't know
whether the problem should be solved on the tar end or on the nilfs2
end.

diff -Naur tar-1.21/src/extract.c tar-1.21.new/src/extract.c
--- tar-1.21/src/extract.c 2008-10-30 15:10:28.000000000 +0100
+++ tar-1.21.new/src/extract.c 2009-01-31 23:32:03.000000000 +0100
@@ -1267,7 +1267,6 @@
              removed by a later extraction.  */
           if (lstat (source, &st) == 0
               && st.st_dev == ds->dev
-              && st.st_ino == ds->ino
               && timespec_cmp (get_stat_mtime (&st), ds->mtime) == 0)
             {
               /* Unlink the placeholder, then create a hard link if possible,

What do you think?

Thanks in advance,
David Arendt
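The two-phase placeholder scheme described in this message can be sketched in C. This is a minimal user-space sketch under stated assumptions, not tar's actual code: struct placeholder_id, create_placeholder() and finish_symlink() are hypothetical names, and the identity check mirrors the st_dev/st_ino/mtime comparison in the quoted extract.c hunk, reading the nanosecond mtime through POSIX st_mtim.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical record of the placeholder's identity, mirroring the
   st_dev / st_ino / mtime fields compared in tar's extract.c. */
struct placeholder_id {
    dev_t dev;
    ino_t ino;
    struct timespec mtime;   /* full nanosecond mtime (POSIX st_mtim) */
};

/* Phase 1: create an empty placeholder file and record its identity. */
static int create_placeholder(const char *path, struct placeholder_id *id)
{
    struct stat st;
    int fd;

    assert(path != NULL && id != NULL);
    memset(id, 0, sizeof *id);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    if (lstat(path, &st) != 0)
        return -1;
    id->dev = st.st_dev;
    id->ino = st.st_ino;
    id->mtime = st.st_mtim;
    return 0;
}

/* Phase 2: replace the placeholder with a symlink, but only if it still
   looks like the same file. Any mismatch (including a changed tv_nsec)
   leaves the 0-byte placeholder in place and returns -1. */
static int finish_symlink(const char *path, const char *target,
                          const struct placeholder_id *id)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return -1;
    if (st.st_dev != id->dev || st.st_ino != id->ino
        || st.st_mtim.tv_sec != id->mtime.tv_sec
        || st.st_mtim.tv_nsec != id->mtime.tv_nsec)
        return -1;
    if (unlink(path) != 0)
        return -1;
    return symlink(target, path);
}
```

If the filesystem reports a different inode number or a different tv_nsec on the second lstat(), finish_symlink() gives up and the 0-byte placeholder stays behind, which would match the symptom in the original report.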
* Re: silent random symbolic link corruption
From: Ryusuke Konishi @ 2009-02-02 2:42 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg

Hi David,

On Sun, 01 Feb 2009 00:21:23 +0100, David Arendt wrote:
> For some 0-byte files, st_ino differs between the first and the second
> stat of the file. As I don't know the internal behavior of nilfs2, could
> you please tell me whether this is normal behavior for nilfs2 or whether
> something strange is going on? If it's normal behavior, maybe I should
> file a bug against tar?

This behavior seems unusual.
Could you send me a small tar file that reproduces the problem?

> This patch for tar 1.21 solves the symlink problem, but I don't know
> whether the problem should be solved on the tar end or on the nilfs2
> end.
>
>            if (lstat (source, &st) == 0
>                && st.st_dev == ds->dev
> -              && st.st_ino == ds->ino
>                && timespec_cmp (get_stat_mtime (&st), ds->mtime) == 0)
>
> What do you think?

I think this comparison is appropriate for confirming the identity of
the placeholder file.

I have no idea why the inode number changed as you reported. Symbolic
links in nilfs2 are implemented simply.

If it's unique to nilfs2, you may have hit some sort of timing issue.
I think a sample tar file would help to figure out what's happening.

Regards,
Ryusuke Konishi
* Re: silent random symbolic link corruption
From: David Arendt @ 2009-02-02 17:32 UTC
To: NILFS Users mailing list

Hi,

I am still investigating. I created tar files with 100000 symbolic
links, and they extracted without any problem. Only a tar of my whole
nfsroot directory fails to extract properly to nilfs2, while it extracts
correctly to ext4. I wouldn't mind giving you the files, but they total
3.3 GB. What is also curious is that the stat calls in tar seem to
return the right inode number, but at the check it is 0. I am currently
adding debugging printfs to tar in the hope of catching the problem
this way.

Bye,
David Arendt
* Re: silent random symbolic link corruption
From: David Arendt @ 2009-02-02 21:01 UTC
To: NILFS Users mailing list

Hi,

Well, in fact I cannot find where the problem is coming from, and I am
still not sure whether tar or nilfs2 is causing it. For now I am using
tar -xPpf, as this creates symlinks directly without going through the
delayed-link mechanism that sometimes fails. Please tell me if you need
any further information.

Bye,
David Arendt
* Re: silent random symbolic link corruption
From: Ryusuke Konishi @ 2009-03-08 6:37 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg

Hi David,

On Mon, 02 Feb 2009 22:01:10 +0100, David Arendt wrote:
> Hi,
>
> Well, in fact I cannot find where the problem is coming from, and I am
> still not sure whether tar or nilfs2 is causing it. For now I am using
> tar -xPpf, as this creates symlinks directly without going through the
> delayed-link mechanism that sometimes fails. Please tell me if you need
> any further information.
>
> Bye,
> David Arendt

Is your nilfs stable these days?

We found that this was a timestamp resolution problem.

As you told us, tar creates intermediate zero-byte files for symlinks
and changes them into real symlinks at the end. The problem is that tar
identifies each intermediate file by comparing its mtime with the value
recorded when it was created, and that comparison uses nanosecond
values.

Some filesystems, including ext3 and nilfs2, do not store nanosecond
timestamps on disk, so the nanosecond part of mtime may be reset to
zero when the inode is flushed from memory. This is why you saw the
problem randomly.

Possible solutions are:

1) supporting nanosecond timestamps;
2) changing tar to stop comparing the nanosecond part of timestamps;
3) changing tar to skip that comparison if the underlying filesystem
   does not support nanosecond resolution.

Solution 1 raises an on-disk compatibility problem. Maybe nilfs should
support nanosecond timestamps, but the nilfs inode is unfortunately 32
bits short for this. :(

If I could allot one 64-bit field of the btree root array, this would
be possible, but it would break compatibility. Another candidate is an
unused 64-bit field reserved for extended attributes, but I'd like to
keep it reserved because it is important. So it's thorny.

Solution 3 requires a new kernel interface. This was actually discussed
recently on the kernel mailing list, but no conclusion was reached.

I don't know why tar requires such an intermediate file, but
considering conventional filesystems, solution 3 seems necessary.

Regards,
Ryusuke
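The failure mode explained in this message can be reproduced without any filesystem by simulating a seconds-only inode. This is a hypothetical user-space model, not nilfs2 code: struct disk_inode, flush_inode() and reload_inode() are illustrative names, and the "flush" simply drops tv_nsec as the mail describes. same_mtime_sec() corresponds to solution 2 (comparing at one-second resolution).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical on-disk inode that keeps mtime in whole seconds only. */
struct disk_inode {
    uint64_t mtime_sec;
};

/* "Flush" the in-memory timestamp: tv_nsec has nowhere to go and is dropped. */
static void flush_inode(struct disk_inode *d, struct timespec in_core)
{
    assert(in_core.tv_nsec >= 0 && in_core.tv_nsec < 1000000000L);
    d->mtime_sec = (uint64_t)in_core.tv_sec;
}

/* Re-read the inode after it was evicted: the nanoseconds come back as 0. */
static struct timespec reload_inode(const struct disk_inode *d)
{
    struct timespec t;

    t.tv_sec = (time_t)d->mtime_sec;
    t.tv_nsec = 0;
    return t;
}

/* Exact comparison, analogous to tar's timespec_cmp(...) == 0 check. */
static bool same_mtime_exact(struct timespec a, struct timespec b)
{
    return a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec;
}

/* Solution 2 from the mail: compare at one-second resolution only. */
static bool same_mtime_sec(struct timespec a, struct timespec b)
{
    return a.tv_sec == b.tv_sec;
}
```

Whether a given placeholder trips the exact check would depend on whether its inode happens to be flushed and re-read between tar's two stat calls, which is consistent with the random pattern reported in the thread.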
* Re: silent random symbolic link corruption
From: Ryusuke Konishi @ 2009-03-08 13:07 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg

On Sun, 08 Mar 2009 15:37:30 +0900 (JST), Ryusuke Konishi wrote:
> Maybe nilfs should support nanosecond timestamps, but the nilfs inode
> is unfortunately 32 bits short for this. :(
>
> If I could allot one 64-bit field of the btree root array, this would
> be possible, but it would break compatibility. Another candidate is an
> unused 64-bit field reserved for extended attributes, but I'd like to
> keep it reserved because it is important. So it's thorny.

I just noticed another possibility. The current nilfs inode has a
64-bit dtime field, which stores the deletion time and is unused while
the inode is alive. We may be able to use it for nanosecond timestamps
without breaking compatibility. I'll give it some more thought.

Cheers,
Ryusuke
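A nanosecond count is always below 10^9, which is less than 2^30, so two such values fit in one 64-bit word. Below is a minimal sketch of how a spare 64-bit field, such as the dtime slot mentioned in this message, could carry two nanosecond values. The choice of ctime and mtime, the bit layout and the helper names are assumptions for illustration only, not the nilfs2 on-disk format.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u   /* 10^9 < 2^30, so each value fits in 32 bits */

/* Hypothetical layout: high 32 bits = ctime nanoseconds,
   low 32 bits = mtime nanoseconds, stored in a 64-bit field
   (e.g. dtime) that is otherwise unused while the inode is alive. */
static uint64_t pack_nsec(uint32_t ctime_nsec, uint32_t mtime_nsec)
{
    assert(ctime_nsec < NSEC_PER_SEC && mtime_nsec < NSEC_PER_SEC);
    return ((uint64_t)ctime_nsec << 32) | mtime_nsec;
}

static uint32_t ctime_nsec_of(uint64_t field)
{
    return (uint32_t)(field >> 32);
}

static uint32_t mtime_nsec_of(uint64_t field)
{
    return (uint32_t)(field & 0xffffffffu);
}
```

One property of this kind of layout is that a field holding 0 decodes as zero nanoseconds, i.e. whole-second timestamps; assuming live inodes written by older code keep dtime at 0, they would still read back exactly as before.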
* Re: silent random symbolic link corruption
From: Ryusuke Konishi @ 2009-03-08 15:45 UTC
To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg

On Sun, 08 Mar 2009 15:37:30 +0900 (JST), Ryusuke Konishi wrote:
> Some filesystems, including ext3 and nilfs2, do not store nanosecond
> timestamps on disk, so the nanosecond part of mtime may be reset to
> zero when the inode is flushed from memory. This is why you saw the
> problem randomly.
>
> [...]
>
> I don't know why tar requires such an intermediate file, but
> considering conventional filesystems, solution 3 seems necessary.

I've found the cause of this problem: the timestamp initialization of
the in-memory nilfs inode. It initialized the timestamps with
nanosecond values even though nilfs does not store them on disk. So it
does not seem to be a tar problem, and filesystems that handle this
properly (e.g. ext3) probably do not suffer from it.

Applying the following patch may solve this problem. However, I'd like
to support nanosecond timestamps in the next release, in the way I
described in my previous mail. I think high-resolution timestamps are
an important feature for today's filesystems.

Regards,
Ryusuke Konishi

diff --git a/fs/inode.c b/fs/inode.c
index 46b24e5..3d39dff 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -355,7 +355,7 @@ struct inode *nilfs_new_inode(struct inode *dir, int mode)
        inode->i_blksize = PAGE_SIZE;
        /* This is the optimal IO size (for stat), not fs block size */
 #endif
-       inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
+       inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
        if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)) {
                err = nilfs_bmap_read(ii->i_bmap, NULL);
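The effect of this one-line patch can be illustrated in user space: if a new inode's timestamps start out as whole seconds, the value tar records at creation is exactly what a seconds-only inode returns after being flushed and re-read. current_time_sec() and round_trip() are hypothetical helpers; the former only imitates what the patch relies on CURRENT_TIME_SEC to do (drop the nanosecond part), and the latter stands in for the on-disk round trip. Neither is kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* User-space analogue of CURRENT_TIME_SEC: the current time with the
   nanosecond part cleared, i.e. only what a seconds-only inode can store. */
static struct timespec current_time_sec(void)
{
    struct timespec t;
    int rc = clock_gettime(CLOCK_REALTIME, &t);

    assert(rc == 0);
    (void)rc;
    t.tv_nsec = 0;
    return t;
}

/* Hypothetical seconds-only round trip (write to disk, evict, re-read). */
static struct timespec round_trip(struct timespec t)
{
    uint64_t on_disk = (uint64_t)t.tv_sec;   /* tv_nsec is not stored */
    struct timespec back;

    back.tv_sec = (time_t)on_disk;
    back.tv_nsec = 0;
    return back;
}
```

Because the initial timestamp already has tv_nsec == 0, the flush loses nothing, so an exact comparison like tar's keeps succeeding; supporting real nanosecond timestamps on disk, as proposed above, would reach the same result without discarding precision.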
Thread overview: 9+ messages
2009-01-31 20:13 silent random symbolic link corruption David Arendt
2009-01-31 20:48 ` David Arendt
2009-01-31 23:21 ` David Arendt
2009-02-02 2:42 ` Ryusuke Konishi
2009-02-02 17:32 ` David Arendt
2009-02-02 21:01 ` David Arendt
2009-03-08 6:37 ` Ryusuke Konishi
2009-03-08 13:07 ` Ryusuke Konishi
2009-03-08 15:45 ` Ryusuke Konishi