* silent random symbolic link corruption
@ 2009-01-31 20:13 David Arendt
[not found] ` <4984B0DC.6080905-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: David Arendt @ 2009-01-31 20:13 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg
Hi,
After using nilfs2 for half a year now on data partitions without any
problems, I wanted to try it for the root partition. This way I
discovered a silent random symbolic link corruption problem.
Versions:
latest nilfs2 git module
kernel 2.6.28.2
tar 1.20
Steps to reproduce:
tar -xpf zz1.tar (where zz1.tar is a tar file containing many symbolic
links; in my case, a directory containing 2 root filesystems for remote
booting)
After extraction, some symbolic links are missing and 0-byte files exist
in their place.
I repeated the test 3 times on a freshly formatted nilfs2 partition, and
each time a different set of links was missing.
I am currently compressing the big tar file with bzip2 and will extract
that one to check whether this is a timing issue. I will report back when
the test is finished.
Could you please look into this?
Thanks in advance
Bye,
David Arendt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: silent random symbolic link corruption
[not found] ` <4984B0DC.6080905-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-01-31 20:48 ` David Arendt
2009-01-31 23:21 ` David Arendt
1 sibling, 0 replies; 9+ messages in thread
From: David Arendt @ 2009-01-31 20:48 UTC (permalink / raw)
To: NILFS Users mailing list
Hi,
The bzip2 test returned the same result.
Thanks in advance,
David Arendt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: silent random symbolic link corruption
[not found] ` <4984B0DC.6080905-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
2009-01-31 20:48 ` David Arendt
@ 2009-01-31 23:21 ` David Arendt
[not found] ` <4984DCF3.8030302-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
1 sibling, 1 reply; 9+ messages in thread
From: David Arendt @ 2009-01-31 23:21 UTC (permalink / raw)
To: NILFS Users mailing list
Hi,
I narrowed the problem down. I'm not sure whether it's a problem in tar or
in nilfs2.
Tar handles symbolic links this way:
during extraction: if the entry is a symbolic link with an absolute path,
create a 0-byte file and record the link and stat information
after extraction: for every such symbolic link, verify that the current
st_dev, st_ino and st_mtime are the same as when the 0-byte file was
created, and only then create the link
For some 0-byte files, st_ino differs between the first and the second
stat of the file. As I don't know the internal behavior of nilfs2, could
you please tell me whether this is normal for nilfs2 or whether something
strange is going on? If it is normal behavior, maybe I should file a bug
against tar?
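The two-phase check described above can be sketched roughly as follows. This is only an illustration, not tar's actual code: the struct placeholder_id type and both function names are hypothetical, loosely modelled on the ds->dev, ds->ino and ds->mtime fields that appear in tar's extract.c.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes st_mtim in struct stat */
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical record kept for each 0-byte placeholder file. */
struct placeholder_id {
    dev_t dev;
    ino_t ino;
    struct timespec mtime;
};

/* Phase 1 (during extraction): remember what the freshly created
   placeholder looked like according to stat. */
void remember_placeholder(struct placeholder_id *id, const struct stat *st)
{
    assert(id != NULL && st != NULL);
    id->dev = st->st_dev;
    id->ino = st->st_ino;
    id->mtime = st->st_mtim;
}

/* Phase 2 (after extraction): the placeholder may be replaced by the real
   symlink only if device, inode number and full-resolution mtime all
   still match what was recorded in phase 1. */
bool placeholder_unchanged(const struct placeholder_id *id,
                           const struct stat *st)
{
    assert(id != NULL && st != NULL);
    return st->st_dev == id->dev
        && st->st_ino == id->ino
        && st->st_mtim.tv_sec == id->mtime.tv_sec
        && st->st_mtim.tv_nsec == id->mtime.tv_nsec;
}
```

In this model, a placeholder whose inode number or mtime no longer matches is left in place as a 0-byte file, which is the symptom reported at the start of the thread.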
This patch for tar 1.21 solves the symlink problem, but I don't know
whether the problem should be solved on the tar side or on the nilfs2 side.
diff -Naur tar-1.21/src/extract.c tar-1.21.new/src/extract.c
--- tar-1.21/src/extract.c 2008-10-30 15:10:28.000000000 +0100
+++ tar-1.21.new/src/extract.c 2009-01-31 23:32:03.000000000 +0100
@@ -1267,7 +1267,6 @@
          removed by a later extraction. */
       if (lstat (source, &st) == 0
           && st.st_dev == ds->dev
-          && st.st_ino == ds->ino
           && timespec_cmp (get_stat_mtime (&st), ds->mtime) == 0)
         {
           /* Unlink the placeholder, then create a hard link if possible,
What do you think?
Thanks in advance,
David Arendt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: silent random symbolic link corruption
[not found] ` <4984DCF3.8030302-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-02-02 2:42 ` Ryusuke Konishi
[not found] ` <20090202.114209.59790430.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Ryusuke Konishi @ 2009-02-02 2:42 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg
Hi David,
On Sun, 01 Feb 2009 00:21:23 +0100, David Arendt wrote:
> Hi,
>
> I narrowed the problem down. I'm not sure if it's a problem of tar or of
> nilfs2.
>
> Tar handles symbolic links this way:
>
> during extraction: if symbolic link and absolute path create a 0 byte
> file and record link and stat information
>
> after extraction: for every symbolic link verify that actual
> st_dev,st_ino and st_mtime are the same as on creation of the 0 byte
> file, and only then create the link
>
> for some 0 byte files st_ino is different between the first and the
> second stat of the 0 byte file. As I don't know the nilfs2 internal
> behavior, so could you please tell me if this is the normal behavior of
> nilfs2 or if there is something strange with this ? If it's the normal
> behavior, maybe I should file a bug for tar ?
This behavior seems unusual.
Could you send me a small tar file which can reproduce the problem ?
> This patch for tar 1.21 solves the symlink problem but I don't know if
> the problem is to be solved on the tar end or on the nilfs2 end.
>
>
> diff -Naur tar-1.21/src/extract.c tar-1.21.new/src/extract.c
> --- tar-1.21/src/extract.c 2008-10-30 15:10:28.000000000 +0100
> +++ tar-1.21.new/src/extract.c 2009-01-31 23:32:03.000000000 +0100
> @@ -1267,7 +1267,6 @@
> removed by a later extraction. */
> if (lstat (source, &st) == 0
> && st.st_dev == ds->dev
> - && st.st_ino == ds->ino
> && timespec_cmp (get_stat_mtime (&st), ds->mtime) == 0)
> {
> /* Unlink the placeholder, then create a hard link if possible,
>
>
> What do you think ?
I think this comparison is appropriate for confirming the identity of
the placeholder file.
I have no idea why the inode number changed as you reported.
Symbolic links in nilfs2 are implemented simply.
If it's unique to nilfs2, you may have hit some sort of timing issue.
I think a sample tar file would be helpful in figuring out what's
happening.
Regards,
Ryusuke Konishi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: silent random symbolic link corruption
[not found] ` <20090202.114209.59790430.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-02-02 17:32 ` David Arendt
[not found] ` <49872E1E.4090209-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: David Arendt @ 2009-02-02 17:32 UTC (permalink / raw)
To: NILFS Users mailing list
Hi,
I am still investigating here. I created tar files with 100000 symbolic
links, and they extracted without any problem. Only a tar of my whole
nfsroot directory fails to extract properly to nilfs2, while it extracts
correctly to ext4. I wouldn't mind giving you those files, but they are
3.3 GB. What is also curious is that the stat calls in tar seem to
return the right inode number, but at the check it is 0. I am currently
adding debugging printfs to tar in the hope of catching the problem this way.
Bye,
David Arendt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: silent random symbolic link corruption
[not found] ` <49872E1E.4090209-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-02-02 21:01 ` David Arendt
[not found] ` <49875F16.7060107-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: David Arendt @ 2009-02-02 21:01 UTC (permalink / raw)
To: NILFS Users mailing list
Hi,
Well, in fact I cannot find where the problem is coming from. I am also
not sure whether tar or nilfs2 is causing it. For now I am using tar -xPpf,
as this creates symlinks directly without going through the delayed-link
mechanism that sometimes fails. Please tell me if you want any
further information.
Bye,
David Arendt
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: silent random symbolic link corruption
[not found] ` <49875F16.7060107-/LHdS3kC8BfYtjvyW6yDsg@public.gmane.org>
@ 2009-03-08 6:37 ` Ryusuke Konishi
[not found] ` <20090308.153730.64866441.ryusuke-sG5X7nlA6pw@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Ryusuke Konishi @ 2009-03-08 6:37 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg
Hi David,
On Mon, 02 Feb 2009 22:01:10 +0100, David Arendt wrote:
> Hi,
>
> Well in fact I cannot find where the problem is coming from. I am also
> not sure if tar or nilfs2 is causing it. Actually I am using tar -xPpf
> as this will create symlinks directly without passing through the
> sometimes failing delaying mechanism. Please tell me if you want any
> further information.
>
> Bye,
> David Arendt
Is your nilfs stable these days?
We found this was a timestamp resolution problem.
As you told us, tar creates intermediate zero-byte files for symlinks
and replaces them with real symlinks at the end.
The problem is that, to identify each intermediate file, tar compares
the mtime it recorded at creation with the file's current mtime, and
the comparison uses nanosecond values.
Some filesystems, including ext3 and nilfs2, do not store nanosecond
timestamps on disk, so the nanosecond part of an mtime can be reset to
zero once the inode is flushed from memory. This is why you saw the
problem randomly.
Possible solutions are:
1) supporting nanosecond timestamps in the filesystem,
2) changing tar to stop comparing nanosecond time values,
3) changing tar to skip the comparison if the underlying filesystem
does not support that resolution.
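As a rough illustration of the mismatch and of option 2, the sketch below models a filesystem that keeps only whole seconds on disk. All three function names are hypothetical; none of this is taken from tar or from nilfs2.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Model of a filesystem with second-resolution timestamps on disk:
   once the inode is written out and read back, the nanosecond part
   of the timestamp is gone. */
struct timespec reload_with_second_resolution(const struct timespec *in_memory)
{
    assert(in_memory != NULL);
    struct timespec on_disk = { .tv_sec = in_memory->tv_sec, .tv_nsec = 0 };
    return on_disk;
}

/* Exact comparison: seconds and nanoseconds must both match. */
bool mtime_equal_exact(const struct timespec *a, const struct timespec *b)
{
    assert(a != NULL && b != NULL);
    return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/* Option 2: compare whole seconds only and ignore the nanosecond part. */
bool mtime_equal_seconds(const struct timespec *a, const struct timespec *b)
{
    assert(a != NULL && b != NULL);
    return a->tv_sec == b->tv_sec;
}
```

With a recorded mtime whose nanosecond part is non-zero, the exact comparison fails after the reload while the seconds-only comparison still succeeds. The cost of option 2 is a weaker identity check: a file replaced within the same second would no longer be detected by mtime alone.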
Solution 1 raises a compatibility problem for filesystems.
Maybe nilfs should support nanosecond timestamps, but the nilfs inode
is unfortunately 32 bits short for this. :(
If I could allot one 64-bit field of the btree root array, it would be
possible, but that breaks compatibility. Another candidate is an
unused 64-bit field reserved for extended attributes, but I'd like to
keep it reserved because it is important enough. So it's thorny.
Solution 3 requires a new kernel interface. This was actually
discussed recently on the kernel mailing list, but no conclusion was
reached.
I don't know why tar requires such intermediate files, but solution 3
seems necessary when conventional file systems are taken into account.
Regards,
Ryusuke
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: silent random symbolic link corruption
[not found] ` <20090308.153730.64866441.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-03-08 13:07 ` Ryusuke Konishi
2009-03-08 15:45 ` Ryusuke Konishi
1 sibling, 0 replies; 9+ messages in thread
From: Ryusuke Konishi @ 2009-03-08 13:07 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg
On Sun, 08 Mar 2009 15:37:30 +0900 (JST), Ryusuke Konishi wrote:
> Maybe nilfs should support nano second timestamps, but the inode of
> nilfs is unfortunately short 32-bits for this. :(
>
> If I can allot one 64-bit field of the btree root array, this can be
> possible. But it breaks the compatibility. Another candidate is an
> unused 64-bit field reserved for extended attribute. But I'd like to
> reserve it as is because it is enough important. So it's thorny.
I just noticed another possibility. The current nilfs inodes have a
64-bit dtime field, which stores the deletion time and is not used
while the inode is alive. We may be able to use it for the nanosecond
timestamps without breaking compatibility. I'll think about it.
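To make the idea concrete, here is one way two nanosecond values could share a single 64-bit field. This is purely a sketch under assumptions: the choice of mtime and ctime, the bit layout and the function names are hypothetical and do not describe the real nilfs2 on-disk format. Byte order on disk is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_LIMIT 1000000000u  /* valid nanosecond values are below this */

/* Hypothetical layout: ctime nanoseconds in the upper 32 bits,
   mtime nanoseconds in the lower 32 bits of the reused field. */
uint64_t pack_nsec(uint32_t mtime_nsec, uint32_t ctime_nsec)
{
    assert(mtime_nsec < NSEC_LIMIT && ctime_nsec < NSEC_LIMIT);
    return ((uint64_t)ctime_nsec << 32) | mtime_nsec;
}

/* Extract the lower 32 bits (mtime nanoseconds in this layout). */
uint32_t unpack_mtime_nsec(uint64_t field)
{
    return (uint32_t)(field & 0xffffffffu);
}

/* Extract the upper 32 bits (ctime nanoseconds in this layout). */
uint32_t unpack_ctime_nsec(uint64_t field)
{
    return (uint32_t)(field >> 32);
}
```

In this layout a field that is zero decodes as zero nanoseconds for both timestamps. Whether existing live inodes actually hold zero in that field is an assumption that would need checking.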
Cheers,
Ryusuke
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: silent random symbolic link corruption
[not found] ` <20090308.153730.64866441.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-03-08 13:07 ` Ryusuke Konishi
@ 2009-03-08 15:45 ` Ryusuke Konishi
1 sibling, 0 replies; 9+ messages in thread
From: Ryusuke Konishi @ 2009-03-08 15:45 UTC (permalink / raw)
To: users-JrjvKiOkagjYtjvyW6yDsg, admin-/LHdS3kC8BfYtjvyW6yDsg
On Sun, 08 Mar 2009 15:37:30 +0900 (JST), Ryusuke Konishi wrote:
> Some filesystems including ext3 and nilfs2, do not support nano-second
> timestamps on disk, so nano-second mtime values have possibility to
> reset to zero when flushed from memory. This is the reason why you
> saw the problem randomly.
>
> The solution for this is
>
> 1) supporting nano second time-stamps.
> 2) changing tar program to stop comparison of nano-second time values
> 3) changing tar to stop the comparison if underlying filesystem
> does not support the resolution.
>
> The solution 1 suffers compatiblity problem for filesystems.
> Maybe nilfs should support nano second timestamps, but the inode of
> nilfs is unfortunately short 32-bits for this. :(
>
> If I can allot one 64-bit field of the btree root array, this can be
> possible. But it breaks the compatibility. Another candidate is an
> unused 64-bit field reserved for extended attribute. But I'd like to
> reserve it as is because it is enough important. So it's thorny.
>
> The solution 3 requires a new kernel interface. Actually this was
> discussed recently in the kernel mailing list, but it's unconcluded.
>
> I don't know why the tar requires such intermediate file, but the 3
> seems required at the thought of conventional file systems.
I've found the cause of this problem in the timestamp initialization of
the in-memory nilfs inode. It initialized timestamps with valid
nanosecond values even though nilfs does not support them on disk.
It does not seem to be a tar problem, and file systems that handle this
properly (e.g. ext3) probably do not suffer from it.
Applying the following patch may solve the problem.
However, I'd like to support nanosecond timestamps in the next release
in the way I described in my previous mail. I think high-resolution
timestamps are an important feature for today's file systems.
Regards,
Ryusuke Konishi
diff --git a/fs/inode.c b/fs/inode.c
index 46b24e5..3d39dff 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -355,7 +355,7 @@ struct inode *nilfs_new_inode(struct inode *dir, int mode)
         inode->i_blksize = PAGE_SIZE; /* This is the optimal IO size
                                          (for stat), not fs block size */
 #endif
-        inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME;
+        inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
         if (S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)) {
                 err = nilfs_bmap_read(ii->i_bmap, NULL);
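The effect of the one-line change can be modelled in user space as below. This is a simplified sketch, not kernel code: stamp_full and stamp_seconds are hypothetical stand-ins for CURRENT_TIME and CURRENT_TIME_SEC, on the assumption that the latter yields a timestamp whose nanosecond part is zero, and after_reload stands for what stat would report once the inode has been re-read from a disk format that stores whole seconds only.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Stand-in for CURRENT_TIME: keep the full-resolution timestamp. */
struct timespec stamp_full(const struct timespec *now)
{
    assert(now != NULL);
    return *now;
}

/* Stand-in for CURRENT_TIME_SEC: keep whole seconds, zero nanoseconds. */
struct timespec stamp_seconds(const struct timespec *now)
{
    assert(now != NULL);
    struct timespec ts = { .tv_sec = now->tv_sec, .tv_nsec = 0 };
    return ts;
}

/* Timestamp seen after the inode is written to disk and read back,
   on a disk format that has no room for nanoseconds. */
struct timespec after_reload(const struct timespec *in_memory)
{
    assert(in_memory != NULL);
    struct timespec ts = { .tv_sec = in_memory->tv_sec, .tv_nsec = 0 };
    return ts;
}

/* True if a program sees the same timestamp before and after the reload. */
bool survives_reload(const struct timespec *stamp)
{
    struct timespec reloaded = after_reload(stamp);
    return reloaded.tv_sec == stamp->tv_sec
        && reloaded.tv_nsec == stamp->tv_nsec;
}
```

A timestamp produced by stamp_seconds is identical before and after the reload, so an exact mtime comparison such as the one tar performs keeps succeeding. A timestamp from stamp_full only survives if its nanosecond part happened to be zero.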
^ permalink raw reply related [flat|nested] 9+ messages in thread