* NFS corruption (duplicated data)
@ 2004-06-08 15:44 Andy
  2004-06-08 15:57 ` Trond Myklebust
  2004-06-08 23:51 ` Nathan Scott
  0 siblings, 2 replies; 5+ messages in thread
From: Andy @ 2004-06-08 15:44 UTC (permalink / raw)
  To: linux-kernel

I really don't understand what could be causing this, but it happens on
several machines and on at least kernels 2.4.22, 2.4.25, and 2.4.26.
The mounts are NFS v3: hard, udp, rsize=8192, wsize=8192.
The local filesystems are XFS.

Trond, this is data corruption, not dropped packets, so the protocol
being UDP is not the problem.

Here is what is happening:

When I copy a file of offsets from machine A to machine B over NFS and then
compare the file on B with the file on A over NFS, the file on machine B
is corrupted in the following ways.

Usually, data from earlier in the file shows up again later.
For example:

57344 bytes of data from 672190464-672247807 also appear at positions
1449664512-1449721855

Sometimes, data from later in the file is duplicated at a position before
where it should be:

53248 bytes of data from 1197158400-1197211647 also appear at positions
1036660736-1036713983
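
The two reported ranges can be checked with a little arithmetic. The
sketch below is not part of the original report; it assumes a 4096-byte
page size (an assumption, not something stated in the thread) and takes
the 8192-byte wsize from the mount options. It confirms that each source
and destination range has the stated length, and shows whether the
ranges start on page or wsize boundaries.

```python
PAGE = 4096   # assumed page size; not stated in the report
WSIZE = 8192  # rsize/wsize taken from the mount options in the report

# (source_start, source_end, dest_start, dest_end), inclusive byte ranges
# copied from the two examples in the report.
REPORTS = [
    (672190464, 672247807, 1449664512, 1449721855),
    (1197158400, 1197211647, 1036660736, 1036713983),
]


def describe(src_start, src_end, dst_start, dst_end):
    """Summarize length and alignment of one duplicated range."""
    length = src_end - src_start + 1
    return {
        "length": length,
        "same_length": length == dst_end - dst_start + 1,
        "whole_pages": length % PAGE == 0,
        "pages": length // PAGE,
        "src_page_aligned": src_start % PAGE == 0,
        "dst_page_aligned": dst_start % PAGE == 0,
        "src_wsize_aligned": src_start % WSIZE == 0,
        "dst_wsize_aligned": dst_start % WSIZE == 0,
    }


for report in REPORTS:
    print(describe(*report))
```

Under the 4096-byte assumption, both ranges are a whole number of pages
(14 and 13) and every start offset is page aligned, which may suggest
that whole pages are being misplaced rather than arbitrary byte runs.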

Any ideas?

Andy
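
For readers who want to reproduce the test, a "file of offsets" can be
sketched as follows. The exact format Andy used is not given, so the
8-byte big-endian words and the function names write_offset_file and
find_misplaced are assumptions made for illustration. Because every word
stores its own position, any misplaced data identifies where it came
from.

```python
import struct

WORD = 8  # bytes per stored offset (assumed; the original format is unknown)


def write_offset_file(path, size):
    """Write `size` bytes (a multiple of WORD); each word holds its own offset."""
    with open(path, "wb") as f:
        for off in range(0, size, WORD):
            f.write(struct.pack(">Q", off))


def find_misplaced(path):
    """Return (position, length, source_offset) runs whose data belongs elsewhere."""
    runs = []
    pos = 0
    with open(path, "rb") as f:
        while True:
            word = f.read(WORD)
            if len(word) < WORD:
                break
            stored = struct.unpack(">Q", word)[0]
            if stored != pos:
                last = runs[-1] if runs else None
                # Extend the previous run when both the position and the
                # source offset continue without a gap.
                if last and last[0] + last[1] == pos and last[2] + last[1] == stored:
                    runs[-1] = (last[0], last[1] + WORD, last[2])
                else:
                    runs.append((pos, WORD, stored))
            pos += WORD
    return runs
```

Running find_misplaced on the copy on machine B would list each
duplicated run in the same "N bytes from X also at Y" form used in the
report. Reading one word at a time is slow for multi-gigabyte files; a
real tool would read in larger chunks.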


* Re: NFS corruption (duplicated data)
  2004-06-08 15:44 NFS corruption (duplicated data) Andy
@ 2004-06-08 15:57 ` Trond Myklebust
  2004-06-08 23:51 ` Nathan Scott
  1 sibling, 0 replies; 5+ messages in thread
From: Trond Myklebust @ 2004-06-08 15:57 UTC (permalink / raw)
  To: Andy; +Cc: linux-kernel

On Tue, 08/06/2004 at 11:44, Andy wrote:
> I really don't understand what could be causing this, but it happens on
> several machines and on at least kernels 2.4.22, 2.4.25, and 2.4.26.
> NFS v3 : hard, udp, rsize=8192,wsize=8192
> local filesystems are XFS

Does it occur on non-XFS partitions?

Cheers,
  Trond


* Re: NFS corruption (duplicated data)
  2004-06-08 15:44 NFS corruption (duplicated data) Andy
  2004-06-08 15:57 ` Trond Myklebust
@ 2004-06-08 23:51 ` Nathan Scott
  2004-06-09  2:12   ` Russell Cattelan
  2004-06-09  5:19   ` Craig Tierney
  1 sibling, 2 replies; 5+ messages in thread
From: Nathan Scott @ 2004-06-08 23:51 UTC (permalink / raw)
  To: Andy, cattelan; +Cc: linux-kernel, linux-xfs

Hi Andy,

It would be good to try this with files served from ext2/3 as well,
to try to isolate it to XFS/NFS.  We have a known issue that's
possibly related to this in XFS - Russell, does this sound
like the problem you've been looking at?

If you have a simple test case to reproduce it (we have an
extremely complex test case to reproduce that other issue,
but from your description I'm not sure it's the same), that
would be very helpful, Andy.

thanks.

-- 
Nathan


* Re: NFS corruption (duplicated data)
  2004-06-08 23:51 ` Nathan Scott
@ 2004-06-09  2:12   ` Russell Cattelan
  2004-06-09  5:19   ` Craig Tierney
  1 sibling, 0 replies; 5+ messages in thread
From: Russell Cattelan @ 2004-06-09  2:12 UTC (permalink / raw)
  To: Nathan Scott; +Cc: cattelan, Andy, linux-xfs, linux-kernel

Yeah, it's probably the same problem.

To really understand what is going on in terms of corruption,
fill the disk with a known pattern and re-run the test.

If the corrupted areas show up as the known pattern, you are dealing
with stale disk data and not data from the wrong file/process.

And at that point I would definitely say it's this bug:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=198
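
The pattern test described above could be sketched like this. The
marker bytes and the function names fill_with_pattern and
is_stale_pattern are hypothetical, chosen only for illustration. The
idea is to write the marker over the server's free space (for example
into a large scratch file that is then deleted), re-run the copy, and
check whether each corrupted region contains only the marker.

```python
PATTERN = b"\xde\xad\xbe\xef"  # hypothetical marker; any recognizable bytes work


def fill_with_pattern(path, size, chunk=1 << 20):
    """Write `size` bytes of the repeating marker to `path`."""
    # chunk is a multiple of len(PATTERN), so the phase stays consistent.
    block = PATTERN * (chunk // len(PATTERN))
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(block))
            f.write(block[:n])
            remaining -= n


def is_stale_pattern(data):
    """True if `data` consists only of the repeating marker, at any phase."""
    if not data:
        return False
    n = len(PATTERN)
    doubled = PATTERN * 2
    for phase in range(n):
        rotated = doubled[phase:phase + n]
        expected = (rotated * (len(data) // n + 1))[:len(data)]
        if data == expected:
            return True
    return False
```

A corrupted region for which is_stale_pattern returns True would point
to stale disk data; a region that instead holds valid offsets from
elsewhere in the file would point to data landing in the wrong place.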

* Re: NFS corruption (duplicated data)
  2004-06-08 23:51 ` Nathan Scott
  2004-06-09  2:12   ` Russell Cattelan
@ 2004-06-09  5:19   ` Craig Tierney
  1 sibling, 0 replies; 5+ messages in thread
From: Craig Tierney @ 2004-06-09  5:19 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Andy, cattelan, linux-kernel, linux-xfs

On Tue, 2004-06-08 at 17:51, Nathan Scott wrote:
> Hi Andy,
> 
> It would be good to try this with files served from ext2/3 as well,
> to try to isolate it to XFS/NFS.  We have a known issue that's
> possibly related to this in XFS - Russell, does this sound
> like the problem you've been looking at?
> 
> If you have a simple test case to reproduce it (we have an
> extremely complex test case to reproduce that other issue,
> but from your description I'm not sure it's the same), that
> would be very helpful, Andy.
> 
> thanks.

> 
> On Tue, Jun 08, 2004 at 10:44:22AM -0500, Andy wrote:
> > I really don't understand what could be causing this, but it happens on
> > several machines and on at least kernels 2.4.22, 2.4.25, and 2.4.26.
> > NFS v3 : hard, udp, rsize=8192,wsize=8192
> > local filesystems are XFS
> > 
> > Trond, this is data corruption not dropped packets so the protocol
> > being UDP is not the problem.
> > 
> > Here is what is happening :
> > 
> > Copying a file of offsets from machine A to machine B over NFS and then
> > comparing the file on B with the file on A over NFS, the file on machine B
> > is corrupted in the following ways. 
> > 
> > Usually, data earlier in the file will show up again later.
> > For example :
> > 
> > 57344 bytes of data from 672190464-672247807 is also in positions
> > 1449664512-1449721855
> > 
> > sometimes, data later in the file is dupped to a position before it
> > should be
> > 
> > 53248 bytes of data from 1197158400-1197211647 is also in positions
> > 1036660736-1036713983
> > 

Are you performing other I/O on the NFS system while this copy is
occurring?  Are you copying the same file over and over to try to
cause this problem?

Is it possible to zero all of the disk, or as much of it as possible?
If you are copying the same file over and over, you might be seeing old
data on the disk and not necessarily seeing the filesystem putting the
data in the wrong place.  This might help isolate the problem to the one
Russell is working on vs. a different problem.
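
One way to zero as much of the disk as possible is to write zeros into
a scratch file on the server's local filesystem until a limit is
reached or the disk fills, sync, and then delete the file. The sketch
below assumes this approach; the names zero_free_space and is_all_zero
are hypothetical, and whether the freed blocks really stay zeroed on
disk depends on the filesystem.

```python
import errno
import os


def zero_free_space(scratch_path, max_bytes, chunk=1 << 20):
    """Write up to max_bytes of zeros to scratch_path, sync, then delete it.

    Returns the number of bytes written before the limit or a full disk.
    """
    zeros = bytes(chunk)
    written = 0
    try:
        # Unbuffered, so a full disk is reported by write() itself.
        with open(scratch_path, "wb", buffering=0) as f:
            while written < max_bytes:
                n = min(chunk, max_bytes - written)
                try:
                    written += f.write(zeros[:n])
                except OSError as e:
                    if e.errno == errno.ENOSPC:
                        break  # disk full: nothing more can be zeroed
                    raise
            os.fsync(f.fileno())  # push the zeros to disk before deleting
    finally:
        if os.path.exists(scratch_path):
            os.remove(scratch_path)
    return written


def is_all_zero(data):
    """True if a non-empty byte string contains only zero bytes."""
    return len(data) > 0 and data.count(0) == len(data)
```

After zeroing and re-running the copy, a corrupted region for which
is_all_zero returns True would be old (now zeroed) disk contents, while
a region holding offsets from elsewhere in the file would mean the data
really was written to the wrong place.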

Craig


