* nfs write problems
@ 2007-10-02 19:56 Bob Kryger
2007-10-03 0:07 ` Trond Myklebust
0 siblings, 1 reply; 6+ messages in thread
From: Bob Kryger @ 2007-10-02 19:56 UTC (permalink / raw)
To: nfs, For users of Fedora, Bob Kryger
So, I have a relatively new system on which I am seeing strange NFS
behavior.
In short I am getting seemingly random errors in files written via NFS.
* I do not get the errors if I write files locally.
* I have no errors in the NIC, I even tried a second NIC in a PCI
slot as opposed to the onboard one. There are no errors recorded
on the NIC or the switch on a 1Gb port.
* I see no memory errors, I ran memtest for 3 days clean.
* To test I am using dd if=/dev/zero of various (large) file sizes.
* Since I know that the file should be all zeros I wrote a C program
to read it back and tell me where it finds non-zero bytes. The
program results are confirmed with od.
* The files read back always have the errors in the same place, so
it is not a problem with reading the files.
* There are no errors in any logs.
* The problem occurs on both the RAID1 (ext3) and RAID10 (xfs)
filesystems.
* I've tried two clients, both FC5, one 64-bit and the other 32-bit,
with the same results. This error was uncovered by users
attempting to write files from other systems and other Fedora
releases, so it is repeatable regardless of the client.
* The server is not running anything else and spends a large portion
of the time idle. Load averages are quite low. Swap is mostly
unused. A large portion of RAM is allocated to file cache, but I
expect that this would be normal for this amount of file I/O.
The server is running an up-to-date FC6, although this also occurred
with FC5. I am about to try F7.
Hardware is an AMD 1220 dual core 64bit, on a Tyan K8SSA S3950
with an Adaptec Raid 2230SLP and 7 Fujitsu MAU3147NC.
The RAID config is that 2 disks (on different channels) are in a mirror for
the OS, 4 are in a RAID 10 config and 1 is a hot spare.
Anyone ever seen anything like this before?
Suggest where I might look next?
Additional tests?
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: nfs write problems
2007-10-02 19:56 nfs write problems Bob Kryger
@ 2007-10-03 0:07 ` Trond Myklebust
2007-10-03 10:21 ` Bob Kryger
0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2007-10-03 0:07 UTC (permalink / raw)
To: Bob Kryger; +Cc: For users of Fedora, nfs
On Tue, 2007-10-02 at 15:56 -0400, Bob Kryger wrote:
> So, I have a relatively new system on which I am seeing strange NFS
> behavior.
>
> In short I am getting seemingly random errors in files written via NFS.
>
> * I do not get the errors if I write files locally.
> * I have no errors in the NIC, I even tried a second NIC in a PCI
> slot as opposed to the onboard one. There are no errors recorded
> on the NIC or the switch on a 1Gb port.
> * I see no memory errors, I ran memtest for 3 days clean.
> * To test I am using dd if=/dev/zero of various (large) file sizes.
> * Since I know that the file should be all zeros I wrote a C program
> to read it back and tell me where it finds non-zero bytes. The
> program results are confirmed with od.
> * The files read back always have the errors in the same place, so
> it is not a problem with reading the files.
> * There are no errors in any logs.
> * The problem occurs on both the RAID1 (ext3) and RAID10 (xfs)
> filesystems.
> * I've tried two clients, both FC5 one 64bit, and the other a 32 bit
> with the same results. This error was uncovered by users
> attempting to write files from other systems and other Fedora
> releases, so it is repeatable regardless of the client.
> * the server is not running anything else and spends a large portion
> of the time idle. loadaverages are quite low. swap is mostly
> unused. a large portion of RAM is allocated to file cache, but I
> expect that this would be normal for this amount of file IO.
>
> The server is running an up-to-date FC6, although this also occurred
> with FC5. I am about to try F7.
>
> Hardware is an AMD 1220 dual core 64bit, on a Tyan K8SSA S3950
> with an Adaptec Raid 2230SLP and 7 Fujitsu MAU3147NC.
> The RAID config is that 2 disks (on different channels) are in a mirror for
> the OS, 4 are in a RAID 10 config and 1 is a hot spare.
>
> Anyone ever seen anything like this before?
> Suggest where I might look next?
> Additional tests?
Feel free to describe your test in a bit more detail. Without more
information, we obviously can't rule out the existence of an NFS bug,
however usually whenever people describe this sort of problem it is
because they have failed to understand the NFS caching model as
described in
http://nfs.sourceforge.net/#faq_a8
So please include a reproducible test for us.
Cheers
Trond
* Re: nfs write problems
2007-10-03 0:07 ` Trond Myklebust
@ 2007-10-03 10:21 ` Bob Kryger
2007-10-03 13:31 ` J. Bruce Fields
0 siblings, 1 reply; 6+ messages in thread
From: Bob Kryger @ 2007-10-03 10:21 UTC (permalink / raw)
To: Trond Myklebust; +Cc: For users of Fedora, nfs
Trond Myklebust wrote:
> On Tue, 2007-10-02 at 15:56 -0400, Bob Kryger wrote:
>
>> So, I have a relatively new system on which I am seeing strange NFS
>> behavior.
>>
>> In short I am getting seemingly random errors in files written via NFS.
>>
[snip details]
>> Anyone ever seen anything like this before?
>> Suggest where I might look next?
>> Additional tests?
>>
>
> Feel free to describe your test in a bit more detail. Without more
> information, we obviously can't rule out the existence of an NFS bug,
>
I was trying to be thorough, I hope I succeeded.
Is there anything else that might be helpful? I certainly would not go
to a bug first, as I may very well have something misconfigured, but I
cannot seem to identify what that might be. I do have about 8 other
linux NFS servers in production on different hardware, SATA mostly,
where I am not seeing any issues. I don't think it's a hardware issue
though, as I cannot reproduce the problem without the use of NFS. (Hmm,
maybe if I NFS mount to the server itself. Would that prove anything?)
> however usually whenever people describe this sort of problem it is
> because they have failed to understand the NFS caching model as
> described in
>
> http://nfs.sourceforge.net/#faq_a8
>
Excellent, Thanks for the lead and I will test these items shortly.
After reading the FAQ, I'm not sure I see how the cache consistency
mechanisms apply to this problem. If I test the files after they are
closed shouldn't the data be consistent, written completely to the
server? If there were a data write error should I not see it somewhere?
If so where? client? server? would it be up to the client program to
catch it? I wonder if dd would see it. For the purpose of testing, I
have limited this server to serving to only a single client at a time,
so there will be no other variables/systems interfering.
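On the question of where a write error would show up: with NFS the client may cache writes and send them to the server later, so an error can be reported by a later write(), by fsync(), or by close() rather than by the write() that queued the data. A program that ignores those return values would not see it. The following is a minimal sketch under those assumptions, not the test used in this thread; `write_zeros` is a hypothetical helper name and the 64 KiB buffer size is arbitrary.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `total` zero bytes to `path`, checking every
 * call that can report a write error. Returns 0 on success, -1 on error. */
int write_zeros(const char *path, unsigned long long total)
{
    static const char buf[65536];           /* static storage: all zeros */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    while (total > 0) {
        size_t want = total < sizeof buf ? (size_t)total : sizeof buf;
        ssize_t n = write(fd, buf, want);

        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted, just retry */
        if (n <= 0) {
            fprintf(stderr, "write failed or made no progress\n");
            close(fd);
            return -1;
        }
        total -= (unsigned long long)n;     /* short writes are handled */
    }
    /* Errors from cached writes may only be reported at this point. */
    if (fsync(fd) < 0) {
        perror("fsync");
        close(fd);
        return -1;
    }
    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

If a call such as `write_zeros("/mnt/nfs/testfile", 256ULL << 20)` (the path is a placeholder) returns 0 and the file still reads back with non-zero bytes, the corruption was not reported to the writer as an I/O error.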
So to test this I read back the data of a newly written 256M file,
right from the client that wrote it, in this case with the nocto mount
option. This should take the client cache into account. I compared the
results from the server side as well. It had errors, the same errors in the
same locations on both the client and the server. So, this seems to indicate
that the issue is on the NFS client, not the server. (hmmm) But the
same client does not have a problem with any other server. At least, one
has never been reported. I'll verify that rigorously.
I am not familiar with the mechanism that NFS uses to verify data
validity between the client and the server. I assume that there is some
sort of checksum. Did I mention that this is NFSv3? At least I have not
specified v4.
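On the checksum question: NFSv3 does not appear to define a checksum of its own over file data, and integrity in transit is normally left to the transport and link layers (TCP/UDP checksums, Ethernet CRC). An application-level checksum computed on both the client and the server is one way to compare what each side actually holds. The following is a minimal sketch using the standard reflected CRC-32 (polynomial 0xEDB88320); `crc32_update` is a hypothetical name, not part of any NFS tooling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold `len` bytes into a running CRC-32.
 * Start with crc = 0 and pass the previous result back in for each chunk. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    size_t i;
    int bit;

    crc = ~crc;                             /* undo the final inversion */
    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;                            /* final inversion */
}
```

Feeding the same file through `crc32_update` in chunks on the client and on the server and comparing the two values would show whether both sides hold the same bytes. It only detects a difference; it does not locate it.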
> So please include a reproducible test for us.
>
Easily reproducible on this system. Short of providing access to this
system, not sure what more to do. Oh, wait, was that humor? Indicating
that I have provided significant detail? Dang, I've got to sharpen my
international tongue-in-cheek detector.
> Cheers
> Trond
>
Cool name
thanks
Bob
>
>
* Re: nfs write problems
2007-10-03 10:21 ` Bob Kryger
@ 2007-10-03 13:31 ` J. Bruce Fields
2007-10-03 17:53 ` Bob Kryger
2007-10-04 20:12 ` [NFS] " Bob Kryger
0 siblings, 2 replies; 6+ messages in thread
From: J. Bruce Fields @ 2007-10-03 13:31 UTC (permalink / raw)
To: Bob Kryger; +Cc: For users of Fedora, nfs, Trond Myklebust
On Wed, Oct 03, 2007 at 06:21:51AM -0400, Bob Kryger wrote:
> Trond Myklebust wrote:
> > Feel free to describe your test in a bit more detail. Without more
> > information, we obviously can't rule out the existence of an NFS bug,
> >
> I was trying to be thorough, I hope I succeeded.
The dd test sounded simple enough, but if you could include a transcript
of the commands you ran (together with the C code of any test programs)
just to be sure, that might help.
--b.
* Re: nfs write problems
2007-10-03 13:31 ` J. Bruce Fields
@ 2007-10-03 17:53 ` Bob Kryger
2007-10-04 20:12 ` [NFS] " Bob Kryger
1 sibling, 0 replies; 6+ messages in thread
From: Bob Kryger @ 2007-10-03 17:53 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: For users of Fedora, nfs, Trond Myklebust
Sure, easily. I gzipped a capture of a single test.
http://www.panix.com/~bobk/typescript.gz
the program is a simple one...
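#include <stdio.h>

/* Read stdin and report the offset of every non-zero byte. */
int main(void)
{
    unsigned long long k = 0;   /* byte offset and running byte count */
    int c;                      /* int, so EOF is distinct from byte 0xFF */

    /* Test getc()'s result directly; looping on feof() counts EOF as a byte. */
    while ((c = getc(stdin)) != EOF)
    {
        if (c != 0)
            printf("nonzero (0%o) found at byte %llu\n", (unsigned)c, k);
        k++;
    }
    printf("file size = %llu bytes\n", k);
    return 0;
}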
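Since the errors always appear in the same places, it may help to report each run of consecutive non-zero bytes once, together with where the run starts relative to a block boundary. The following is a minimal sketch of such a variant, not the program used in the thread; `scan_nonzero_runs` is a hypothetical name, and the 4096-byte boundary is an assumption (a common page size) that should be adjusted to the block or transfer size of interest.

```c
#include <stdio.h>

/* Hypothetical variant of the scanner: print one line per run of
 * consecutive non-zero bytes. Returns the number of runs found. */
unsigned long long scan_nonzero_runs(FILE *in, FILE *out)
{
    unsigned long long off = 0, start = 0, runs = 0;
    int in_run = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c != 0 && !in_run) {            /* a run begins here */
            in_run = 1;
            start = off;
        } else if (c == 0 && in_run) {      /* the run ended one byte ago */
            in_run = 0;
            runs++;
            fprintf(out, "run at byte %llu, length %llu, start %% 4096 = %llu\n",
                    start, off - start, start % 4096);
        }
        off++;
    }
    if (in_run) {                           /* run extends to end of file */
        runs++;
        fprintf(out, "run at byte %llu, length %llu, start %% 4096 = %llu\n",
                start, off - start, start % 4096);
    }
    fprintf(out, "file size = %llu bytes\n", off);
    return runs;
}
```

Calling `scan_nonzero_runs(stdin, stdout)` from a small `main` gives the same kind of report as the program above. Runs that consistently start or end on a round boundary would hint at a page-sized or block-sized unit being mishandled somewhere.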
J. Bruce Fields wrote:
> On Wed, Oct 03, 2007 at 06:21:51AM -0400, Bob Kryger wrote:
>
>> Trond Myklebust wrote:
>>
>>> Feel free to describe your test in a bit more detail. Without more
>>> information, we obviously can't rule out the existence of an NFS bug,
>>>
>>>
>> I was trying to be thorough, I hope I succeeded.
>>
>
> The dd test sounded simple enough, but if you could include a transcript
> of the commands you ran (together with the C code of any test programs)
> just to be sure, that might help.
>
> --b.
>
>
>
* Re: [NFS] nfs write problems
2007-10-03 13:31 ` J. Bruce Fields
2007-10-03 17:53 ` Bob Kryger
@ 2007-10-04 20:12 ` Bob Kryger
1 sibling, 0 replies; 6+ messages in thread
From: Bob Kryger @ 2007-10-04 20:12 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: For users of Fedora, nfs, Trond Myklebust
Interesting development that changes the suspect component.
On the suggestion of a colleague I set up Samba and mounted the same
location via CIFS.
I am getting the same results: the files are corrupted whether they are
written via Samba or NFS.
Now, I'm really confused. I guess I'll try swapping out the SCSI HBA.
Thanks
Bob
J. Bruce Fields wrote:
> On Wed, Oct 03, 2007 at 06:21:51AM -0400, Bob Kryger wrote:
>
>> Trond Myklebust wrote:
>>
>>> Feel free to describe your test in a bit more detail. Without more
>>> information, we obviously can't rule out the existence of an NFS bug,
>>>
>>>
>> I was trying to be thorough, I hope I succeeded.
>>
>
> The dd test sounded simple enough, but if you could include a transcript
> of the commands you ran (together with the C code of any test programs)
> just to be sure, that might help.
>
> --b.
>
>
>
--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list