* knsfd - files don't sync to disk
@ 2002-05-16 19:43 Jörgen Karlsson
2002-05-17 11:58 ` Neil Brown
0 siblings, 1 reply; 4+ messages in thread
From: Jörgen Karlsson @ 2002-05-16 19:43 UTC (permalink / raw)
To: nfs
Hi,
We have a serious problem with knfsd and kernel 2.4.17.
When doing a database backup in our Linux cluster we have noticed that
a few files are not written to disk properly. The files have zero
size when checked with the 'ls' command.
Several thousand files are written to the NFS server during a
short period of time. Average file size is 300-400 bytes.
We have noticed that usually 1-2 out of 3000 files will have their sizes
truncated to 0.
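The write pattern described above can be approximated with a small shell
sketch. It is only an illustration, not the backup job itself: the function
name write_small_files, the file.N naming and the target directory are all
assumptions, and a single sequential writer may well not trigger a problem
that shows up with many concurrent clients.

```sh
#!/bin/sh
# Hypothetical helper: write COUNT files of SIZE bytes each into DIR.
# DIR would be a directory on the NFS mount when run from a client.
write_small_files() {
    dir=$1
    count=$2
    size=$3
    i=1
    while [ "$i" -le "$count" ]; do
        # One block of SIZE zero bytes per file; stop on the first failed write.
        dd if=/dev/zero of="$dir/file.$i" bs="$size" count=1 2>/dev/null || return 1
        i=$((i + 1))
    done
}
```

For example, `write_small_files /mnt/nfs/backup 3000 350` would create 3000
files of 350 bytes each; the mount point here is a placeholder.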
The file system is exported with (rw,sync).
The on-disk filesystem is ext2.
E.g. this is what is happening:
- client writes 272 bytes to the server
- mm/filemap.c:generic_file_write() returns 272 bytes written
- fs/nfsd/vfs.c:nfsd_write() adds 272 to nfsdstats.io_write
  and returns that the write was successful.
- knfsd tells the client that 272 bytes were written.
- when checked with the 'ls' command on the server, the file size is zero.
Apparently the nfs server lies to the client and the files are not
properly synced/written to disk.
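A minimal sketch of how the affected files can be listed on the server after
a backup run, instead of checking by eye with 'ls'. The function names and the
directory argument are assumptions for illustration; it relies only on
standard find(1) behaviour, where "-size 0" matches empty files.

```sh
#!/bin/sh
# Hypothetical helpers: list, and count, zero-length regular files under DIR.
list_empty_files() {
    find "$1" -type f -size 0 -print
}

count_empty_files() {
    # wc -l may pad its output with spaces on some systems; strip them.
    list_empty_files "$1" | wc -l | tr -d ' '
}
```

Running `count_empty_files /export/backup` (a placeholder path) after each
backup would give the number of truncated files for that run.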
Setting no_wdelay makes no difference.
When running with async set (or sync removed) the problem
disappears (no files have their sizes truncated to zero).
The NFS server is a PIII-700 with 1 GB RAM.
Any ideas what is going on?
/Jörgen Karlsson
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: knsfd - files don't sync to disk
2002-05-16 19:43 knsfd - files don't sync to disk Jörgen Karlsson
@ 2002-05-17 11:58 ` Neil Brown
2002-05-17 17:18 ` Jörgen Karlsson
0 siblings, 1 reply; 4+ messages in thread
From: Neil Brown @ 2002-05-17 11:58 UTC (permalink / raw)
To: Jörgen Karlsson; +Cc: nfs
Are you using NFSv3?
Does this patch:
http://www.cse.unsw.edu.au/~neilb/patches/linux-stable/2.4.19-pre5/patch-I-NfsdVfsTidyup
make a difference?
NeilBrown
* Re: knsfd - files don't sync to disk
2002-05-17 11:58 ` Neil Brown
@ 2002-05-17 17:18 ` Jörgen Karlsson
2002-05-17 20:56 ` Neil Brown
0 siblings, 1 reply; 4+ messages in thread
From: Jörgen Karlsson @ 2002-05-17 17:18 UTC (permalink / raw)
To: Neil Brown; +Cc: nfs
We are using NFSv2.
We have found a temporary workaround: reducing the number of nfsd threads
to 1.
We have a 30-processor cluster that produced zero-size files every time we made
a backup. After reducing the number of threads from 8 to 1 we were able to make
25 backups without any problem. After switching back to 8 threads we immediately
got the corrupted files again.
Maybe some down() and up() calls are missing...
If you think the patch may fix it for v2 as well as v3, we can try the
patch next week.
Can running with only one thread cause any performance problems (with
sync, the bottleneck is probably disk I/O)?
/Jorgen Karlsson
* Re: knsfd - files don't sync to disk
2002-05-17 17:18 ` Jörgen Karlsson
@ 2002-05-17 20:56 ` Neil Brown
0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2002-05-17 20:56 UTC (permalink / raw)
To: Jörgen Karlsson; +Cc: nfs
On Friday May 17, publius@chello.se wrote:
> We are using NFSv2.
>
> We have found a temporary workaround by reducing the number of nfs threads
> to 1.
>
> We have a 30-processor cluster that gave zero-size files every time we made
> a backup. By reducing the number of threads from 8 to 1 we were able to make
> 25 backups without any problem. Switching back to 8 threads we immediately
> got the corrupted files back.
>
> Maybe some down() and up() are missing...
>
> If you think the patch may fix it for v2 as well as v3 we can try the
> patch next week.
The patch would not affect NFSv2 with the no_wdelay export option, so
if that combination still has problems, don't bother with the patch.
>
> Can running with only one thread cause any performance problems (with
> sync the bottleneck probably is disk io....)?
The more threads, the more concurrent I/O you can be waiting on, so I
would definitely expect a reduction in performance with only one thread.
The whole scenario is very odd.
You have confirmed that nfsd does actually write data to the file, but
if you look afterwards, the file is empty.
This suggests one of the following:
    - it wrote to the wrong file by mistake
    - a subsequent "truncate" request was received
    - the filesystem lied when it said that it had written data.
Can you get a complete tcpdump trace of NFS activity on the server
and let me have a look at it?
Run something like
    tcpdump -w /var/tmp/file -s 1500 port 2049
    bzip2 /var/tmp/file
on the server, and let me know where to pick it up.
NeilBrown