* knfsd - files don't sync to disk
From: Jörgen Karlsson @ 2002-05-16 19:43 UTC
  To: nfs

Hi,

We have a serious problem with knfsd and kernel 2.4.17.

When doing a database backup in our linux cluster we have noticed that
a few files are not written to disk properly. The files have zero
file size when checked with the 'ls' command.

Several thousands of files are written to the nfs server during a
short period of time.  Average file size is 300-400 bytes.

We have noticed that usually 1-2 out of 3000 files will have their sizes
truncated to 0.
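
A minimal sketch, in Python, of how such files could be counted on the
server after a backup run. The function name and the directory argument are
illustrative and not part of the original report; the sketch only looks for
regular files whose size is 0.

```python
import os
import stat


def find_empty_files(root):
    """Return the sorted paths of regular files under root whose size is 0 bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # the file vanished between the listing and the stat
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                empty.append(path)
    return sorted(empty)
```

Calling find_empty_files() on the exported directory after each run would
give the count of affected files without reading 'ls' output by eye.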

The file system is exported with (rw,sync).

The disk filesystem is ext2.

E.g. this is what is happening:
          - client writes 272 bytes to the server
          - mm/filemap.c:generic_file_write() returns 272 bytes written
          - fs/nfsd/vfs.c:nfsd_write() increments nfsdstats.io_write by 272
            and returns that the write was successful.
          - knfsd reports to the client that 272 bytes were written.
          - when checking with the 'ls' command on the server, the file size is zero.

Apparently the nfs server lies to the client and the files are not
properly synced/written to disk.

Setting no_wdelay makes no difference.

When running with async set (or sync removed) the problem
disappears (no files have their sizes truncated to zero).

The nfs server is a PIII-700 with 1 GB RAM.

Any ideas what is going on?

/Jörgen Karlsson

_______________________________________________________________

Have big pipes? SourceForge.net is looking for download mirrors. We supply
the hardware. You get the recognition. Email Us: bandwidth@sourceforge.net
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: knfsd - files don't sync to disk
From: Neil Brown @ 2002-05-17 11:58 UTC
  To: Jörgen Karlsson; +Cc: nfs


Are you using NFSv3?

Does this patch:
http://www.cse.unsw.edu.au/~neilb/patches/linux-stable/2.4.19-pre5/patch-I-NfsdVfsTidyup

make a difference?

NeilBrown



* Re: knfsd - files don't sync to disk
From: Jörgen Karlsson @ 2002-05-17 17:18 UTC
  To: Neil Brown; +Cc: nfs

We are using NFSv2.

We have found a temporary workaround: reducing the number of nfs threads
to 1.

We have a 30-processor cluster that gave zero-size files every time we made
a backup. By reducing the number of threads from 8 to 1 we were able to make
25 backups without any problem. After switching back to 8 threads we immediately
got the corrupted files back.
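
A rough way to generate a similar load from a client is sketched below in
Python: many small files written concurrently, each flushed with fsync, then
checked for size. The function names, the file naming scheme and the default
of 30 writers are assumptions chosen to mirror the numbers in this thread,
not the actual backup program. The size check runs on the client, so the
authoritative check is still 'ls' on the server.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_small_file(path, payload):
    """Write payload to path, push it to stable storage, return the byte count."""
    with open(path, "wb") as f:
        written = f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS (and the NFS server) to commit the data
    return written


def run_backup_like_load(target_dir, n_files=3000, size=272, writers=30):
    """Write n_files small files concurrently; return the paths whose size is wrong."""
    payload = b"x" * size
    paths = [os.path.join(target_dir, "file%05d.dat" % i) for i in range(n_files)]
    with ThreadPoolExecutor(max_workers=writers) as pool:
        # list() drains the iterator so any exception in a writer is raised here
        list(pool.map(lambda p: write_small_file(p, payload), paths))
    return [p for p in paths if os.stat(p).st_size != size]
```

Running it against the NFS mount once with 8 nfsd threads and once with 1
would show whether this synthetic load triggers the same zero-size files.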

Maybe some down() and up() calls are missing...

If you think the patch may fix it for v2 as well as v3, we can try the
patch next week.

Can running with only one thread cause any performance problems (with
sync, the bottleneck is probably disk I/O)?

/Jorgen Karlsson


* Re: knfsd - files don't sync to disk
From: Neil Brown @ 2002-05-17 20:56 UTC
  To: Jörgen Karlsson; +Cc: nfs

On Friday May 17, publius@chello.se wrote:
> We are using NFSv2.
> 
> We have found a temporary workaround: reducing the number of nfs threads
> to 1.
> 
> We have a 30-processor cluster that gave zero-size files every time we made
> a backup. By reducing the number of threads from 8 to 1 we were able to make
> 25 backups without any problem. After switching back to 8 threads we immediately
> got the corrupted files back.
> 
> Maybe some down() and up() calls are missing...
> 
> If you think the patch may fix it for v2 as well as v3, we can try the
> patch next week.

The patch would not affect NFSv2 with the no_wdelay export option, so
if that combination still has problems, don't bother with the patch.

> 
> Can running with only one thread cause any performance problems (with
> sync, the bottleneck is probably disk I/O)?

The more threads, the more concurrent IO you can be waiting on, so I
would definitely expect a reduction in performance.

The whole scenario is very odd.
You have confirmed that nfsd does actually write data to the file, but
if you look afterwards, the file is empty.
This suggests one of the following:
  It wrote to the wrong file by mistake.
  A subsequent "truncate" request was received.
  The filesystem lied when it said that it had written data.
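
One way to gather evidence on the server between these possibilities is to
watch file sizes while the backup runs. The Python sketch below is only an
illustration (all function names are made up here): it polls a directory and
reports files that were seen with data and later seen empty. Such a report
would be consistent with a later truncate; a file that is never seen with
data proves nothing by itself, because polling can miss short-lived states.

```python
import os
import time


def snapshot_sizes(root):
    """Map each regular file directly under root to its current size in bytes."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            try:
                if entry.is_file(follow_symlinks=False):
                    sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
            except FileNotFoundError:
                continue  # the file was removed while we were scanning
    return sizes


def shrunk_to_zero(before, after):
    """Names that held data in `before` but are empty in `after`."""
    return sorted(name for name, size in after.items()
                  if size == 0 and before.get(name, 0) > 0)


def watch(root, interval=0.05, rounds=100):
    """Poll root `rounds` times and collect every name seen shrinking to zero."""
    events = []
    previous = snapshot_sizes(root)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot_sizes(root)
        events.extend(shrunk_to_zero(previous, current))
        previous = current
    return events
```

The polling interval and round count are arbitrary defaults; a network trace
remains the more reliable source for seeing what the client actually sent.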

Can you get a complete tcpdump trace of NFS activity on the server
and let me have a look at it?
Something like

   tcpdump -w /var/tmp/file -s 1500 port 2049
   bzip2 /var/tmp/file

on the server, and let me know where to pick it up?

NeilBrown

