public inbox for linux-kernel@vger.kernel.org
* [BUG] knfsd causes file system corruption when files are locked.
From: Ivan Kanis @ 2000-11-15 23:24 UTC
  To: linux-kernel

[1.] knfsd causes file system corruption when files are locked.

[2.] Lock down a file using the NLM_SHARE sharing mechanism. Remove
the file. Unlock the file using NLM_UNSHARE. The filesystem does not
recover the file space. I am running this on ext2fs. Fsck-ing the
filesystem does not help. The only way to recover the space is to
reformat the partition.

[3.] knfsd, lock, NLM_SHARE, NLM_UNSHARE

[4.] Linux version 2.2.16 (root@jedi) (gcc version 2.7.2.3)

[5.] N/A

[6.] This test.c program will reproduce the problem. You need to compile it
on a Solaris machine because Linux fcntl does not support NLM_SHARE.

-----start here
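#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main (int argc, char *argv[])
{
  struct fshare lck;
  int fd, ret;
  if (argc != 2) {
    printf ("Usage: %s file to lock\n", argv[0]);
    return 1;
  }
  fd = open (argv[1], O_WRONLY);
  if (fd < 0) {
    perror ("open");
    return 1;
  }
  /* Zero the fshare structure itself, not a struct flock. */
  memset (&lck, 0, sizeof (lck));
  lck.f_access = F_WRACC;   /* request write access */
  lck.f_deny = F_NODNY;     /* deny nothing to other openers */
  /* Take the share reservation. */
  ret = fcntl (fd, F_SHARE, &lck);
  if (ret < 0)
    perror ("F_SHARE");
  /* Remove the file, then release the share. */
  unlink (argv[1]);
  ret = fcntl (fd, F_UNSHARE, &lck);
  if (ret < 0)
    perror ("F_UNSHARE");

  return 0;
}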
-----end here

Steps to reproduce the problem

- Compile the program:
gcc test.c -o test

- Mount a Linux NFS partition on Solaris. (Remember the partition will
get corrupted; use a partition that you don't care about.)
mount -o vers=2 jedi:/sandbox /mnt

- Create a chunk of data on /mnt:
dd if=/dev/zero of=/mnt/chunk count=10000

- Do a df before running the program.

- Run the test program:
./test /mnt/chunk

- Run df again. The free space remains the same. The space is gone
until you reformat the partition.
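
The df comparison in the steps above can also be scripted. The sketch
below is not part of the original report. It assumes a POSIX system
with statvfs(); the helper names free_bytes and space_recovered and the
CHUNK_BYTES constant are made up for illustration. CHUNK_BYTES assumes
dd's default 512-byte block size, so count=10000 writes 5,120,000 bytes.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Bytes written by the dd step, assuming dd's default 512-byte blocks. */
#define CHUNK_BYTES (10000LL * 512LL)

/* Return the bytes available to unprivileged users on the filesystem
 * that holds `path`, or -1 if statvfs() fails. */
long long free_bytes(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0) {
        perror("statvfs");
        return -1;
    }
    return (long long)sv.f_bavail * (long long)sv.f_frsize;
}

/* Return 1 if free space grew by at least the chunk size between the
 * two samples, i.e. the deleted file's blocks were released. */
int space_recovered(long long before, long long after)
{
    return (after - before) >= CHUNK_BYTES;
}
```

Sample free_bytes("/mnt") before and after running ./test /mnt/chunk.
On a server that shows the problem, space_recovered() would be expected
to return 0 for the two samples.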


[7.] This bug was seen on a Debian 2.2 machine. We have seen the same
thing happen on systems running Red Hat 6.2 and TurboLinux 6.0 distributions.

[7.1] Environment:
Kernel modules         found
Gnu C                  2.95.2
Binutils               2.9.5.0.37
Linux C Library        ..
Dynamic Linker (ld.so) 1.9.11
ls: /usr/lib/libg++.so: No such file or directory
Procps                 2.0.6
Mount                  2.10f
Net-tools              (1999-04-20)
Kbd                    0.99
Sh-utils               2.0

[7.2] Processor information 

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 5
model name	: Pentium II (Deschutes)
stepping	: 2
cpu MHz		: 447.700
cache size	: 512 KB
fdiv_bug	: no
hlt_bug		: no
sep_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips	: 891.29

[7.3] Module information

aic7xxx               124328   1
nfsd                  140436   8 (autoclean)
snd-pcm-oss            16840   1 (autoclean)
snd-pcm-plugin         13000   0 (autoclean) [snd-pcm-oss]
snd-mixer-oss           4308   1 (autoclean) [snd-pcm-oss]
snd-card-cs4236         5224   2
snd-mpu401-uart         2356   0 [snd-card-cs4236]
snd-rawmidi             9752   0 [snd-mpu401-uart]
snd-seq-device          3476   0 [snd-rawmidi]
isapnp                 27572   0 [snd-card-cs4236]
snd-cs4236             20580   0 [snd-card-cs4236]
snd-cs4231             19008   0 [snd-card-cs4236 snd-cs4236]
snd-mixer              23536   0 [snd-mixer-oss snd-cs4236 snd-cs4231]
snd-pcm                29784   0 [snd-pcm-oss snd-pcm-plugin snd-cs4231]
snd-opl3                4328   0 [snd-card-cs4236]
snd-timer               8224   0 [snd-cs4231 snd-pcm snd-opl3]
snd-hwdep               3052   0 [snd-opl3]
snd                    36300   1 [snd-pcm-oss snd-pcm-plugin snd-mixer-oss snd-card-cs4236 snd-mpu401-uart snd-rawmidi snd-seq-device snd-cs4236 snd-cs4231 snd-mixer snd-pcm snd-opl3 snd-timer snd-hwdep]
soundcore               2448   3 [snd]
3c59x                  18212   1

[7.4] SCSI Information

Attached devices: 
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: NEC      Model: CD-ROM DRIVE:465 Rev: 1.03
  Type:   CD-ROM                           ANSI SCSI revision: 02

[7.5] N/A

[8.] Here is a trace from the Solaris snoop program while the test
program mentioned above is running:

         sun -> jedi.wrq.com NFS C LOOKUP2 FH=2344 chunk 
jedi.wrq.com -> sun          NFS R LOOKUP2 OK FH=D308 
         sun -> jedi.wrq.com NLM C SHARE3 OH=0009 FH=D308 Mode=0 Access=2 
jedi.wrq.com -> sun          NLM R SHARE3 OH=0009 granted 0 
         sun -> jedi.wrq.com NLM C UNSHARE3 OH=000A FH=D308 Mode=0 Access=2 
jedi.wrq.com -> sun          NLM R UNSHARE3 OH=000A granted 0 
         sun -> jedi.wrq.com NFS C GETATTR2 FH=2344 
jedi.wrq.com -> sun          NFS R GETATTR2 OK 
         sun -> jedi.wrq.com NFS C LOOKUP2 FH=2344 chunk 
jedi.wrq.com -> sun          NFS R LOOKUP2 OK FH=D308 
         sun -> jedi.wrq.com NFS C LOOKUP2 FH=2344 .nfs5FC7 
jedi.wrq.com -> sun          NFS R LOOKUP2 No such file or directory 
         sun -> jedi.wrq.com NFS C RENAME2 FH=2344 chunk to FH=2344 .nfs5FC7 
jedi.wrq.com -> sun          NFS R RENAME2 OK 
         sun -> jedi.wrq.com NLM C UNSHARE3 OH=000B FH=D308 Mode=0 Access=1 
jedi.wrq.com -> sun          NLM R UNSHARE3 OH=000B granted 0 
         sun -> jedi.wrq.com NFS C REMOVE2 FH=2344 .nfs5FC7 
jedi.wrq.com -> sun          NFS R REMOVE2 OK 
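
The Mode and Access numbers in the trace can be read against the share
mode and access values defined for the NLM protocol. The sketch below
is not from the original report. It assumes snoop prints those two
protocol fields unchanged, and the function names are hypothetical.
Under that assumption, "Mode=0 Access=2" is a write share that denies
nothing, which matches F_WRACC and F_NODNY in the test program.

```c
#include <stdio.h>

/* Share deny modes and access values as defined for NLM shares. */
enum fsh_mode   { fsm_DN = 0, fsm_DR = 1, fsm_DW = 2, fsm_DRW = 3 };
enum fsh_access { fsa_NONE = 0, fsa_R = 1, fsa_W = 2, fsa_RW = 3 };

/* Map a Mode= value to a readable deny mode. */
const char *share_mode_name(int mode)
{
    switch (mode) {
    case fsm_DN:  return "deny none";
    case fsm_DR:  return "deny read";
    case fsm_DW:  return "deny write";
    case fsm_DRW: return "deny read/write";
    default:      return "unknown";
    }
}

/* Map an Access= value to a readable access type. */
const char *share_access_name(int access)
{
    switch (access) {
    case fsa_NONE: return "none";
    case fsa_R:    return "read";
    case fsa_W:    return "write";
    case fsa_RW:   return "read/write";
    default:       return "unknown";
    }
}

/* Write "<deny mode>, <access> access" into buf; returns what
 * snprintf() returns. */
int describe_share(int mode, int access, char *buf, size_t len)
{
    return snprintf(buf, len, "%s, %s access",
                    share_mode_name(mode), share_access_name(access));
}
```

For example, describe_share(0, 2, buf, sizeof buf) fills buf with
"deny none, write access" for the SHARE3 call in the trace.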

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: [BUG] knfsd causes file system corruption when files are locked.
From: Neil Brown @ 2000-11-16  0:00 UTC
  To: Ivan Kanis; +Cc: linux-kernel, nfs-devel

On Wednesday November 15, ivank@wrq.com wrote:
> [1.] knfsd causes file system corruption when files are locked.
> 
> [2.] Lock down a file using the NLM_SHARE sharing mechanism. Remove
> the file. Unlock the file using NLM_UNSHARE. The filesystem does not
> recover the file space. I am running this on ext2fs. Fsck-ing the
> filesystem does not help. The only way to recover the space is to
> reformat the partition.
> 
> [3.] knfsd, lock, NLM_SHARE, NLM_UNSHARE
> 
> [4.] Linux version 2.2.16 (root@jedi) (gcc version 2.7.2.3)

Lots of changes have gone into knfsd since 2.2.16.  Could you please
try again with either a later 2.2.18pre kernel, or 2.2.16 with patches
from 
   http://nfs.sourceforge.net/
applied?  Thanks.

Quick guide is:
    2.2.16
  plus
    http://www.fys.uio.no/~trondmy/src/nfsv3-old/linux-2.2.16-nfsv3-0.22.0.dif.bz2
  plus
    http://download.sourceforge.net/nfs/kernel-nfs-dhiggen_merge-2.0.gz

NeilBrown


* Re: [BUG] knfsd causes file system corruption when files are locked.
From: Ivan Kanis @ 2000-11-16  2:43 UTC
  To: Neil Brown; +Cc: linux-kernel, nfs-devel


> On Wednesday November 15, ivank@wrq.com wrote:
    Ivan> [1.] knfsd causes file system corruption when files are locked.
    Ivan>
    Ivan> [2.] Lock down a file using the NLM_SHARE sharing
    Ivan> mechanism. Remove the file. Unlock the file using
    Ivan> NLM_UNSHARE. The filesystem does not recover the file space. I
    Ivan> am running this on ext2fs. Fsck-ing the filesystem does not
    Ivan> help. The only way to recover the space is to reformat the
    Ivan> partition.
    Ivan>
    Ivan> [3.] knfsd, lock, NLM_SHARE, NLM_UNSHARE
    Ivan>
    Ivan> [4.] Linux version 2.2.16 (root@jedi) (gcc version 2.7.2.3)


>>>>> "Neil" == Neil Brown <neilb@cse.unsw.edu.au> writes:

    Neil> Lots of changes have gone into knfsd since 2.2.16.  Could
    Neil> you please try again with either a later 2.2.18pre kernel,
    Neil> or 2.2.16 with patches from
    Neil>    http://nfs.sourceforge.net/
    Neil> applied?  Thanks.

    Neil> Quick guide is:
    Neil>     2.2.16
    Neil>   plus
    Neil>     http://www.fys.uio.no/~trondmy/src/nfsv3-old/linux-2.2.16-nfsv3-0.22.0.dif.bz2
    Neil>   plus
    Neil>     http://download.sourceforge.net/nfs/kernel-nfs-dhiggen_merge-2.0.gz

    Neil> NeilBrown

I can still reproduce the bug using:

Linux version 2.2.18pre21 (root@jedi) (gcc version 2.7.2.3)

I don't have to type vers=2 to mount a Linux NFS share on Solaris
(yeah!)

Ivan


* Re: [BUG] knfsd causes file system corruption when files are locked.
From: Trond Myklebust @ 2000-11-16 14:36 UTC
  To: Ivan Kanis; +Cc: Neil Brown, linux-kernel, nfs-devel

>>>>> "Ivan" == Ivan Kanis <ivank@wrq.com> writes:

    Ivan> space. I am running this on ext2fs. Fsck-ing the filesystem
    Ivan> does not help. The only way to recover the space is to
    Ivan> reformat the partition.
    Ivan>
    Ivan> [3.] knfsd, lock, NLM_SHARE, NLM_UNSHARE
    Ivan>
    Ivan> [4.] Linux version 2.2.16 (root@jedi) (gcc version 2.7.2.3)


Please dig around in dejanews for the locking patch I posted on l-k
last week (to be applied on top of 2.2.18pre21). It fixes 3 leaks in
the locking code (amongst them the share leak).

If you can't find it, I'll be happy to post it via private mail...

Cheers,
  Trond


* Re: [BUG] knfsd causes file system corruption when files are locked.
From: Ivan Kanis @ 2000-11-20 18:43 UTC
  To: Trond Myklebust; +Cc: Neil Brown, linux-kernel, nfs-devel, Paul Pietromonaco

>>>>> "Trond" == Trond Myklebust <trond.myklebust@fys.uio.no> writes:
>>>>> "Ivan" == Ivan Kanis <ivank@wrq.com> writes:

    Ivan> space. I am running this on ext2fs. Fsck-ing the filesystem
    Ivan> does not help. The only way to recover the space is to
    Ivan> reformat the partition.
    Ivan>
    Ivan> [3.] knfsd, lock, NLM_SHARE, NLM_UNSHARE
    Ivan>
    Ivan> [4.] Linux version 2.2.16 (root@jedi) (gcc version 2.7.2.3)


    Trond> Please dig around in dejanews for the locking patch I
    Trond> posted on l-k last week (to be applied on top of
    Trond> 2.2.18pre21). It fixes 3 leaks in the locking code (amongst
    Trond> them the share leak).

Hi Trond,

Fantastic news: your patches fixed the bug! I hope they make it into
the kernel soon. I will be testing them some more today.

Thank you,

Ivan Kanis

