From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Lever, Charles"
Subject: RE: Corrupt Data when using NFS on Linux
Date: Mon, 28 Oct 2002 12:48:14 -0800
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <6440EA1A6AA1D5118C6900902745938E07D54FE2@black.eng.netapp.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C27EC3.55E21E80"
Cc: nfs@lists.sourceforge.net
Return-path:
Received: from mx01.netapp.com ([198.95.226.53]) by usw-sf-list1.sourceforge.net with esmtp (Exim 3.31-VA-mm2 #1 (Debian)) id 186GoJ-000345-00 for ; Mon, 28 Oct 2002 12:48:31 -0800
To: "'Alan Witz'"
Errors-To: nfs-admin@lists.sourceforge.net
List-Help:
List-Post:
List-Subscribe: ,
List-Id: Discussion of NFS under Linux development, interoperability, and testing.
List-Unsubscribe: ,
List-Archive:

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C27EC3.55E21E80
Content-Type: text/plain; charset="iso-8859-1"

hi alan-

that information is crap, and should be removed from wherever you found it.

the problem is that typical file systems used on *Linux* NFS servers (like ext2) can't
store time stamps with sub-second resolution.  this is not a problem with typical
commercial NFS servers like Solaris or NetApp filers.  i'm not aware of any plan to
address this specific problem in 2.5, but that doesn't mean it won't be.

can you tell us more about your environment, especially which kernel is running
on your clients and what mount options you're using?

-----Original Message-----
From: Alan Witz [mailto:awitz@magstarinc.com]
Sent: Monday, October 28, 2002 3:07 PM
To: nfs@lists.sourceforge.net
Subject: [NFS] Corrupt Data when using NFS on Linux

I work for a small software company that recently began using NFS to
implement a solution using a lesser-known database (Appgen).  The
problem is that we're getting lots of corrupt database files in those
files modified via NFS.  The on-line manual on linux.org makes the
following reference which I think may be relevant:

7.10. File Corruption When Using Multiple Clients

If a file has been modified within one second of its previous
modification and left the same size, it will continue to generate the
same inode number.  Because of this, constant reads and writes to a file
by multiple clients may cause file corruption.  Fixing this bug requires
changes deep within the filesystem layer, and therefore it is a 2.5
item.

I was wondering if someone could clarify what is meant by this.  What is
the relevance of the inode number?  And doesn't the inode of the file
stay the same even if it is being modified?  Any help would be greatly
appreciated.  Even some direction as to where else I might look would be
helpful.

Thanks,

Alan Witz

------_=_NextPart_001_01C27EC3.55E21E80
Content-Type: text/html; charset="iso-8859-1"
------_=_NextPart_001_01C27EC3.55E21E80--

-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Alan Witz"
Subject: Re: Corrupt Data when using NFS on Linux
Date: Mon, 28 Oct 2002 18:05:06 -0500
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <007601c27ed6$74986240$2864a8c0@alanw>
References: <6440EA1A6AA1D5118C6900902745938E07D54FE2@black.eng.netapp.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_0073_01C27EAC.8B9C34A0"
Cc:
Return-path:
Received: from mail4.uunet.ca ([209.167.141.34]) by usw-sf-list1.sourceforge.net with esmtp (Exim 3.31-VA-mm2 #1 (Debian)) id 186Iwd-0007kM-00 for ; Mon, 28 Oct 2002 15:05:15 -0800
To: "Lever, Charles"
Errors-To: nfs-admin@lists.sourceforge.net
List-Help:
List-Post:
List-Subscribe: ,
List-Id: Discussion of NFS under Linux development, interoperability, and testing.
List-Unsubscribe: ,
List-Archive:

This is a multi-part message in MIME format.

------=_NextPart_000_0073_01C27EAC.8B9C34A0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Thanks for your quick response.

The operating environment is Red Hat Linux 2.4.18-17.7.xsmp.  We are
running NFS version 3.  We have implemented some of our own rudimentary
file locking techniques to try and circumvent the problem.  This
consists of creating a lock file which acts as a flag which tells the
other clients not to access the file.  Basically, if the lock file
exists then the other clients will wait until the file is cleared before
writing to the database file.  To ensure that this works properly the
"lock" flag is being created using the "ln" command so that the process
of checking for a lock and setting a lock is essentially done in one
step (thus eliminating the possibility of another client setting the
lock after the current client has checked for the lock but before it can
set the lock itself).  We are also running NFS in synchronous mode to
try and reduce the chances of data corruption due to multiple clients.
The mount options are as follows:

    rsize=8192,wsize=8192,noac,hard,sync,nfsvers=3

Any thoughts would be greatly appreciated.

        Alan Witz

----- Original Message -----
From: Lever, Charles
To: 'Alan Witz'
Cc: nfs@lists.sourceforge.net
Sent: Monday, October 28, 2002 3:48 PM
Subject: RE: [NFS] Corrupt Data when using NFS on Linux

hi alan-

that information is crap, and should be removed from wherever you found it.

the problem is that typical file systems used on *Linux* NFS servers (like ext2) can't
store time stamps with sub-second resolution.  this is not a problem with typical
commercial NFS servers like Solaris or NetApp filers.  i'm not aware of any plan to
address this specific problem in 2.5, but that doesn't mean it won't be.

can you tell us more about your environment, especially which kernel is running
on your clients and what mount options you're using?

-----Original Message-----
From: Alan Witz [mailto:awitz@magstarinc.com]
Sent: Monday, October 28, 2002 3:07 PM
To: nfs@lists.sourceforge.net
Subject: [NFS] Corrupt Data when using NFS on Linux

I work for a small software company that recently began using NFS to
implement a solution using a lesser-known database (Appgen).  The
problem is that we're getting lots of corrupt database files in those
files modified via NFS.  The on-line manual on linux.org makes the
following reference which I think may be relevant:

7.10. File Corruption When Using Multiple Clients

If a file has been modified within one second of its previous
modification and left the same size, it will continue to generate the
same inode number.  Because of this, constant reads and writes to a file
by multiple clients may cause file corruption.  Fixing this bug requires
changes deep within the filesystem layer, and therefore it is a 2.5
item.

I was wondering if someone could clarify what is meant by this.  What is
the relevance of the inode number?  And doesn't the inode of the file
stay the same even if it is being modified?  Any help would be greatly
appreciated.  Even some direction as to where else I might look would be
helpful.

Thanks,

Alan Witz

------=_NextPart_000_0073_01C27EAC.8B9C34A0
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
------=_NextPart_000_0073_01C27EAC.8B9C34A0--

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eff Norwood"
Subject: RE: Corrupt Data when using NFS on Linux
Date: Mon, 28 Oct 2002 17:34:38 -0800
Sender: nfs-admin@lists.sourceforge.net
Message-ID:
References: <007601c27ed6$74986240$2864a8c0@alanw>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Cc:
Return-path:
Received: from cynaptic.com ([128.121.116.181]) by usw-sf-list1.sourceforge.net with esmtp (Exim 3.31-VA-mm2 #1 (Debian)) id 186LHN-0002Of-00 for ; Mon, 28 Oct 2002 17:34:49 -0800
To: "Alan Witz" , "Lever, Charles"
In-Reply-To: <007601c27ed6$74986240$2864a8c0@alanw>
Errors-To: nfs-admin@lists.sourceforge.net
List-Help:
List-Post:
List-Subscribe: ,
List-Id: Discussion of NFS under Linux development, interoperability, and testing.
List-Unsubscribe: ,
List-Archive:

Hi Alan,

>     rsize=8192,wsize=8192,noac,hard,sync,nfsvers=3
>
>Any thoughts would be greatly appreciated.

>>From my own experiences with similar issues, I think you have two and maybe
three options:

1. Stick with your own Linux locking scheme until the kernel catches up. It
will eventually as there are a lot of darn smart people working on it-like
those on this list. :)

2. Possibly use FreeBSD? I don't know if it addresses this problem or not.
Does anyone know if FreeBSD works here?

3. Get a non-free working commercial solution like a Network Appliance
filer.

Eff Norwood

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Forrest
Subject: Re: Corrupt Data when using NFS on Linux
Date: Tue, 29 Oct 2002 11:03:15 -0600
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <200210291703.g9TH3FD32594@leinie.lmcg.wisc.edu>
References: <6440EA1A6AA1D5118C6900902745938E07D54FE2@black.eng.netapp.com> <007601c27ed6$74986240$2864a8c0@alanw>
Reply-To: Daniel Forrest
Cc:
Return-path:
Received: from mail.lmcg.wisc.edu ([144.92.101.145]) by usw-sf-list1.sourceforge.net with esmtp (Exim 3.31-VA-mm2 #1 (Debian)) id 186Zlt-0006sa-00 for ; Tue, 29 Oct 2002 09:03:17 -0800
To: "Alan Witz"
In-reply-to: <007601c27ed6$74986240$2864a8c0@alanw> (awitz@magstarinc.com)
Errors-To: nfs-admin@lists.sourceforge.net
List-Help:
List-Post:
List-Subscribe: ,
List-Id: Discussion of NFS under Linux development, interoperability, and testing.
List-Unsubscribe: ,
List-Archive:

Alan,

>> The operating environment is Red Hat Linux 2.4.18-17.7.xsmp.  We
>> are running NFS version 3.  We have implemented some of our own
>> rudimentary file locking techniques to try and circumvent the
>> problem.  This consists of creating a lock file which acts as a
>> flag which tells the other clients not to access the file.
>> Basically, if the lock file exists then the other clients will wait
>> until the file is cleared before writing to the database file.  To
>> ensure that this works properly the "lock" flag is being created
>> using the "ln" command so that the process of checking for a lock
>> and setting a lock is essentially done in one step (thus
>> eliminating the possibility of another client setting the lock
>> after the current client has checked for the lock but before it can
>> set the lock itself).  We are also running NFS in synchronous mode
>> to try and reduce the chances of data corruption due to multiple
>> clients.  The mount options are as follows:
>>
>>     rsize=8192,wsize=8192,noac,hard,sync,nfsvers=3
>>
>> Any thoughts would be greatly appreciated.

You need to be careful when creating your lock files.  The
"guaranteed" way to create a lock file over NFS:

    create tempfile
    link tempfile lockfile (ignore return code)
    stat tempfile

If the link count is 2, then you have the lock file.  Apparently, link
may return success even if the link failed or return failure even if
the link succeeded (I don't remember which).  Doing the stat verifies
if you have actually created a link to the temporary file.  While I
have never seen this problem, the people who do mailbox locking have
documented this as a problem over NFS.

Also, you will have to use "fcntl" locking if you want to ensure the
data you are reading is consistent.  Doing a "lockf(fd, F_LOCK, 0)"
will guarantee that data written by other clients has been written to
the file and clear the client cache.  Of course, now that you're using
"fcntl" locking for this, you can probably get rid of the lock file.

--
+----------------------------------+----------------------------------+
| Daniel K. Forrest                | Laboratory for Molecular and     |
| forrest@lmcg.wisc.edu            | Computational Genomics           |
| (608)262-9479                    | University of Wisconsin, Madison |
+----------------------------------+----------------------------------+

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Bryan J. Smith"
Subject: Re: Corrupt Data when using NFS on Linux -- This should be in the HOWTO
Date: Tue, 29 Oct 2002 12:20:13 -0500 (EST)
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <1035912013.3dbec34dddd7e@webmail.smithconcepts.com>
References: <6440EA1A6AA1D5118C6900902745938E07D54FE2@black.eng.netapp.com> <007601c27ed6$74986240$2864a8c0@alanw> <200210291703.g9TH3FD32594@leinie.lmcg.wisc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: Alan Witz , nfs@lists.sourceforge.net
Return-path:
Received: from knight.01.dios.net ([65.222.230.112] helo=taz2.fiberhosting.com) by usw-sf-list1.sourceforge.net with smtp (Exim 3.31-VA-mm2 #1 (Debian)) id 186a2P-0004zH-00 for ; Tue, 29 Oct 2002 09:20:21 -0800
To: Daniel Forrest
In-Reply-To: <200210291703.g9TH3FD32594@leinie.lmcg.wisc.edu>
Errors-To: nfs-admin@lists.sourceforge.net
List-Help:
List-Post:
List-Subscribe: ,
List-Id: Discussion of NFS under Linux development, interoperability, and testing.
List-Unsubscribe: ,
List-Archive:

Quoting Daniel Forrest :
> The "guaranteed" way to create a lock file over NFS:
>   create tempfile
>   link tempfile lockfile (ignore return code)
>   stat tempfile
> If the link count is 2, then you have the lock file.  Apparently, link
> may return success even if the link failed or return failure even if
> the link succeeded (I don't remember which).  Doing the stat verifies
> if you have actually created a link to the temporary file.

I know the HOWTO is more for users/sysadmins, but stuff like this could really
help in an additional "common workarounds for potential gotchas" section at the
end of the HOWTO.

--
Bryan J. Smith, E.I.  Contact Info:  http://thebs.org
A+/i-Net+/Linux+/Network+/Server+ CCNA CIWA CNA SCSA/SCWSE/SCNA
---------------------------------------------------------------
limit guilt = { psychopath, remorse->0 innocent }

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tom McNeal
Subject: Re: Corrupt Data when using NFS on Linux -- This should be in the HOWTO
Date: Tue, 29 Oct 2002 10:07:48 -0800
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <3DBECE74.506@attbi.com>
References: <6440EA1A6AA1D5118C6900902745938E07D54FE2@black.eng.netapp.com> <007601c27ed6$74986240$2864a8c0@alanw> <200210291703.g9TH3FD32594@leinie.lmcg.wisc.edu> <1035912013.3dbec34dddd7e@webmail.smithconcepts.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Cc: Daniel Forrest , Alan Witz , nfs@lists.sourceforge.net
Return-path:
Received: from rwcrmhc53.attbi.com ([204.127.198.39]) by usw-sf-list1.sourceforge.net with esmtp (Exim 3.31-VA-mm2 #1 (Debian)) id 186aoS-0001bZ-00 for ; Tue, 29 Oct 2002 10:10:00 -0800
To: "Bryan J. Smith"
Errors-To: nfs-admin@lists.sourceforge.net
List-Help:
List-Post:
List-Subscribe: ,
List-Id: Discussion of NFS under Linux development, interoperability, and testing.
List-Unsubscribe: ,
List-Archive:

It's easier with the FAQ, but the HOWTO might be the better place.
We'll take a look at it...

Tom

Bryan J. Smith wrote:

> Quoting Daniel Forrest :
>
>>The "guaranteed" way to create a lock file over NFS:
>>  create tempfile
>>  link tempfile lockfile (ignore return code)
>>  stat tempfile
>>If the link count is 2, then you have the lock file.  Apparently, link
>>may return success even if the link failed or return failure even if
>>the link succeeded (I don't remember which).  Doing the stat verifies
>>if you have actually created a link to the temporary file.
>>
>
> I know the HOWTO is more for users/sysadmins, but stuff like this could really
> help in an additional "common workarounds for potential gotchas" section at the
> end of the HOWTO.
>
>

--
------------------------------------------------------------------------
Tom McNeal  trmcneal@attbi.com  (650)906-0761 (cell)
------------------------------------------------------------------------

From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Bryan J. Smith"
Subject: Re: Corrupt Data when using NFS on Linux -- This should be in the HOWTO
Date: Tue, 29 Oct 2002 13:23:43 -0500 (EST)
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <1035915823.3dbed22fa9dc0@webmail.smithconcepts.com>
References: <6440EA1A6AA1D5118C6900902745938E07D54FE2@black.eng.netapp.com> <007601c27ed6$74986240$2864a8c0@alanw> <200210291703.g9TH3FD32594@leinie.lmcg.wisc.edu> <1035912013.3dbec34dddd7e@webmail.smithconcepts.com> <3DBECE74.506@attbi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Cc: "Bryan J. Smith" , Daniel Forrest , Alan Witz , nfs@lists.sourceforge.net
Return-path:
Received: from knight.01.dios.net ([65.222.230.112] helo=taz2.fiberhosting.com) by usw-sf-list1.sourceforge.net with smtp (Exim 3.31-VA-mm2 #1 (Debian)) id 186b1q-0008RM-00 for ; Tue, 29 Oct 2002 10:23:50 -0800
To: Tom McNeal
In-Reply-To: <3DBECE74.506@attbi.com>
Errors-To: nfs-admin@lists.sourceforge.net
List-Help:
List-Post:
List-Subscribe: ,
List-Id: Discussion of NFS under Linux development, interoperability, and testing.
List-Unsubscribe: ,
List-Archive:

Quoting Tom McNeal :
> It's easier with the FAQ, but the HOWTO might be the better place.
> We'll take a look at it...

Maybe just a "Top 10 Differences Between NFS And Local"
- Mounts in exported filesystems are not automatically exported
- Locking issues and accommodation in scripts/programs
- Etc...

I think it would go a long way to helping many.

--
Bryan J. Smith, E.I.  Contact Info:  http://thebs.org
A+/i-Net+/Linux+/Network+/Server+ CCNA CIWA CNA SCSA/SCWSE/SCNA
---------------------------------------------------------------
limit guilt = { psychopath, remorse->0 innocent }
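
For reference, the create/link/stat sequence from Daniel Forrest's message, together with his lockf() suggestion, might look roughly like the following in Python. This is only a sketch under stated assumptions: the function names, the host-plus-PID temp-file naming scheme, and the choice of Python are all illustrative and were not posted to the list, and the code has not been exercised against an NFS mount. Whether taking the lock really flushes the client cache is the poster's claim and is not verified here.

```python
import fcntl
import os
import socket


def acquire_link_lock(lockfile):
    """Try to take `lockfile` with the create/link/stat sequence.

    Returns True if the caller now holds the lock, False otherwise.
    """
    # Assumed naming scheme: host name plus PID keeps temp names distinct
    # across clients that share the directory.
    temp_path = "%s.%s.%d" % (lockfile, socket.gethostname(), os.getpid())

    # Step 1: create tempfile.
    fd = os.open(temp_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        # Step 2: link tempfile lockfile, ignoring the result.
        try:
            os.link(temp_path, lockfile)
        except OSError:
            pass
        # Step 3: stat tempfile; a link count of 2 means the link exists.
        return os.stat(temp_path).st_nlink == 2
    finally:
        # The temp name is no longer needed; lockfile (if created) remains.
        os.unlink(temp_path)


def release_link_lock(lockfile):
    """Drop a lock taken with acquire_link_lock()."""
    os.unlink(lockfile)


def read_under_lock(path):
    """Read a whole file while holding an exclusive fcntl-style lock."""
    # Opened read/write because an exclusive lock needs a writable descriptor.
    with open(path, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            return f.read()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

A caller would typically retry acquire_link_lock() with a short sleep until it returns True, do its work, and then call release_link_lock(). As the message notes, once fcntl-style locking is in place the separate lock file is probably unnecessary.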