From: Eli Stair <estair@ilm.com>
To: nfs@lists.sourceforge.net
Subject: Linux client cache corruption, system call returning incorrectly
Date: Thu, 01 Mar 2007 19:15:03 -0800
Message-ID: <45E796B7.9010707@ilm.com>


I'm having a serious client cache issue on recent kernels.  On 2.6.18 
and 2.6.20 (but /not/ 2.6.15.4) clients, I'm seeing periodic GETATTR 
or ACCESS calls that return fine from the server but pass an ENOFILE up 
to the application.  It occurs against all NFS servers I've tested 
(2.6.18 knfsd, OnTap 10.0.1, SpinOS 2.5.5p8).

The trigger is repeated stat'ing of several hundred files that are 
opened read-only during runtime (without close()'ing or open()'ing them 
again).  I have been unable to reduce the usage to a bug-triggering 
testcase yet, but it is very easily triggered by an internal app.  
Mounting the NFS filesystems with 'nocto' appears to mitigate the issue 
by about 50%, but does not completely get rid of it.  Also, using 
2.6.20 plus Trond's NFS_ALL patches and the one you supplied slows the 
rate of errors, but does not eliminate them.
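
For anyone who wants to try reproducing this, the access pattern 
described above can be sketched roughly as follows.  This is only a 
minimal illustration, not the internal app and not a confirmed 
testcase: the function name stat_loop and its parameters are made up 
for this sketch, and it assumes the files already exist on an NFS mount.

```python
import errno
import os
import time


def stat_loop(paths, rounds=100, delay=0.0):
    """Open every path read-only, keep it open, and stat() each repeatedly.

    Hypothetical helper: returns a list of (round, path, errno_name)
    tuples, one for every stat() call that raised an error.
    """
    # Hold the descriptors open for the whole run; never reopen them.
    fds = [os.open(p, os.O_RDONLY) for p in paths]
    failures = []
    try:
        for rnd in range(rounds):
            for path in paths:
                try:
                    os.stat(path)
                except OSError as exc:
                    name = errno.errorcode.get(exc.errno, str(exc.errno))
                    failures.append((rnd, path, name))
            if delay:
                time.sleep(delay)
    finally:
        for fd in fds:
            os.close(fd)
    return failures
```

Pointing it at a directory of a few hundred files on the affected mount 
(for example stat_loop(glob.glob("/mnt/nfs/testdir/*")), where the path 
is a placeholder) should return an empty list on a healthy client.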

I'm rigging the application with an strace harness so I can track down 
specifically which ops are failing in production.  I can confirm that 
the errors I have witnessed under debug are NOT caused by an NFS call 
being denied access, or by a failing open().  It appears to be the 
stat() of the file (usually several dozen or hundreds in sequence) that 
returns ENOFILE, though the call should return success.
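
A filter along these lines can reduce a large strace log to just the 
failing stat-family and access calls.  It is a rough sketch: the 
function name failed_calls and the default syscall set are assumptions 
made for this example, and it relies on strace's usual 
"call(args) = -1 ERRNAME (message)" line format.  It reports whatever 
errno name strace printed.

```python
import re

# Matches e.g.:  1234 stat64("/a/b", 0xbf8) = -1 ENOENT (No such file ...)
FAILED = re.compile(
    r"(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+(?P<err>E[A-Z0-9]+)"
)

# Assumed default: the stat family plus access().
STAT_CALLS = frozenset(
    ["stat", "lstat", "fstat", "stat64", "lstat64", "fstat64", "access"]
)


def failed_calls(lines, syscalls=STAT_CALLS):
    """Return (syscall, args, errno_name) for each failing call of interest."""
    hits = []
    for line in lines:
        m = FAILED.search(line)
        if m and m.group("call") in syscalls:
            hits.append((m.group("call"), m.group("args"), m.group("err")))
    return hits
```

Feeding it the lines of a file written by strace -f -o (the log path is 
up to you) and counting the results per errno name gives a quick 
summary of which calls fail and how often.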

Any tips on using rpcdebug effectively?  I'm getting tremendous levels 
of info output with '-m nfs -s all', too much to parse well.

I'll update with some more hard data as I get further along, but I want 
to see whether a) anyone else has noticed this and is working on a fix, 
and b) there are any suggestions on getting more useful data than what 
I'm working towards.

Reverting to 2.6.15.4 (which doesn't exhibit this particular bug) isn't 
a direct solution even temporarily, as that has a nasty NFS fseek bug 
(seek to EOF goes to wrong offset).

Cheers,


/eli



