From: Jeff Layton <jlayton@kernel.org>
To: Olga Kornievskaia <aglo@umich.edu>, "Andrew J. Romero" <romero@fnal.gov>
Cc: Chuck Lever III <chuck.lever@oracle.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: Zombie / Orphan open files
Date: Tue, 31 Jan 2023 17:28:22 -0500
Message-ID: <5c2856f1408d801b9fede6478071c94b755376bf.camel@kernel.org>
In-Reply-To: <CAN-5tyGdaL_pYgqgS0TDwqCzVu=0rgLau8TDZMTe+hmC395UtQ@mail.gmail.com>

On Tue, 2023-01-31 at 17:14 -0500, Olga Kornievskaia wrote:
> On Tue, Jan 31, 2023 at 2:55 PM Andrew J. Romero <romero@fnal.gov> wrote:
> > 
> > 
> > 
> > > What you are describing sounds like a bug in a system (be it client or
> > > server). There is state that the client thinks it has closed but that
> > > the server is still keeping.
> > 
> > Hi Olga
> > 
> > Based on my simple test script experiment,
> > Here's a summary of what I believe is happening
> > 
> > 1. An interactive user starts a process that opens a file or multiple files
> > 
> > 2. A disruption, that prevents
> >    NFS-client <-> NFS-server communication,
> >    occurs while the file is open.  This could be due to
> >    having the file open a long time or due to opening the file
> >    too close to the time of disruption.
> > 
> > ( I believe the most common "disruption" is
> >   credential expiration )
> > 
> > 3. The user's process terminates before the disruption
> >    is cleared (or, stated another way, the disruption is not cleared
> >    until after the user process terminates).
> > 
> >    At the time the user process terminates, the process
> >    can not tell the server to close the server-side file state.
> > 
> >   After the process terminates, nothing will ever tell the server
> >   to close the files.  The now zombie open files will continue to
> >   consume server-side resources.
> > 
> >   In environments with many users, the problem is significant
> > 
> > My reasons for posting:
> > 
> > - Are not to have your team  help troubleshoot my specific issue
> >    ( that would be quite rude )
> > 
> > they are:
> > 
> > - Determine If my NAS vendor might be accidentally
> >   not doing something they should be.
> >   (  I now don't really think this is the case. )
> 
> It's hard to say who's at fault here without having some more info
> like tracepoints or network traces.
> 
> > - Determine if this is a known behavior common to all NFS implementations
> >   (Linux, etc.) and, if so, have your team determine whether this is a
> >   problem that should be addressed in the spec and the implementations.
> 
> What you describe  --- having different views of state on the client
> and server -- is not a known common behaviour.
> 
> I have tried it on my Kerberos setup.
> I got a 5-minute ticket.
> As a user, I opened a file in a process that then went to sleep.
> My user credentials expired (after 5 minutes). I verified that by
> doing an "ls" on a mounted filesystem, which resulted in a permission
> denied error.
> Then I killed the application that had the open file. This resulted
> in an NFS CLOSE being sent to the server using the machine's gss
> context (which is the default behaviour of the linux client regardless
> of whether or not the user's credentials are valid).
> 
> Basically as far as I can tell, a linux client can handle cleaning up
> state when user's credentials have expired.
> > 

That's pretty much what I expected from looking at the code. I think
this is done via the call to nfs4_state_protect(), which does:

        if (test_bit(sp4_mode, &clp->cl_sp4_flags)) {
                msg->rpc_cred = rpc_machine_cred();
                ...
        }

Could it be that cl_sp4_flags doesn't have NFS_SP4_MACH_CRED_CLEANUP set
on Andrew's clients? AFAICT, that flag comes from the server. It also looks
like cl_sp4_flags may not get set on an NFSv4.0 mount.
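
To make that hypothesis concrete, here is a toy userspace model of the
decision (not kernel code). Everything prefixed model_ or MODEL_ is made
up for illustration, and the rule that a v4.0 mount never gets the flag
is only my guess from reading the code, not something I have verified:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel types; names and values are hypothetical. */
enum model_sp4_mode { MODEL_SP4_MACH_CRED_CLEANUP = 1 };
enum model_cred { MODEL_CRED_USER, MODEL_CRED_MACHINE };

struct model_client {
	unsigned long cl_sp4_flags;	/* one bit per permitted sp4 mode */
};

/*
 * Mount-time model. ASSUMPTION: the flags are only populated for minor
 * version >= 1, and only when the server permits the cleanup mode.
 */
void model_negotiate(struct model_client *clp, int minorversion,
		     bool server_allows_cleanup)
{
	clp->cl_sp4_flags = 0;
	if (minorversion >= 1 && server_allows_cleanup)
		clp->cl_sp4_flags |= 1UL << MODEL_SP4_MACH_CRED_CLEANUP;
}

/* Mirrors the test_bit() check: machine cred only if the bit is set. */
enum model_cred model_pick_cred(const struct model_client *clp,
				enum model_sp4_mode mode)
{
	assert((unsigned int)mode < sizeof(clp->cl_sp4_flags) * CHAR_BIT);
	if (clp->cl_sp4_flags & (1UL << mode))
		return MODEL_CRED_MACHINE;
	/* Otherwise fall back to the caller's (possibly expired) cred. */
	return MODEL_CRED_USER;
}
```

In this model, a v4.1 mount against a server that permits cleanup picks
the machine cred for the CLOSE, while a v4.0 mount (or a server that does
not permit it) falls back to the user's cred. If the guess is right, that
second case is where an expired ticket could leave opens behind.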

Olga, can you test that with a v4.0 mount?
-- 
Jeff Layton <jlayton@kernel.org>