From: Jeff Layton <jlayton@kernel.org>
To: Olga Kornievskaia <aglo@umich.edu>, "Andrew J. Romero" <romero@fnal.gov>
Cc: Chuck Lever III <chuck.lever@oracle.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: Zombie / Orphan open files
Date: Tue, 31 Jan 2023 17:28:22 -0500 [thread overview]
Message-ID: <5c2856f1408d801b9fede6478071c94b755376bf.camel@kernel.org> (raw)
In-Reply-To: <CAN-5tyGdaL_pYgqgS0TDwqCzVu=0rgLau8TDZMTe+hmC395UtQ@mail.gmail.com>
On Tue, 2023-01-31 at 17:14 -0500, Olga Kornievskaia wrote:
> On Tue, Jan 31, 2023 at 2:55 PM Andrew J. Romero <romero@fnal.gov> wrote:
> >
> >
> >
> > > What you are describing sounds like a bug in a system (be it client or
> > > server). There is state that the client thought it closed but the
> > > server is still keeping that state.
> >
> > Hi Olga
> >
> > Based on my simple test-script experiment,
> > here's a summary of what I believe is happening:
> >
> > 1. An interactive user starts a process that opens one or more files.
> >
> > 2. A disruption that prevents
> >    NFS-client <-> NFS-server communication
> >    occurs while the file is open. This could be because
> >    the file has been open a long time, or because it was opened
> >    too close to the time of the disruption.
> >
> > ( I believe the most common "disruption" is
> > credential expiration )
> >
> > 3. The user's process terminates before the disruption
> >    is cleared ( or, stated another way, the disruption is not cleared until after the user
> >    process terminates ).
> >
> > At the time the user process terminates, the process
> > cannot tell the server to close the server-side file state.
> >
> > After the process terminates, nothing will ever tell the server
> > to close the files. The now-zombie open files will continue to
> > consume server-side resources.
> >
> > In environments with many users, the problem is significant.
> >
> > My reasons for posting are not to have your team help
> > troubleshoot my specific issue
> > ( that would be quite rude );
> >
> > they are to:
> >
> > - Determine if my NAS vendor might accidentally
> >   not be doing something they should be.
> >   ( I now don't really think this is the case. )
>
> It's hard to say who's at fault here without having some more info
> like tracepoints or network traces.
>
> > - Determine if this is a known behavior common to all NFS implementations
> >   ( Linux, etc. ) and, if so, have your team determine whether this is a problem that should be addressed
> >   in the spec and the implementations.
>
> What you describe -- having different views of state on the client
> and server -- is not a known common behaviour.
>
> I tried it on my Kerberos setup:
> I got a 5-minute ticket.
> As that user, I opened a file in a process that went to sleep.
> My user credentials then expired (after 5 mins). I verified that by
> doing an "ls" on the mounted filesystem, which resulted in a
> permission-denied error.
> Then I killed the application that had the open file. This resulted
> in an NFS CLOSE being sent to the server using the machine's gss
> context (which is the default behaviour of the Linux client regardless
> of whether or not the user's credentials are valid).
>
> Basically, as far as I can tell, a Linux client can handle cleaning up
> state when a user's credentials have expired.
> >
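(For anyone who wants to reproduce the test described above, it amounts to roughly the following sketch; the principal name, mount point, file name, and timings are placeholders for whatever your setup uses:)

```shell
# Sketch of the test above, assuming a krb5-secured NFS mount at
# /mnt/krb5nfs and a principal "testuser"; names are illustrative.
kinit -l 5m testuser                  # obtain a 5-minute user ticket
sleep 3600 < /mnt/krb5nfs/testfile &  # process holds the file open
holder=$!
sleep 310                             # wait for the ticket to expire
ls /mnt/krb5nfs                       # expected: permission denied
kill "$holder"                        # on process exit, the client should
                                      # still send CLOSE under the machine
                                      # gss context (verify on the wire
                                      # with tcpdump/wireshark)
```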
That's pretty much what I expected from looking at the code. I think
this is done via the call to nfs4_state_protect. That calls:

    if (test_bit(sp4_mode, &clp->cl_sp4_flags)) {
            msg->rpc_cred = rpc_machine_cred();
            ...
    }

Could it be that cl_sp4_flags doesn't have NFS_SP4_MACH_CRED_CLEANUP set
on his clients? AFAICT, that comes from the server. It also looks like
cl_sp4_flags may not get set on an NFSv4.0 mount.
Olga, can you test that with a v4.0 mount?
--
Jeff Layton <jlayton@kernel.org>