Linux NFS development
From: Trond Myklebust <trondmy@kernel.org>
To: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>,
	linux-nfs <linux-nfs@vger.kernel.org>
Cc: Anna Schumaker <anna@kernel.org>
Subject: Re: [PATCH 1/1] pNFS/flexfiles: mark device unavailable on fatal connection error
Date: Wed, 25 Jun 2025 15:39:15 -0400
Message-ID: <e2a8b1e647e9d5f74e0ab5dd0924495625a02d3f.camel@kernel.org>
In-Reply-To: <1758012324.14467514.1750879171477.JavaMail.zimbra@desy.de>

On Wed, 2025-06-25 at 21:19 +0200, Mkrtchyan, Tigran wrote:
> 
> Hi Folks,
> 
> Do you have any opinion on this one? Would you like me to address it
> differently?
> 

I don't think we should mark the device as being unavailable just
because someone signalled the RPC task.

It would be better to have nfs4_ff_layout_prepare_ds() return any fatal
errors that it encounters using ERR_PTR(), so that the callers can
handle them. Then maybe return ERR_PTR(-EAGAIN) for the case where we
currently return NULL so that those callers don't have to use the hated
IS_ERR_OR_NULL() test.
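
A minimal userspace sketch of that calling convention, under stated
assumptions: ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here
as stand-ins for the kernel helpers, and struct ds, error_is_fatal(),
prepare_ds_sketch() and caller_sketch() are hypothetical names used
only for illustration. The set of errors treated as fatal is an
illustrative subset, not the kernel's actual nfs_error_is_fatal() list.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#ifndef ERESTARTSYS
#define ERESTARTSYS 512	/* kernel-internal errno, absent from userspace errno.h */
#endif

struct ds { int id; };	/* hypothetical stand-in for the data server object */
static struct ds the_ds = { .id = 1 };

/* Assumption: illustrative subset of fatal errors, for this sketch only. */
static int error_is_fatal(int err)
{
	switch (err) {
	case -ERESTARTSYS:
	case -EINTR:
	case -EIO:
	case -ENOMEM:
		return 1;
	default:
		return 0;
	}
}

/*
 * connect_status simulates the result of the connect step (0 or -errno).
 * Fatal errors are passed through as ERR_PTR(); every other failure
 * becomes ERR_PTR(-EAGAIN) instead of NULL.
 */
static struct ds *prepare_ds_sketch(int connect_status)
{
	assert(connect_status <= 0);
	if (connect_status == 0)
		return &the_ds;
	if (error_is_fatal(connect_status))
		return ERR_PTR(connect_status);
	return ERR_PTR(-EAGAIN);
}

/* Returns 0 on success, 1 to retry later, -1 to give up. */
static int caller_sketch(int connect_status)
{
	struct ds *ds = prepare_ds_sketch(connect_status);

	if (!IS_ERR(ds))	/* plain IS_ERR() suffices: NULL is never returned */
		return 0;
	if (PTR_ERR(ds) == -EAGAIN)
		return 1;
	return -1;
}
```

With this shape, a signalled task (-ERESTARTSYS) reaches the caller as a
fatal error and is not retried, while a transient failure such as
-ECONNREFUSED shows up as -EAGAIN.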

> Tigran. 
> 
> ----- Original Message -----
> > From: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> > To: "linux-nfs" <linux-nfs@vger.kernel.org>
> > Cc: "Trond Myklebust" <trondmy@kernel.org>, "Anna Schumaker"
> > <anna@kernel.org>, "Tigran Mkrtchyan"
> > <tigran.mkrtchyan@desy.de>
> > Sent: Monday, 9 June, 2025 23:43:03
> > Subject: [PATCH 1/1] pNFS/flexfiles: mark device unavailable on
> > fatal connection error
> 
> > Fixes: 260f32adb88 ("pNFS/flexfiles: Check the result of
> > nfs4_pnfs_ds_connect")
> > 
> > When an application gets killed (SIGTERM/SIGINT) while the pNFS client
> > is connecting to a DS, the client ends up in an infinite loop of
> > connect-disconnect. The source of the issue is that
> > flexfilelayoutdev#nfs4_ff_layout_prepare_ds gets an error from
> > nfs4_pnfs_ds_connect with status ERESTARTSYS, which is set by
> > rpc_signal_task, but the error is treated as transient and thus retried.
> >
> > The issue is reproducible with a script such as the following (there
> > should be ~1000 files in a directory, and the client must not have any
> > connections to DSes):
> > 
> > ```
> > echo 3 > /proc/sys/vm/drop_caches
> > 
> > for i in *
> > do
> >        head -1 $i &
> >        PP=$!
> >        sleep 10e-03
> >        kill -TERM $PP
> > done
> > ```
> > 
> > Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
> > ---
> > fs/nfs/flexfilelayout/flexfilelayoutdev.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> > 
> > diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
> > index 4a304cf17c4b..0008a8180c9b 100644
> > --- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
> > +++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
> > @@ -410,6 +410,10 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
> > 			mirror->mirror_ds->ds_versions[0].wsize = max_payload;
> > 		goto out;
> > 	}
> > +	/* There is a fatal error to connect to DS. Mark it unavailable to avoid infinite retry loop. */
> > +	if (nfs_error_is_fatal(status))
> > +		nfs4_mark_deviceid_unavailable(&mirror->mirror_ds->id_node);
> > +
> > noconnect:
> > 	ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
> > 				 mirror, lseg->pls_range.offset,
> > --
> > 2.49.0

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

Thread overview: 5+ messages
2025-06-09 21:43 [PATCH 0/1] pNFS/flexfiles: mark device unavailable on fatal connection error Tigran Mkrtchyan
2025-06-09 21:43 ` [PATCH 1/1] " Tigran Mkrtchyan
2025-06-25 19:19   ` Mkrtchyan, Tigran
2025-06-25 19:39     ` Trond Myklebust [this message]
2025-06-26  9:17       ` Mkrtchyan, Tigran
