From: Trond Myklebust <trondmy@kernel.org>
To: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>,
NeilBrown <neil@brown.name>, Anna Schumaker <anna@kernel.org>
Cc: Mark Brown <broonie@kernel.org>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()
Date: Thu, 28 Aug 2025 06:10:21 -0700 [thread overview]
Message-ID: <e954bac8c996015ce93e54731c30d20a4b9c56c7.camel@kernel.org> (raw)
In-Reply-To: <b409c469-260e-4bd5-9cf8-49f524f3fd5a@oracle.com>
On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> Hi there,
>
> On 20/08/25 3:08 AM, NeilBrown wrote:
> > rpc_wait_bit_killable() is called when it is appropriate for a
> > fatal
> > signal to abort the wait.
> >
> > If it is called late during process exit after exit_signals() is
> > called
> > (and when PF_EXITING is set), it cannot receive a fatal signal so
> > waiting indefinitely is not safe.
> >
> > However aborting immediately, as it currently does, is not ideal as
> > it
> > mean that the related NFS request cannot succeed, even if the
> > network
> > and server are working properly.
> >
> > One of the causes of filesystem IO when PF_EXITING is set is
> > acct_process() which may access the process accounting file. For a
> > NFS-root configuration, this can be accessed over NFS.
> >
> > In this configuration LTP test "acct02" fails.
> >
> > Though waiting indefinitely is not appropriate, aborting
> > immediately is
> > also not desirable. This patch aims for a middle ground of waiting
> > at
> > most 5 seconds. This should be enough when NFS service is working,
> > but
> > not so much as to delay process exit excessively when NFS service
> > is not
> > functioning.
> >
> > Reported-by: Mark Brown <broonie@kernel.org>
> > Reported-and-tested-by: Harshvardhan Jha
> > <harshvardhan.j.jha@oracle.com>
> > Link:
> > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Flore.kernel.org%2Flinux-nfs%2F7d4d57b0-39a3-49f1-8ada-60364743e3b4%40sirena.org.uk%2F__%3B!!ACWV5N9M2RV99hQ!LaRJdjZulcG71nHFWdEAszB9mJEhezxPsDxHO8xeQJ7P8a9UfYNRIm1ziuuHU5wxgEXW14vAqC1dlpSQraWaxA%24&data=05%7C02%7Ctrondmy%40hammerspace.com%7C5463825a86c248e5766c08dde6306312%7C0d4fed5c3a7046fe9430ece41741f59e%7C0%7C0%7C638919817692991630%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=GBRM89CFMk2vKyeN4yKBvjiV9IrODC4tbwMfRSk4Cfc%3D&reserved=0
> >
> > Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting
> > tasks")
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> > net/sunrpc/sched.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 73bc39281ef5..92f39e828fbe 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >
> > static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > mode)
> > {
> > - if (unlikely(current->flags & PF_EXITING))
> > - return -EINTR;
> > - schedule();
> > - if (signal_pending_state(mode, current))
> > - return -ERESTARTSYS;
> > + if (unlikely(current->flags & PF_EXITING)) {
> > + /* Cannot be killed by a signal, so don't wait
> > indefinitely */
> > + if (schedule_timeout(5 * HZ) == 0)
> > + return -EINTR;
> > + } else {
> > + schedule();
> > + if (signal_pending_state(mode, current))
> > + return -ERESTARTSYS;
> > + }
> > return 0;
> > }
> >
> Is it possible to get this merged in 6.17? I have tested this and the
> LTP tests pass
If we are in this situation, it is because some signal has already
killed the parent task. That throws any data integrity guarantees right
out of the window.
So what problem is this patch trying to fix?
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
next prev parent reply other threads:[~2025-08-28 13:10 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-19 21:38 [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable() NeilBrown
2025-08-28 12:42 ` Harshvardhan Jha
2025-08-28 13:10 ` Trond Myklebust [this message]
2025-08-29 1:04 ` NeilBrown
2025-09-05 19:38 ` Trond Myklebust
2025-09-05 22:45 ` NeilBrown
2025-09-05 23:16 ` Trond Myklebust
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e954bac8c996015ce93e54731c30d20a4b9c56c7.camel@kernel.org \
--to=trondmy@kernel.org \
--cc=anna@kernel.org \
--cc=broonie@kernel.org \
--cc=harshvardhan.j.jha@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox