* [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()
@ 2025-08-19 21:38 NeilBrown
2025-08-28 12:42 ` Harshvardhan Jha
0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2025-08-19 21:38 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: Harshvardhan Jha, Mark Brown, linux-nfs
rpc_wait_bit_killable() is called when it is appropriate for a fatal
signal to abort the wait.
If it is called late during process exit after exit_signals() is called
(and when PF_EXITING is set), it cannot receive a fatal signal so
waiting indefinitely is not safe.
However aborting immediately, as it currently does, is not ideal as it
mean that the related NFS request cannot succeed, even if the network
and server are working properly.
One of the causes of filesystem IO when PF_EXITING is set is
acct_process() which may access the process accounting file. For a
NFS-root configuration, this can be accessed over NFS.
In this configuration LTP test "acct02" fails.
Though waiting indefinitely is not appropriate, aborting immediately is
also not desirable. This patch aims for a middle ground of waiting at
most 5 seconds. This should be enough when NFS service is working, but
not so much as to delay process exit excessively when NFS service is not
functioning.
Reported-by: Mark Brown <broonie@kernel.org>
Reported-and-tested-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
Link: https://lore.kernel.org/linux-nfs/7d4d57b0-39a3-49f1-8ada-60364743e3b4@sirena.org.uk/
Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting tasks")
Signed-off-by: NeilBrown <neil@brown.name>
---
net/sunrpc/sched.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 73bc39281ef5..92f39e828fbe 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
{
- if (unlikely(current->flags & PF_EXITING))
- return -EINTR;
- schedule();
- if (signal_pending_state(mode, current))
- return -ERESTARTSYS;
+ if (unlikely(current->flags & PF_EXITING)) {
+ /* Cannot be killed by a signal, so don't wait indefinitely */
+ if (schedule_timeout(5 * HZ) == 0)
+ return -EINTR;
+ } else {
+ schedule();
+ if (signal_pending_state(mode, current))
+ return -ERESTARTSYS;
+ }
return 0;
}
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()
2025-08-19 21:38 [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable() NeilBrown
@ 2025-08-28 12:42 ` Harshvardhan Jha
2025-08-28 13:10 ` Trond Myklebust
2025-09-05 19:38 ` Trond Myklebust
0 siblings, 2 replies; 7+ messages in thread
From: Harshvardhan Jha @ 2025-08-28 12:42 UTC (permalink / raw)
To: NeilBrown, Trond Myklebust, Anna Schumaker; +Cc: Mark Brown, linux-nfs
Hi there,
On 20/08/25 3:08 AM, NeilBrown wrote:
> rpc_wait_bit_killable() is called when it is appropriate for a fatal
> signal to abort the wait.
>
> If it is called late during process exit after exit_signals() is called
> (and when PF_EXITING is set), it cannot receive a fatal signal so
> waiting indefinitely is not safe.
>
> However aborting immediately, as it currently does, is not ideal as it
> mean that the related NFS request cannot succeed, even if the network
> and server are working properly.
>
> One of the causes of filesystem IO when PF_EXITING is set is
> acct_process() which may access the process accounting file. For a
> NFS-root configuration, this can be accessed over NFS.
>
> In this configuration LTP test "acct02" fails.
>
> Though waiting indefinitely is not appropriate, aborting immediately is
> also not desirable. This patch aims for a middle ground of waiting at
> most 5 seconds. This should be enough when NFS service is working, but
> not so much as to delay process exit excessively when NFS service is not
> functioning.
>
> Reported-by: Mark Brown <broonie@kernel.org>
> Reported-and-tested-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
> Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/7d4d57b0-39a3-49f1-8ada-60364743e3b4@sirena.org.uk/__;!!ACWV5N9M2RV99hQ!LaRJdjZulcG71nHFWdEAszB9mJEhezxPsDxHO8xeQJ7P8a9UfYNRIm1ziuuHU5wxgEXW14vAqC1dlpSQraWaxA$
> Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting tasks")
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
> net/sunrpc/sched.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 73bc39281ef5..92f39e828fbe 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>
> static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
> {
> - if (unlikely(current->flags & PF_EXITING))
> - return -EINTR;
> - schedule();
> - if (signal_pending_state(mode, current))
> - return -ERESTARTSYS;
> + if (unlikely(current->flags & PF_EXITING)) {
> + /* Cannot be killed by a signal, so don't wait indefinitely */
> + if (schedule_timeout(5 * HZ) == 0)
> + return -EINTR;
> + } else {
> + schedule();
> + if (signal_pending_state(mode, current))
> + return -ERESTARTSYS;
> + }
> return 0;
> }
>
Is it possible to get this merged in 6.17? I have tested this and the
LTP tests pass.
Thanks & Regards,
Harshvardhan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()
2025-08-28 12:42 ` Harshvardhan Jha
@ 2025-08-28 13:10 ` Trond Myklebust
2025-08-29 1:04 ` NeilBrown
2025-09-05 19:38 ` Trond Myklebust
1 sibling, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2025-08-28 13:10 UTC (permalink / raw)
To: Harshvardhan Jha, NeilBrown, Anna Schumaker; +Cc: Mark Brown, linux-nfs
On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> Hi there,
>
> On 20/08/25 3:08 AM, NeilBrown wrote:
> > rpc_wait_bit_killable() is called when it is appropriate for a
> > fatal
> > signal to abort the wait.
> >
> > If it is called late during process exit after exit_signals() is
> > called
> > (and when PF_EXITING is set), it cannot receive a fatal signal so
> > waiting indefinitely is not safe.
> >
> > However aborting immediately, as it currently does, is not ideal as
> > it
> > mean that the related NFS request cannot succeed, even if the
> > network
> > and server are working properly.
> >
> > One of the causes of filesystem IO when PF_EXITING is set is
> > acct_process() which may access the process accounting file. For a
> > NFS-root configuration, this can be accessed over NFS.
> >
> > In this configuration LTP test "acct02" fails.
> >
> > Though waiting indefinitely is not appropriate, aborting
> > immediately is
> > also not desirable. This patch aims for a middle ground of waiting
> > at
> > most 5 seconds. This should be enough when NFS service is working,
> > but
> > not so much as to delay process exit excessively when NFS service
> > is not
> > functioning.
> >
> > Reported-by: Mark Brown <broonie@kernel.org>
> > Reported-and-tested-by: Harshvardhan Jha
> > <harshvardhan.j.jha@oracle.com>
> > Link:
> > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Flore.kernel.org%2Flinux-nfs%2F7d4d57b0-39a3-49f1-8ada-60364743e3b4%40sirena.org.uk%2F__%3B!!ACWV5N9M2RV99hQ!LaRJdjZulcG71nHFWdEAszB9mJEhezxPsDxHO8xeQJ7P8a9UfYNRIm1ziuuHU5wxgEXW14vAqC1dlpSQraWaxA%24&data=05%7C02%7Ctrondmy%40hammerspace.com%7C5463825a86c248e5766c08dde6306312%7C0d4fed5c3a7046fe9430ece41741f59e%7C0%7C0%7C638919817692991630%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=GBRM89CFMk2vKyeN4yKBvjiV9IrODC4tbwMfRSk4Cfc%3D&reserved=0
> >
> > Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting
> > tasks")
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> > net/sunrpc/sched.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 73bc39281ef5..92f39e828fbe 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >
> > static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > mode)
> > {
> > - if (unlikely(current->flags & PF_EXITING))
> > - return -EINTR;
> > - schedule();
> > - if (signal_pending_state(mode, current))
> > - return -ERESTARTSYS;
> > + if (unlikely(current->flags & PF_EXITING)) {
> > + /* Cannot be killed by a signal, so don't wait
> > indefinitely */
> > + if (schedule_timeout(5 * HZ) == 0)
> > + return -EINTR;
> > + } else {
> > + schedule();
> > + if (signal_pending_state(mode, current))
> > + return -ERESTARTSYS;
> > + }
> > return 0;
> > }
> >
> Is it possible to get this merged in 6.17? I have tested this and the
> LTP tests pass
If we are in this situation, it is because some signal has already
killed the parent task. That throws any data integrity guarantees right
out of the window.
So what problem is this patch trying to fix?
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()
2025-08-28 13:10 ` Trond Myklebust
@ 2025-08-29 1:04 ` NeilBrown
0 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2025-08-29 1:04 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Harshvardhan Jha, Anna Schumaker, Mark Brown, linux-nfs
On Thu, 28 Aug 2025, Trond Myklebust wrote:
> On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> > Hi there,
> >
> > On 20/08/25 3:08 AM, NeilBrown wrote:
> > > rpc_wait_bit_killable() is called when it is appropriate for a
> > > fatal
> > > signal to abort the wait.
> > >
> > > If it is called late during process exit after exit_signals() is
> > > called
> > > (and when PF_EXITING is set), it cannot receive a fatal signal so
> > > waiting indefinitely is not safe.
> > >
> > > However aborting immediately, as it currently does, is not ideal as
> > > it
> > > mean that the related NFS request cannot succeed, even if the
> > > network
> > > and server are working properly.
> > >
> > > One of the causes of filesystem IO when PF_EXITING is set is
> > > acct_process() which may access the process accounting file. For a
> > > NFS-root configuration, this can be accessed over NFS.
> > >
> > > In this configuration LTP test "acct02" fails.
> > >
> > > Though waiting indefinitely is not appropriate, aborting
> > > immediately is
> > > also not desirable. This patch aims for a middle ground of waiting
> > > at
> > > most 5 seconds. This should be enough when NFS service is working,
> > > but
> > > not so much as to delay process exit excessively when NFS service
> > > is not
> > > functioning.
> > >
> > > Reported-by: Mark Brown <broonie@kernel.org>
> > > Reported-and-tested-by: Harshvardhan Jha
> > > <harshvardhan.j.jha@oracle.com>
> > > Link:
> > > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Flore.kernel.org%2Flinux-nfs%2F7d4d57b0-39a3-49f1-8ada-60364743e3b4%40sirena.org.uk%2F__%3B!!ACWV5N9M2RV99hQ!LaRJdjZulcG71nHFWdEAszB9mJEhezxPsDxHO8xeQJ7P8a9UfYNRIm1ziuuHU5wxgEXW14vAqC1dlpSQraWaxA%24&data=05%7C02%7Ctrondmy%40hammerspace.com%7C5463825a86c248e5766c08dde6306312%7C0d4fed5c3a7046fe9430ece41741f59e%7C0%7C0%7C638919817692991630%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=GBRM89CFMk2vKyeN4yKBvjiV9IrODC4tbwMfRSk4Cfc%3D&reserved=0
> > >
> > > Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting
> > > tasks")
> > > Signed-off-by: NeilBrown <neil@brown.name>
> > > ---
> > > net/sunrpc/sched.c | 14 +++++++++-----
> > > 1 file changed, 9 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > index 73bc39281ef5..92f39e828fbe 100644
> > > --- a/net/sunrpc/sched.c
> > > +++ b/net/sunrpc/sched.c
> > > @@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> > >
> > > static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > > mode)
> > > {
> > > - if (unlikely(current->flags & PF_EXITING))
> > > - return -EINTR;
> > > - schedule();
> > > - if (signal_pending_state(mode, current))
> > > - return -ERESTARTSYS;
> > > + if (unlikely(current->flags & PF_EXITING)) {
> > > + /* Cannot be killed by a signal, so don't wait
> > > indefinitely */
> > > + if (schedule_timeout(5 * HZ) == 0)
> > > + return -EINTR;
> > > + } else {
> > > + schedule();
> > > + if (signal_pending_state(mode, current))
> > > + return -ERESTARTSYS;
> > > + }
> > > return 0;
> > > }
> > >
> > Is it possible to get this merged in 6.17? I have tested this and the
> > LTP tests pass
>
> If we are in this situation, it is because some signal has already
> killed the parent task. That throws any data integrity guarantees right
> out of the window.
Or it could be because the task has exited normally.
>
> So what problem is this patch trying to fix?
Process accounting writes a record to the accounting file when a process
exits. It writes this record from the context of the process that is
exiting. It does this after signals have been disabled.
So this is not related to data integrity for any data that the process
wrote. I believe that is all handled correctly, partly because writes
and close are performed asynchronously and partly because nfs_wb_all()
ultimately uses a non-killable wait.
It is only related to writing to the process accounting file.
Thanks,
NeilBrown
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@kernel.org, trond.myklebust@hammerspace.com
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()
2025-08-28 12:42 ` Harshvardhan Jha
2025-08-28 13:10 ` Trond Myklebust
@ 2025-09-05 19:38 ` Trond Myklebust
2025-09-05 22:45 ` NeilBrown
1 sibling, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2025-09-05 19:38 UTC (permalink / raw)
To: Harshvardhan Jha, NeilBrown, Anna Schumaker; +Cc: Mark Brown, linux-nfs
On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> Hi there,
>
> On 20/08/25 3:08 AM, NeilBrown wrote:
> > rpc_wait_bit_killable() is called when it is appropriate for a
> > fatal
> > signal to abort the wait.
> >
> > If it is called late during process exit after exit_signals() is
> > called
> > (and when PF_EXITING is set), it cannot receive a fatal signal so
> > waiting indefinitely is not safe.
> >
> > However aborting immediately, as it currently does, is not ideal as
> > it
> > mean that the related NFS request cannot succeed, even if the
> > network
> > and server are working properly.
> >
> > One of the causes of filesystem IO when PF_EXITING is set is
> > acct_process() which may access the process accounting file. For a
> > NFS-root configuration, this can be accessed over NFS.
> >
> > In this configuration LTP test "acct02" fails.
> >
> > Though waiting indefinitely is not appropriate, aborting
> > immediately is
> > also not desirable. This patch aims for a middle ground of waiting
> > at
> > most 5 seconds. This should be enough when NFS service is working,
> > but
> > not so much as to delay process exit excessively when NFS service
> > is not
> > functioning.
> >
> > Reported-by: Mark Brown <broonie@kernel.org>
> > Reported-and-tested-by: Harshvardhan Jha
> > <harshvardhan.j.jha@oracle.com>
> > Link:
> > https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/7d4d57b0-39a3-49f1-8ada-60364743e3b4@sirena.org.uk/__;!!ACWV5N9M2RV99hQ!LaRJdjZulcG71nHFWdEAszB9mJEhezxPsDxHO8xeQJ7P8a9UfYNRIm1ziuuHU5wxgEXW14vAqC1dlpSQraWaxA$
> >
> > Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting
> > tasks")
> > Signed-off-by: NeilBrown <neil@brown.name>
> > ---
> > net/sunrpc/sched.c | 14 +++++++++-----
> > 1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 73bc39281ef5..92f39e828fbe 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >
> > static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > mode)
> > {
> > - if (unlikely(current->flags & PF_EXITING))
> > - return -EINTR;
> > - schedule();
> > - if (signal_pending_state(mode, current))
> > - return -ERESTARTSYS;
> > + if (unlikely(current->flags & PF_EXITING)) {
> > + /* Cannot be killed by a signal, so don't wait
> > indefinitely */
> > + if (schedule_timeout(5 * HZ) == 0)
> > + return -EINTR;
> > + } else {
> > + schedule();
> > + if (signal_pending_state(mode, current))
> > + return -ERESTARTSYS;
> > + }
> > return 0;
> > }
> >
> Is it possible to get this merged in 6.17? I have tested this and the
> LTP tests pass.
After much thought, I think I'd rather just revert the commit that
caused the issue. I'll work on an alternative for the 6.18 timeframe
instead.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()
2025-09-05 19:38 ` Trond Myklebust
@ 2025-09-05 22:45 ` NeilBrown
2025-09-05 23:16 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2025-09-05 22:45 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Harshvardhan Jha, Anna Schumaker, Mark Brown, linux-nfs
On Sat, 06 Sep 2025, Trond Myklebust wrote:
> On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> > Is it possible to get this merged in 6.17? I have tested this and the
> > LTP tests pass.
>
> After much thought, I think I'd rather just revert the commit that
> caused the issue. I'll work on an alternative for the 6.18 timeframe
> instead.
That seems reasonable - thanks. I'd be curious to know what the
original issue was. I'm guess it was CLOSE blocking ??
If you do revert, would you consider the following? I wrote it a while
ago but it became irrelevant with the patch that you might now revert.
I wonder if it would make sense for some part of bit_wait() (or
rpc_wait_bit_killable()) to warn if waiting in TASK_KILLABLE when
PF_EXITING is set.
Thanks,
NeilBrown
From: NeilBrown <neil@brown.name>
Date: Fri, 27 Sep 2024 12:53:52 +1000
Subject: [PATCH] sunrpc: discard rpc_wait_bit_killable()
rpc_wait_bit_killable() currently differs from bit_wait() in the it returns
-ERESTARTSYS rather then -EINTR. The sunrpc and nfs code never really
care about the difference. The error could get up to user-space but it
is only generated when a process is being killed, in which case there is
no user-space to see the difference.
Signed-off-by: NeilBrown <neil@brown.name>
---
net/sunrpc/sched.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9b45fbdc90ca..247e48641d91 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -274,14 +274,6 @@ void rpc_destroy_wait_queue(struct rpc_wait_queue *queue)
}
EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
-static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
-{
- schedule();
- if (signal_pending_state(mode, current))
- return -ERESTARTSYS;
- return 0;
-}
-
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) || IS_ENABLED(CONFIG_TRACEPOINTS)
static void rpc_task_set_debuginfo(struct rpc_task *task)
{
@@ -343,7 +335,7 @@ static int rpc_complete_task(struct rpc_task *task)
int rpc_wait_for_completion_task(struct rpc_task *task)
{
return out_of_line_wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE,
- rpc_wait_bit_killable, TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
+ bit_wait, TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
}
EXPORT_SYMBOL_GPL(rpc_wait_for_completion_task);
@@ -983,12 +975,12 @@ static void __rpc_execute(struct rpc_task *task)
/* sync task: sleep here */
trace_rpc_task_sync_sleep(task, task->tk_action);
status = out_of_line_wait_on_bit(&task->tk_runstate,
- RPC_TASK_QUEUED, rpc_wait_bit_killable,
+ RPC_TASK_QUEUED, bit_wait,
TASK_KILLABLE|TASK_FREEZABLE);
if (status < 0) {
/*
* When a sync task receives a signal, it exits with
- * -ERESTARTSYS. In order to catch any callbacks that
+ * -EINTR. In order to catch any callbacks that
* clean up after sleeping on some queue, we don't
* break the loop here, but go around once more.
*/
--
2.50.0.107.gf914562f5916.dirty
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()
2025-09-05 22:45 ` NeilBrown
@ 2025-09-05 23:16 ` Trond Myklebust
0 siblings, 0 replies; 7+ messages in thread
From: Trond Myklebust @ 2025-09-05 23:16 UTC (permalink / raw)
To: NeilBrown; +Cc: Harshvardhan Jha, Anna Schumaker, Mark Brown, linux-nfs
On Sat, 2025-09-06 at 08:45 +1000, NeilBrown wrote:
> On Sat, 06 Sep 2025, Trond Myklebust wrote:
> > On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> > > Is it possible to get this merged in 6.17? I have tested this and
> > > the
> > > LTP tests pass.
> >
> > After much thought, I think I'd rather just revert the commit that
> > caused the issue. I'll work on an alternative for the 6.18
> > timeframe
> > instead.
>
> That seems reasonable - thanks. I'd be curious to know what the
> original issue was. I'm guess it was CLOSE blocking ??
>
A customer of ours was seeing the NFS flush-on-close hanging after the
process itself had been killed. So the patch was really intended to
address the problem of signalled threads. The fact that it also
affected ordinary processes that rely on exit() to close the file
descriptors was due to a brain fart on my part.
> If you do revert, would you consider the following? I wrote it a
> while
> ago but it became irrelevant with the patch that you might now
> revert.
>
> I wonder if it would make sense for some part of bit_wait() (or
> rpc_wait_bit_killable()) to warn if waiting in TASK_KILLABLE when
> PF_EXITING is set.
What I've been thinking is that for a process in the PF_EXITING state,
we might automatically set RPC_TASK_TIMEOUT on any new RPC calls. The
main problem with that is that it could cause loss of data, since
ETIMEDOUT is a fatal error. So perhaps a new flag similar to
RPC_TASK_TIMEDOUT, but that instead sets EINTR?
The reason for doing that rather than changing the wait, is that when
everything is working correctly, we might want to allow the exiting
process to wait for a very large file flush to complete. It's only when
the RPC calls themselves start to hang due to the server being
unavailable, that we want to actually pull the plug.
>
> Thanks,
> NeilBrown
>
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-09-05 23:16 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-19 21:38 [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable() NeilBrown
2025-08-28 12:42 ` Harshvardhan Jha
2025-08-28 13:10 ` Trond Myklebust
2025-08-29 1:04 ` NeilBrown
2025-09-05 19:38 ` Trond Myklebust
2025-09-05 22:45 ` NeilBrown
2025-09-05 23:16 ` Trond Myklebust
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox