* [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
@ 2025-03-28 17:40 trondmy
2025-03-28 17:40 ` [PATCH 2/2] NFS: " trondmy
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: trondmy @ 2025-03-28 17:40 UTC (permalink / raw)
To: linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Once a task calls exit_signals() it can no longer be signalled. So do
not allow it to do killable waits.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
net/sunrpc/sched.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9b45fbdc90ca..73bc39281ef5 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
{
+ if (unlikely(current->flags & PF_EXITING))
+ return -EINTR;
schedule();
if (signal_pending_state(mode, current))
return -ERESTARTSYS;
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/2] NFS: Don't allow waiting for exiting tasks
2025-03-28 17:40 [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks trondmy
@ 2025-03-28 17:40 ` trondmy
2025-03-28 18:23 ` Jeff Layton
2025-03-28 17:53 ` [PATCH 1/2] SUNRPC: " Jeff Layton
2025-04-08 10:31 ` Mark Brown
2 siblings, 1 reply; 17+ messages in thread
From: trondmy @ 2025-03-28 17:40 UTC (permalink / raw)
To: linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Once a task calls exit_signals() it can no longer be signalled. So do
not allow it to do killable waits.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs/inode.c | 2 ++
fs/nfs/internal.h | 5 +++++
fs/nfs/nfs3proc.c | 2 +-
fs/nfs/nfs4proc.c | 9 +++++++--
4 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 1aa67fca69b2..119e447758b9 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -74,6 +74,8 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
{
+ if (unlikely(nfs_current_task_exiting()))
+ return -EINTR;
schedule();
if (signal_pending_state(mode, current))
return -ERESTARTSYS;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index fae2c7ae4acc..2133b3c20bad 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -912,6 +912,11 @@ static inline u32 nfs_stateid_hash(nfs4_stateid *stateid)
}
#endif
+static inline bool nfs_current_task_exiting(void)
+{
+ return (current->flags & PF_EXITING) != 0;
+}
+
static inline bool nfs_error_is_fatal(int err)
{
switch (err) {
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 0c3bc98cd999..c1736dbb92b6 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -39,7 +39,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
__set_current_state(TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
schedule_timeout(NFS_JUKEBOX_RETRY_TIME);
res = -ERESTARTSYS;
- } while (!fatal_signal_pending(current));
+ } while (!fatal_signal_pending(current) && !nfs_current_task_exiting());
return res;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 50be54e0f578..da97f87ecaa9 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -446,6 +446,8 @@ static int nfs4_delay_killable(long *timeout)
{
might_sleep();
+ if (unlikely(nfs_current_task_exiting()))
+ return -EINTR;
__set_current_state(TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
schedule_timeout(nfs4_update_delay(timeout));
if (!__fatal_signal_pending(current))
@@ -457,6 +459,8 @@ static int nfs4_delay_interruptible(long *timeout)
{
might_sleep();
+ if (unlikely(nfs_current_task_exiting()))
+ return -EINTR;
__set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE_UNSAFE);
schedule_timeout(nfs4_update_delay(timeout));
if (!signal_pending(current))
@@ -1777,7 +1781,8 @@ static void nfs_set_open_stateid_locked(struct nfs4_state *state,
rcu_read_unlock();
trace_nfs4_open_stateid_update_wait(state->inode, stateid, 0);
- if (!fatal_signal_pending(current)) {
+ if (!fatal_signal_pending(current) &&
+ !nfs_current_task_exiting()) {
if (schedule_timeout(5*HZ) == 0)
status = -EAGAIN;
else
@@ -3581,7 +3586,7 @@ static bool nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
write_sequnlock(&state->seqlock);
trace_nfs4_close_stateid_update_wait(state->inode, dst, 0);
- if (fatal_signal_pending(current))
+ if (fatal_signal_pending(current) || nfs_current_task_exiting())
status = -EINTR;
else
if (schedule_timeout(5*HZ) != 0)
--
2.49.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-03-28 17:40 [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks trondmy
2025-03-28 17:40 ` [PATCH 2/2] NFS: " trondmy
@ 2025-03-28 17:53 ` Jeff Layton
2025-03-28 18:00 ` Trond Myklebust
2025-04-08 10:31 ` Mark Brown
2 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-03-28 17:53 UTC (permalink / raw)
To: trondmy, linux-nfs
On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> Once a task calls exit_signals() it can no longer be signalled. So do
> not allow it to do killable waits.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> net/sunrpc/sched.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 9b45fbdc90ca..73bc39281ef5 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>
> static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
> {
> + if (unlikely(current->flags & PF_EXITING))
> + return -EINTR;
> schedule();
> if (signal_pending_state(mode, current))
> return -ERESTARTSYS;
Won't this mean that if a task is signalled and does a final fput, that
a CLOSE sent in task_work will never get sent?
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-03-28 17:53 ` [PATCH 1/2] SUNRPC: " Jeff Layton
@ 2025-03-28 18:00 ` Trond Myklebust
2025-03-28 18:09 ` Jeff Layton
0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2025-03-28 18:00 UTC (permalink / raw)
To: linux-nfs@vger.kernel.org, jlayton@kernel.org
On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote:
> On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > Once a task calls exit_signals() it can no longer be signalled. So
> > do
> > not allow it to do killable waits.
> >
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> > net/sunrpc/sched.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 9b45fbdc90ca..73bc39281ef5 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >
> > static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > mode)
> > {
> > + if (unlikely(current->flags & PF_EXITING))
> > + return -EINTR;
> > schedule();
> > if (signal_pending_state(mode, current))
> > return -ERESTARTSYS;
>
> Won't this mean that if a task is signalled and does a final fput,
> that
> a CLOSE sent in task_work will never get sent?
It should mean that the close gets sent, but the task won't wait for
completion.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-03-28 18:00 ` Trond Myklebust
@ 2025-03-28 18:09 ` Jeff Layton
2025-03-28 19:36 ` Trond Myklebust
0 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-03-28 18:09 UTC (permalink / raw)
To: Trond Myklebust, linux-nfs@vger.kernel.org
On Fri, 2025-03-28 at 18:00 +0000, Trond Myklebust wrote:
> On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote:
> > On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > >
> > > Once a task calls exit_signals() it can no longer be signalled. So
> > > do
> > > not allow it to do killable waits.
> > >
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > > net/sunrpc/sched.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > index 9b45fbdc90ca..73bc39281ef5 100644
> > > --- a/net/sunrpc/sched.c
> > > +++ b/net/sunrpc/sched.c
> > > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> > >
> > > static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > > mode)
> > > {
> > > + if (unlikely(current->flags & PF_EXITING))
> > > + return -EINTR;
> > > schedule();
> > > if (signal_pending_state(mode, current))
> > > return -ERESTARTSYS;
> >
> > Won't this mean that if a task is signalled and does a final fput,
> > that
> > a CLOSE sent in task_work will never get sent?
>
> It should mean that the close gets sent, but the task won't wait for
> completion.
>
Good enough, I guess.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] NFS: Don't allow waiting for exiting tasks
2025-03-28 17:40 ` [PATCH 2/2] NFS: " trondmy
@ 2025-03-28 18:23 ` Jeff Layton
0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2025-03-28 18:23 UTC (permalink / raw)
To: trondmy, linux-nfs
On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> Once a task calls exit_signals() it can no longer be signalled. So do
> not allow it to do killable waits.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfs/inode.c | 2 ++
> fs/nfs/internal.h | 5 +++++
> fs/nfs/nfs3proc.c | 2 +-
> fs/nfs/nfs4proc.c | 9 +++++++--
> 4 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 1aa67fca69b2..119e447758b9 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -74,6 +74,8 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fattr)
>
> int nfs_wait_bit_killable(struct wait_bit_key *key, int mode)
> {
> + if (unlikely(nfs_current_task_exiting()))
> + return -EINTR;
> schedule();
> if (signal_pending_state(mode, current))
> return -ERESTARTSYS;
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index fae2c7ae4acc..2133b3c20bad 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -912,6 +912,11 @@ static inline u32 nfs_stateid_hash(nfs4_stateid *stateid)
> }
> #endif
>
> +static inline bool nfs_current_task_exiting(void)
> +{
> + return (current->flags & PF_EXITING) != 0;
> +}
> +
> static inline bool nfs_error_is_fatal(int err)
> {
> switch (err) {
> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
> index 0c3bc98cd999..c1736dbb92b6 100644
> --- a/fs/nfs/nfs3proc.c
> +++ b/fs/nfs/nfs3proc.c
> @@ -39,7 +39,7 @@ nfs3_rpc_wrapper(struct rpc_clnt *clnt, struct rpc_message *msg, int flags)
> __set_current_state(TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
> schedule_timeout(NFS_JUKEBOX_RETRY_TIME);
> res = -ERESTARTSYS;
> - } while (!fatal_signal_pending(current));
> + } while (!fatal_signal_pending(current) && !nfs_current_task_exiting());
> return res;
> }
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 50be54e0f578..da97f87ecaa9 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -446,6 +446,8 @@ static int nfs4_delay_killable(long *timeout)
> {
> might_sleep();
>
> + if (unlikely(nfs_current_task_exiting()))
> + return -EINTR;
> __set_current_state(TASK_KILLABLE|TASK_FREEZABLE_UNSAFE);
> schedule_timeout(nfs4_update_delay(timeout));
> if (!__fatal_signal_pending(current))
> @@ -457,6 +459,8 @@ static int nfs4_delay_interruptible(long *timeout)
> {
> might_sleep();
>
> + if (unlikely(nfs_current_task_exiting()))
> + return -EINTR;
> __set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE_UNSAFE);
> schedule_timeout(nfs4_update_delay(timeout));
> if (!signal_pending(current))
> @@ -1777,7 +1781,8 @@ static void nfs_set_open_stateid_locked(struct nfs4_state *state,
> rcu_read_unlock();
> trace_nfs4_open_stateid_update_wait(state->inode, stateid, 0);
>
> - if (!fatal_signal_pending(current)) {
> + if (!fatal_signal_pending(current) &&
> + !nfs_current_task_exiting()) {
> if (schedule_timeout(5*HZ) == 0)
> status = -EAGAIN;
> else
> @@ -3581,7 +3586,7 @@ static bool nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
> write_sequnlock(&state->seqlock);
> trace_nfs4_close_stateid_update_wait(state->inode, dst, 0);
>
> - if (fatal_signal_pending(current))
> + if (fatal_signal_pending(current) || nfs_current_task_exiting())
> status = -EINTR;
> else
> if (schedule_timeout(5*HZ) != 0)
Reviewed-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-03-28 18:09 ` Jeff Layton
@ 2025-03-28 19:36 ` Trond Myklebust
0 siblings, 0 replies; 17+ messages in thread
From: Trond Myklebust @ 2025-03-28 19:36 UTC (permalink / raw)
To: linux-nfs@vger.kernel.org, jlayton@kernel.org
On Fri, 2025-03-28 at 14:09 -0400, Jeff Layton wrote:
> On Fri, 2025-03-28 at 18:00 +0000, Trond Myklebust wrote:
> > On Fri, 2025-03-28 at 13:53 -0400, Jeff Layton wrote:
> > > On Fri, 2025-03-28 at 13:40 -0400, trondmy@kernel.org wrote:
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > >
> > > > Once a task calls exit_signals() it can no longer be signalled.
> > > > So
> > > > do
> > > > not allow it to do killable waits.
> > > >
> > > > Signed-off-by: Trond Myklebust
> > > > <trond.myklebust@hammerspace.com>
> > > > ---
> > > > net/sunrpc/sched.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > > > index 9b45fbdc90ca..73bc39281ef5 100644
> > > > --- a/net/sunrpc/sched.c
> > > > +++ b/net/sunrpc/sched.c
> > > > @@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> > > >
> > > > static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > > > mode)
> > > > {
> > > > + if (unlikely(current->flags & PF_EXITING))
> > > > + return -EINTR;
> > > > schedule();
> > > > if (signal_pending_state(mode, current))
> > > > return -ERESTARTSYS;
> > >
> > > Won't this mean that if a task is signalled and does a final
> > > fput,
> > > that
> > > a CLOSE sent in task_work will never get sent?
> >
> > It should mean that the close gets sent, but the task won't wait
> > for
> > completion.
> >
>
> Good enough, I guess.
>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
>
It ensures that the task can die, and with the actual close RPC call
being asynchronous, it will keep going until either the server comes
back and processes it, or someone force umounts the partition and kills
the call, etc.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-03-28 17:40 [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks trondmy
2025-03-28 17:40 ` [PATCH 2/2] NFS: " trondmy
2025-03-28 17:53 ` [PATCH 1/2] SUNRPC: " Jeff Layton
@ 2025-04-08 10:31 ` Mark Brown
2025-07-23 7:02 ` Harshvardhan Jha
2 siblings, 1 reply; 17+ messages in thread
From: Mark Brown @ 2025-04-08 10:31 UTC (permalink / raw)
To: trondmy; +Cc: linux-nfs, Aishwarya.TCV
[-- Attachment #1: Type: text/plain, Size: 5240 bytes --]
On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> Once a task calls exit_signals() it can no longer be signalled. So do
> not allow it to do killable waits.
We're seeing the LTP acct02 test failing in kernels with this patch
applied, testing on systems with NFS root filesystems:
10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
10284 05:03:10.076572 acct02.c:183: TFAIL: end of file reached
10285 05:03:10.076790
10286 05:03:10.087439 HINT: You _MAY_ be missing kernel fixes:
10287 05:03:10.087741
10288 05:03:10.087979 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
10289 05:03:10.088201
10290 05:03:10.088414 Summary:
10291 05:03:10.088618 passed 0
10292 05:03:10.098852 failed 1
10293 05:03:10.099212 broken 0
10294 05:03:10.099454 skipped 0
10295 05:03:10.099675 warnings 0
I ran a bisect which zeroed in on this commit (log below), I didn't look
into it properly but the test does start and exit a test program to
verify that accounting records get created properly which does look
relevant.
git bisect start
# status: waiting for both good and bad commits
# bad: [0af2f6be1b4281385b618cb86ad946eded089ac8] Linux 6.15-rc1
git bisect bad 0af2f6be1b4281385b618cb86ad946eded089ac8
# status: waiting for good commit(s), bad commit known
# good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557
# good: [fd71def6d9abc5ae362fb9995d46049b7b0ed391] Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect good fd71def6d9abc5ae362fb9995d46049b7b0ed391
# good: [93d52288679e29aaa44a6f12d5a02e8a90e742c5] Merge tag 'backlight-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
git bisect good 93d52288679e29aaa44a6f12d5a02e8a90e742c5
# good: [2cd5769fb0b78b8ef583ab4c0015c2c48d525dac] Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
git bisect good 2cd5769fb0b78b8ef583ab4c0015c2c48d525dac
# bad: [25757984d77da731922bed5001431673b6daf5ac] Merge tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 25757984d77da731922bed5001431673b6daf5ac
# good: [28a1b05678f4e88de90b0987b06e13c454ad9bd6] Merge tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect good 28a1b05678f4e88de90b0987b06e13c454ad9bd6
# good: [92b71befc349587d58fdbbe6cdd68fb67f4933a8] Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 92b71befc349587d58fdbbe6cdd68fb67f4933a8
# good: [5e17b5c71729d8ce936c83a579ed45f65efcb456] Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
git bisect good 5e17b5c71729d8ce936c83a579ed45f65efcb456
# good: [344a50b0f4eecc160c61d780f53d2f75586016ce] staging: gpib: lpvo_usb_gpib: struct gpib_board
git bisect good 344a50b0f4eecc160c61d780f53d2f75586016ce
# bad: [9e8f324bd44c1fe026b582b75213de4eccfa1163] NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
git bisect bad 9e8f324bd44c1fe026b582b75213de4eccfa1163
# good: [df210d9b0951d714c1668c511ca5c8ff38cf6916] sunrpc: Add a sysfs file for adding a new xprt
git bisect good df210d9b0951d714c1668c511ca5c8ff38cf6916
# good: [bf9be373b830a3e48117da5d89bb6145a575f880] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
git bisect good bf9be373b830a3e48117da5d89bb6145a575f880
# good: [c81d5bcb7b38ab0322aea93152c091451b82d3f3] NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
git bisect good c81d5bcb7b38ab0322aea93152c091451b82d3f3
# bad: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
git bisect bad 14e41b16e8cb677bb440dca2edba8b041646c742
# good: [0af5fb5ed3d2fd9e110c6112271f022b744a849a] NFSv4: Treat ENETUNREACH errors as fatal for state recovery
git bisect good 0af5fb5ed3d2fd9e110c6112271f022b744a849a
# first bad commit: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-04-08 10:31 ` Mark Brown
@ 2025-07-23 7:02 ` Harshvardhan Jha
2025-07-23 8:07 ` NeilBrown
0 siblings, 1 reply; 17+ messages in thread
From: Harshvardhan Jha @ 2025-07-23 7:02 UTC (permalink / raw)
To: Mark Brown, trondmy
Cc: linux-nfs, Aishwarya.TCV, ltp, Chuck Lever, Jeff Layton,
NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, Anna Schumaker
On 08/04/25 4:01 PM, Mark Brown wrote:
> On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>
>> Once a task calls exit_signals() it can no longer be signalled. So do
>> not allow it to do killable waits.
> We're seeing the LTP acct02 test failing in kernels with this patch
> applied, testing on systems with NFS root filesystems:
>
> 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
> 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
> 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
> 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
> 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
> 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
> 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
> 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
> 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
> 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
> 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
> 10284 05:03:10.076572 acct02.c:183: TFAIL: end of file reached
> 10285 05:03:10.076790
> 10286 05:03:10.087439 HINT: You _MAY_ be missing kernel fixes:
> 10287 05:03:10.087741
> 10288 05:03:10.087979 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
> 10289 05:03:10.088201
> 10290 05:03:10.088414 Summary:
> 10291 05:03:10.088618 passed 0
> 10292 05:03:10.098852 failed 1
> 10293 05:03:10.099212 broken 0
> 10294 05:03:10.099454 skipped 0
> 10295 05:03:10.099675 warnings 0
>
> I ran a bisect which zeroed in on this commit (log below), I didn't look
> into it properly but the test does start and exit a test program to
> verify that accounting records get created properly which does look
> relevant.
Hi there,
I faced the same issue and reverting this patch fixed the issue.
Is this an issue introduced by this patch or it's due to the ltp
testcase being outdated?
Thanks & Regards,
Harshvardhan
>
> git bisect start
> # status: waiting for both good and bad commits
> # bad: [0af2f6be1b4281385b618cb86ad946eded089ac8] Linux 6.15-rc1
> git bisect bad 0af2f6be1b4281385b618cb86ad946eded089ac8
> # status: waiting for good commit(s), bad commit known
> # good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
> git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557
> # good: [fd71def6d9abc5ae362fb9995d46049b7b0ed391] Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> git bisect good fd71def6d9abc5ae362fb9995d46049b7b0ed391
> # good: [93d52288679e29aaa44a6f12d5a02e8a90e742c5] Merge tag 'backlight-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
> git bisect good 93d52288679e29aaa44a6f12d5a02e8a90e742c5
> # good: [2cd5769fb0b78b8ef583ab4c0015c2c48d525dac] Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> git bisect good 2cd5769fb0b78b8ef583ab4c0015c2c48d525dac
> # bad: [25757984d77da731922bed5001431673b6daf5ac] Merge tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect bad 25757984d77da731922bed5001431673b6daf5ac
> # good: [28a1b05678f4e88de90b0987b06e13c454ad9bd6] Merge tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
> git bisect good 28a1b05678f4e88de90b0987b06e13c454ad9bd6
> # good: [92b71befc349587d58fdbbe6cdd68fb67f4933a8] Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good 92b71befc349587d58fdbbe6cdd68fb67f4933a8
> # good: [5e17b5c71729d8ce936c83a579ed45f65efcb456] Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
> git bisect good 5e17b5c71729d8ce936c83a579ed45f65efcb456
> # good: [344a50b0f4eecc160c61d780f53d2f75586016ce] staging: gpib: lpvo_usb_gpib: struct gpib_board
> git bisect good 344a50b0f4eecc160c61d780f53d2f75586016ce
> # bad: [9e8f324bd44c1fe026b582b75213de4eccfa1163] NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
> git bisect bad 9e8f324bd44c1fe026b582b75213de4eccfa1163
> # good: [df210d9b0951d714c1668c511ca5c8ff38cf6916] sunrpc: Add a sysfs file for adding a new xprt
> git bisect good df210d9b0951d714c1668c511ca5c8ff38cf6916
> # good: [bf9be373b830a3e48117da5d89bb6145a575f880] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
> git bisect good bf9be373b830a3e48117da5d89bb6145a575f880
> # good: [c81d5bcb7b38ab0322aea93152c091451b82d3f3] NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
> git bisect good c81d5bcb7b38ab0322aea93152c091451b82d3f3
> # bad: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
> git bisect bad 14e41b16e8cb677bb440dca2edba8b041646c742
> # good: [0af5fb5ed3d2fd9e110c6112271f022b744a849a] NFSv4: Treat ENETUNREACH errors as fatal for state recovery
> git bisect good 0af5fb5ed3d2fd9e110c6112271f022b744a849a
> # first bad commit: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-07-23 7:02 ` Harshvardhan Jha
@ 2025-07-23 8:07 ` NeilBrown
2025-07-25 11:59 ` Harshvardhan Jha
0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2025-07-23 8:07 UTC (permalink / raw)
To: Harshvardhan Jha
Cc: Mark Brown, trondmy, linux-nfs, Aishwarya.TCV, ltp, Chuck Lever,
Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Anna Schumaker
On Wed, 23 Jul 2025, Harshvardhan Jha wrote:
> On 08/04/25 4:01 PM, Mark Brown wrote:
> > On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
> >> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >>
> >> Once a task calls exit_signals() it can no longer be signalled. So do
> >> not allow it to do killable waits.
> > We're seeing the LTP acct02 test failing in kernels with this patch
> > applied, testing on systems with NFS root filesystems:
> >
> > 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
> > 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
> > 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> > 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
> > 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> > 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
> > 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
> > 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
> > 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
> > 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
> > 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
> > 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
> > 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
It seems that the acct02 process got logged..
Maybe the vfork attempt (trying to run acct02_helper) got half way an
aborted.
It got far enough that accounting got interested.
It didn't get far enough to update the ppid.
I'd be surprised if that were even possible....
If you would like to help debug this, changing the
+ if (unlikely(current->flags & PF_EXITING))
to
+ if (unlikely(WARN_ON(current->flags & PF_EXITING)))
would provide stack traces so we can wee where -EINTR is actually being
returned. That should provide some hints.
NeilBrown
> > 10284 05:03:10.076572 acct02.c:183: TFAIL: end of file reached
> > 10285 05:03:10.076790
> > 10286 05:03:10.087439 HINT: You _MAY_ be missing kernel fixes:
> > 10287 05:03:10.087741
> > 10288 05:03:10.087979 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
> > 10289 05:03:10.088201
> > 10290 05:03:10.088414 Summary:
> > 10291 05:03:10.088618 passed 0
> > 10292 05:03:10.098852 failed 1
> > 10293 05:03:10.099212 broken 0
> > 10294 05:03:10.099454 skipped 0
> > 10295 05:03:10.099675 warnings 0
> >
> > I ran a bisect which zeroed in on this commit (log below), I didn't look
> > into it properly but the test does start and exit a test program to
> > verify that accounting records get created properly which does look
> > relevant.
>
> Hi there,
> I faced the same issue and reverting this patch fixed the issue.
> Is this an issue introduced by this patch or it's due to the ltp
> testcase being outdated?
>
> Thanks & Regards,
> Harshvardhan
>
> >
> > git bisect start
> > # status: waiting for both good and bad commits
> > # bad: [0af2f6be1b4281385b618cb86ad946eded089ac8] Linux 6.15-rc1
> > git bisect bad 0af2f6be1b4281385b618cb86ad946eded089ac8
> > # status: waiting for good commit(s), bad commit known
> > # good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
> > git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557
> > # good: [fd71def6d9abc5ae362fb9995d46049b7b0ed391] Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
> > git bisect good fd71def6d9abc5ae362fb9995d46049b7b0ed391
> > # good: [93d52288679e29aaa44a6f12d5a02e8a90e742c5] Merge tag 'backlight-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
> > git bisect good 93d52288679e29aaa44a6f12d5a02e8a90e742c5
> > # good: [2cd5769fb0b78b8ef583ab4c0015c2c48d525dac] Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
> > git bisect good 2cd5769fb0b78b8ef583ab4c0015c2c48d525dac
> > # bad: [25757984d77da731922bed5001431673b6daf5ac] Merge tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> > git bisect bad 25757984d77da731922bed5001431673b6daf5ac
> > # good: [28a1b05678f4e88de90b0987b06e13c454ad9bd6] Merge tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
> > git bisect good 28a1b05678f4e88de90b0987b06e13c454ad9bd6
> > # good: [92b71befc349587d58fdbbe6cdd68fb67f4933a8] Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > git bisect good 92b71befc349587d58fdbbe6cdd68fb67f4933a8
> > # good: [5e17b5c71729d8ce936c83a579ed45f65efcb456] Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
> > git bisect good 5e17b5c71729d8ce936c83a579ed45f65efcb456
> > # good: [344a50b0f4eecc160c61d780f53d2f75586016ce] staging: gpib: lpvo_usb_gpib: struct gpib_board
> > git bisect good 344a50b0f4eecc160c61d780f53d2f75586016ce
> > # bad: [9e8f324bd44c1fe026b582b75213de4eccfa1163] NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
> > git bisect bad 9e8f324bd44c1fe026b582b75213de4eccfa1163
> > # good: [df210d9b0951d714c1668c511ca5c8ff38cf6916] sunrpc: Add a sysfs file for adding a new xprt
> > git bisect good df210d9b0951d714c1668c511ca5c8ff38cf6916
> > # good: [bf9be373b830a3e48117da5d89bb6145a575f880] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
> > git bisect good bf9be373b830a3e48117da5d89bb6145a575f880
> > # good: [c81d5bcb7b38ab0322aea93152c091451b82d3f3] NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
> > git bisect good c81d5bcb7b38ab0322aea93152c091451b82d3f3
> > # bad: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
> > git bisect bad 14e41b16e8cb677bb440dca2edba8b041646c742
> > # good: [0af5fb5ed3d2fd9e110c6112271f022b744a849a] NFSv4: Treat ENETUNREACH errors as fatal for state recovery
> > git bisect good 0af5fb5ed3d2fd9e110c6112271f022b744a849a
> > # first bad commit: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-07-23 8:07 ` NeilBrown
@ 2025-07-25 11:59 ` Harshvardhan Jha
2025-07-27 4:50 ` NeilBrown
0 siblings, 1 reply; 17+ messages in thread
From: Harshvardhan Jha @ 2025-07-25 11:59 UTC (permalink / raw)
To: NeilBrown
Cc: Mark Brown, trondmy, linux-nfs, Aishwarya.TCV, ltp, Chuck Lever,
Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Anna Schumaker
On 23/07/25 1:37 PM, NeilBrown wrote:
> On Wed, 23 Jul 2025, Harshvardhan Jha wrote:
>> On 08/04/25 4:01 PM, Mark Brown wrote:
>>> On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>
>>>> Once a task calls exit_signals() it can no longer be signalled. So do
>>>> not allow it to do killable waits.
>>> We're seeing the LTP acct02 test failing in kernels with this patch
>>> applied, testing on systems with NFS root filesystems:
>>>
>>> 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
>>> 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
>>> 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>> 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
>>> 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>> 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>>> 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
>>> 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
>>> 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
>>> 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
>>> 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
>>> 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
>>> 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
> It seems that the acct02 process got logged..
> Maybe the vfork attempt (trying to run acct02_helper) got half way an
> aborted.
> It got far enough that accounting got interested.
> It didn't get far enough to update the ppid.
> I'd be surprised if that were even possible....
>
> If you would like to help debug this, changing the
>
> + if (unlikely(current->flags & PF_EXITING))
>
> to
>
> + if (unlikely(WARN_ON(current->flags & PF_EXITING)))
>
> would provide stack traces so we can wee where -EINTR is actually being
> returned. That should provide some hints.
>
> NeilBrown
Hi Neil,
Upon this addition I got this in the logs
<<<test_start>>>
tag=acct02 stime=1753444172
cmdline="acct02"
contacts=""
analysis=exit
<<<test_output>>>
tst_kconfig.c:88: TINFO: Parsing kernel config
'/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-w1ozKKlJ6n/LTP_acc4RRfLh as
tmpdir (nfs filesystem)
tst_test.c:2004: TINFO: LTP version: 20250530-105-gda73e1527
tst_test.c:2007: TINFO: Tested kernel:
6.15.8-1.bug38227970.el9.rc2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 25
02:03:04 PDT 2025 x86_64
tst_kconfig.c:88: TINFO: Parsing kernel config
'/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
tst_kconfig.c:88: TINFO: Parsing kernel config
'/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
acct02.c:238: TINFO: Verifying using 'struct acct_v3'
acct02.c:191: TINFO: == entry 1 ==
acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('acct02')
acct02.c:131: TINFO: ac_exitcode != 32768 (0)
acct02.c:139: TINFO: ac_ppid != 88929 (88928)
acct02.c:181: TFAIL: end of file reached
HINT: You _MAY_ be missing kernel fixes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
Summary:
passed 0
failed 1
broken 0
skipped 0
warnings 0
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=1 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=20
<<<test_end>>>
Thanks & Regards,
Harshvardhan
>
>>> 10284 05:03:10.076572 acct02.c:183: TFAIL: end of file reached
>>> 10285 05:03:10.076790
>>> 10286 05:03:10.087439 HINT: You _MAY_ be missing kernel fixes:
>>> 10287 05:03:10.087741
>>> 10288 05:03:10.087979 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
>>> 10289 05:03:10.088201
>>> 10290 05:03:10.088414 Summary:
>>> 10291 05:03:10.088618 passed 0
>>> 10292 05:03:10.098852 failed 1
>>> 10293 05:03:10.099212 broken 0
>>> 10294 05:03:10.099454 skipped 0
>>> 10295 05:03:10.099675 warnings 0
>>>
>>> I ran a bisect which zeroed in on this commit (log below), I didn't look
>>> into it properly but the test does start and exit a test program to
>>> verify that accounting records get created properly which does look
>>> relevant.
>> Hi there,
>> I faced the same issue and reverting this patch fixed the issue.
>> Is this an issue introduced by this patch or it's due to the ltp
>> testcase being outdated?
>>
>> Thanks & Regards,
>> Harshvardhan
>>
>>> git bisect start
>>> # status: waiting for both good and bad commits
>>> # bad: [0af2f6be1b4281385b618cb86ad946eded089ac8] Linux 6.15-rc1
>>> git bisect bad 0af2f6be1b4281385b618cb86ad946eded089ac8
>>> # status: waiting for good commit(s), bad commit known
>>> # good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
>>> git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557
>>> # good: [fd71def6d9abc5ae362fb9995d46049b7b0ed391] Merge tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
>>> git bisect good fd71def6d9abc5ae362fb9995d46049b7b0ed391
>>> # good: [93d52288679e29aaa44a6f12d5a02e8a90e742c5] Merge tag 'backlight-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
>>> git bisect good 93d52288679e29aaa44a6f12d5a02e8a90e742c5
>>> # good: [2cd5769fb0b78b8ef583ab4c0015c2c48d525dac] Merge tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
>>> git bisect good 2cd5769fb0b78b8ef583ab4c0015c2c48d525dac
>>> # bad: [25757984d77da731922bed5001431673b6daf5ac] Merge tag 'staging-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
>>> git bisect bad 25757984d77da731922bed5001431673b6daf5ac
>>> # good: [28a1b05678f4e88de90b0987b06e13c454ad9bd6] Merge tag 'i2c-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
>>> git bisect good 28a1b05678f4e88de90b0987b06e13c454ad9bd6
>>> # good: [92b71befc349587d58fdbbe6cdd68fb67f4933a8] Merge tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> git bisect good 92b71befc349587d58fdbbe6cdd68fb67f4933a8
>>> # good: [5e17b5c71729d8ce936c83a579ed45f65efcb456] Merge tag 'fuse-update-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
>>> git bisect good 5e17b5c71729d8ce936c83a579ed45f65efcb456
>>> # good: [344a50b0f4eecc160c61d780f53d2f75586016ce] staging: gpib: lpvo_usb_gpib: struct gpib_board
>>> git bisect good 344a50b0f4eecc160c61d780f53d2f75586016ce
>>> # bad: [9e8f324bd44c1fe026b582b75213de4eccfa1163] NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
>>> git bisect bad 9e8f324bd44c1fe026b582b75213de4eccfa1163
>>> # good: [df210d9b0951d714c1668c511ca5c8ff38cf6916] sunrpc: Add a sysfs file for adding a new xprt
>>> git bisect good df210d9b0951d714c1668c511ca5c8ff38cf6916
>>> # good: [bf9be373b830a3e48117da5d89bb6145a575f880] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
>>> git bisect good bf9be373b830a3e48117da5d89bb6145a575f880
>>> # good: [c81d5bcb7b38ab0322aea93152c091451b82d3f3] NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
>>> git bisect good c81d5bcb7b38ab0322aea93152c091451b82d3f3
>>> # bad: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
>>> git bisect bad 14e41b16e8cb677bb440dca2edba8b041646c742
>>> # good: [0af5fb5ed3d2fd9e110c6112271f022b744a849a] NFSv4: Treat ENETUNREACH errors as fatal for state recovery
>>> git bisect good 0af5fb5ed3d2fd9e110c6112271f022b744a849a
>>> # first bad commit: [14e41b16e8cb677bb440dca2edba8b041646c742] SUNRPC: Don't allow waiting for exiting tasks
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-07-25 11:59 ` Harshvardhan Jha
@ 2025-07-27 4:50 ` NeilBrown
2025-07-28 8:07 ` Harshvardhan Jha
0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2025-07-27 4:50 UTC (permalink / raw)
To: Harshvardhan Jha
Cc: Mark Brown, trondmy, linux-nfs, Aishwarya.TCV, ltp, Chuck Lever,
Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Anna Schumaker
On Fri, 25 Jul 2025, Harshvardhan Jha wrote:
> On 23/07/25 1:37 PM, NeilBrown wrote:
> > On Wed, 23 Jul 2025, Harshvardhan Jha wrote:
> >> On 08/04/25 4:01 PM, Mark Brown wrote:
> >>> On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
> >>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >>>>
> >>>> Once a task calls exit_signals() it can no longer be signalled. So do
> >>>> not allow it to do killable waits.
> >>> We're seeing the LTP acct02 test failing in kernels with this patch
> >>> applied, testing on systems with NFS root filesystems:
> >>>
> >>> 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
> >>> 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
> >>> 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> >>> 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
> >>> 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> >>> 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
> >>> 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
> >>> 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
> >>> 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
> >>> 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
> >>> 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
> >>> 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
> >>> 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
> > It seems that the acct02 process got logged..
> > Maybe the vfork attempt (trying to run acct02_helper) got half way an
> > aborted.
> > It got far enough that accounting got interested.
> > It didn't get far enough to update the ppid.
> > I'd be surprised if that were even possible....
> >
> > If you would like to help debug this, changing the
> >
> > + if (unlikely(current->flags & PF_EXITING))
> >
> > to
> >
> > + if (unlikely(WARN_ON(current->flags & PF_EXITING)))
> >
> > would provide stack traces so we can wee where -EINTR is actually being
> > returned. That should provide some hints.
> >
> > NeilBrown
>
> Hi Neil,
>
> Upon this addition I got this in the logs
Thanks for testing. Was there anything new in the kernel logs? I was
expecting a WARNING message followed by a "Call Trace".
If there wasn't, then this patch cannot have caused the problem.
If there was, then I need to see it.
Thanks,
NeilBrown
>
> <<<test_start>>>
> tag=acct02 stime=1753444172
> cmdline="acct02"
> contacts=""
> analysis=exit
> <<<test_output>>>
> tst_kconfig.c:88: TINFO: Parsing kernel config
> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
> tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-w1ozKKlJ6n/LTP_acc4RRfLh as
> tmpdir (nfs filesystem)
> tst_test.c:2004: TINFO: LTP version: 20250530-105-gda73e1527
> tst_test.c:2007: TINFO: Tested kernel:
> 6.15.8-1.bug38227970.el9.rc2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 25
> 02:03:04 PDT 2025 x86_64
> tst_kconfig.c:88: TINFO: Parsing kernel config
> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
> tst_kconfig.c:88: TINFO: Parsing kernel config
> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
> acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
> acct02.c:238: TINFO: Verifying using 'struct acct_v3'
> acct02.c:191: TINFO: == entry 1 ==
> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('acct02')
> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
> acct02.c:139: TINFO: ac_ppid != 88929 (88928)
> acct02.c:181: TFAIL: end of file reached
>
> HINT: You _MAY_ be missing kernel fixes:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
>
> Summary:
> passed 0
> failed 1
> broken 0
> skipped 0
> warnings 0
> incrementing stop
> <<<execution_status>>>
> initiation_status="ok"
> duration=1 termination_type=exited termination_id=1 corefile=no
> cutime=0 cstime=20
>
> <<<test_end>>>
>
>
> Thanks & Regards,
>
> Harshvardhan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-07-27 4:50 ` NeilBrown
@ 2025-07-28 8:07 ` Harshvardhan Jha
2025-07-28 9:34 ` NeilBrown
0 siblings, 1 reply; 17+ messages in thread
From: Harshvardhan Jha @ 2025-07-28 8:07 UTC (permalink / raw)
To: NeilBrown
Cc: Mark Brown, trondmy, linux-nfs, Aishwarya.TCV, ltp, Chuck Lever,
Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Anna Schumaker
On 27/07/25 10:20 AM, NeilBrown wrote:
> On Fri, 25 Jul 2025, Harshvardhan Jha wrote:
>> On 23/07/25 1:37 PM, NeilBrown wrote:
>>> On Wed, 23 Jul 2025, Harshvardhan Jha wrote:
>>>> On 08/04/25 4:01 PM, Mark Brown wrote:
>>>>> On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
>>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>>>
>>>>>> Once a task calls exit_signals() it can no longer be signalled. So do
>>>>>> not allow it to do killable waits.
>>>>> We're seeing the LTP acct02 test failing in kernels with this patch
>>>>> applied, testing on systems with NFS root filesystems:
>>>>>
>>>>> 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
>>>>> 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
>>>>> 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>>>> 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
>>>>> 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>>>> 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>>>>> 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
>>>>> 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
>>>>> 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
>>>>> 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
>>>>> 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
>>>>> 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
>>>>> 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
>>> It seems that the acct02 process got logged..
>>> Maybe the vfork attempt (trying to run acct02_helper) got half way an
>>> aborted.
>>> It got far enough that accounting got interested.
>>> It didn't get far enough to update the ppid.
>>> I'd be surprised if that were even possible....
>>>
>>> If you would like to help debug this, changing the
>>>
>>> + if (unlikely(current->flags & PF_EXITING))
>>>
>>> to
>>>
>>> + if (unlikely(WARN_ON(current->flags & PF_EXITING)))
>>>
>>> would provide stack traces so we can wee where -EINTR is actually being
>>> returned. That should provide some hints.
>>>
>>> NeilBrown
>> Hi Neil,
>>
>> Upon this addition I got this in the logs
> Thanks for testing. Was there anything new in the kernel logs? I was
> expecting a WARNING message followed by a "Call Trace".
>
> If there wasn't, then this patch cannot have caused the problem.
> If there was, then I need to see it.
>
> Thanks,
> NeilBrown
This is what the dmesg contains:
[ 678.814887] LTP: starting acct02
[ 679.831232] ------------[ cut here ]------------
[ 679.833500] WARNING: CPU: 6 PID: 88930 at net/sunrpc/sched.c:279
rpc_wait_bit_killable+0x76/0x90 [sunrpc]
[ 679.837308] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs
netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd
grace loop nft_redir ipt_REJECT xt_comment xt_owner nft_compat
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib rfkill nft_reject_inet
nf_reject_
ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 ip_set cuse vfat fat intel_rapl_msr
intel_rapl_common kvm_amd ccp kvm drm_shmem_helper irqbypass i2c_piix4
drm_kms_helper pcspkr pvpanic_mmio i2c_smbus pvpanic drm fuse xfs
crc32c_generic
nvme_tcp nvme_fabrics nvme_core nvme_keyring nvme_auth sd_mod
virtio_net sg net_failover virtio_scsi failover ata_generic pata_acpi
ata_piix ghash_clmulni_intel libata sha512_ssse3 virtio_pci sha256_ssse3
virtio_pci_legacy_dev sha1_ssse3 virtio_pci_modern_dev serio_raw
dm_multipath btrfs
blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror
dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls
cxgb3i cxgb3 mdio libcxgbi libcxgb
[ 679.837524] qla4xxx iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi iscsi_ibft iscsi_boot_sysfs qemu_fw_cfg aesni_intel
crypto_simd cryptd [last unloaded: kheaders]
[ 679.873316] CPU: 6 UID: 0 PID: 88930 Comm: acct02_helper Kdump:
loaded Not tainted 6.15.8-1.el9.rc2.x86_64 #1 PREEMPT(voluntary)
[ 679.877769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.6.4 02/27/2023
[ 679.880782] RIP: 0010:rpc_wait_bit_killable+0x76/0x90 [sunrpc]
[ 679.883189] Code: 01 b8 00 fe ff ff 75 d5 48 8b 85 48 0d 00 00 5b 5d
48 c1 e8 08 83 e0 01 f7 d8 19 c0 25 00 fe ff ff 31 d2 31 f6 e9 8a e6 c4
d4 <0f> 0b b8 fc ff ff ff 5b 5d 31 d2 31 f6 e9 78 e6 c4 d4 0f 1f 84 00
[ 679.889976] RSP: 0018:ffffaf47811a7770 EFLAGS: 00010202
[ 679.892196] RAX: ffff97be48e00330 RBX: ffffaf47811a77c0 RCX:
0000000000000000
[ 679.894978] RDX: 0000000000000001 RSI: 0000000000002102 RDI:
ffffaf47811a77c0
[ 679.897786] RBP: ffff97be61588000 R08: 0000000000000000 R09:
0000000000000000
[ 679.900600] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000002102
[ 679.903432] R13: ffffffff96408ea0 R14: ffffaf47811a77d8 R15:
ffffffffc07568e0
[ 679.906233] FS: 00007fc2563f8600(0000) GS:ffff97c5c890f000(0000)
knlGS:0000000000000000
[ 679.909289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 679.911736] CR2: 00007fc2561fba70 CR3: 00000003bce3a000 CR4:
00000000003506f0
[ 679.914555] Call Trace:
[ 679.915918] <TASK>
[ 679.917215] __wait_on_bit+0x31/0xa0
[ 679.918932] out_of_line_wait_on_bit+0x93/0xc0
[ 679.920914] ? __pfx_wake_bit_function+0x10/0x10
[ 679.922944] __rpc_execute+0x109/0x310 [sunrpc]
[ 679.925024] rpc_execute+0x137/0x160 [sunrpc]
[ 679.927020] rpc_run_task+0x107/0x170 [sunrpc]
[ 679.929032] nfs4_call_sync_sequence+0x74/0xc0 [nfsv4]
[ 679.931319] _nfs4_proc_statfs+0xc7/0x100 [nfsv4]
[ 679.933520] ? srso_return_thunk+0x5/0x5f
[ 679.935391] nfs4_proc_statfs+0x6b/0xb0 [nfsv4]
[ 679.937367] nfs_statfs+0x7e/0x1e0 [nfs]
[ 679.939138] statfs_by_dentry+0x67/0xa0
[ 679.940887] vfs_statfs+0x1c/0x40
[ 679.942596] check_free_space+0x71/0x110
[ 679.944433] acct_write_process+0x45/0x180
[ 679.946313] acct_process+0xff/0x180
[ 679.948003] do_exit+0x216/0x480
[ 679.949799] ? srso_return_thunk+0x5/0x5f
[ 679.951621] do_group_exit+0x30/0x80
[ 679.953329] __x64_sys_exit_group+0x18/0x20
[ 679.955217] x64_sys_call+0xfdb/0x14f0
[ 679.956971] do_syscall_64+0x82/0x7a0
[ 679.958717] ? srso_return_thunk+0x5/0x5f
[ 679.960550] ? ___pte_offset_map+0x1b/0x1a0
[ 679.962434] ? srso_return_thunk+0x5/0x5f
[ 679.964261] ? __alloc_frozen_pages_noprof+0x18d/0x340
[ 679.966389] ? srso_return_thunk+0x5/0x5f
[ 679.968183] ? srso_return_thunk+0x5/0x5f
[ 679.969945] ? __mod_memcg_lruvec_state+0xb6/0x1b0
[ 679.971977] ? srso_return_thunk+0x5/0x5f
[ 679.973690] ? __lruvec_stat_mod_folio+0x83/0xd0
[ 679.975671] ? srso_return_thunk+0x5/0x5f
[ 679.977392] ? srso_return_thunk+0x5/0x5f
[ 679.979079] ? set_ptes.isra.0+0x36/0x90
[ 679.980771] ? srso_return_thunk+0x5/0x5f
[ 679.982375] ? srso_return_thunk+0x5/0x5f
[ 679.984052] ? wp_page_copy+0x333/0x730
[ 679.985648] ? srso_return_thunk+0x5/0x5f
[ 679.987220] ? __handle_mm_fault+0x397/0x6f0
[ 679.988818] ? srso_return_thunk+0x5/0x5f
[ 679.990411] ? __count_memcg_events+0xbb/0x150
[ 679.992111] ? srso_return_thunk+0x5/0x5f
[ 679.993689] ? count_memcg_events.constprop.0+0x26/0x50
[ 679.995590] ? srso_return_thunk+0x5/0x5f
[ 679.997177] ? handle_mm_fault+0x245/0x350
[ 679.998807] ? srso_return_thunk+0x5/0x5f
[ 680.000339] ? do_user_addr_fault+0x221/0x690
[ 680.002042] ? srso_return_thunk+0x5/0x5f
[ 680.003553] ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
[ 680.005643] ? srso_return_thunk+0x5/0x5f
[ 680.007202] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 680.009025] RIP: 0033:0x7fc2560d985d
[ 680.010510] Code: Unable to access opcode bytes at 0x7fc2560d9833.
[ 680.012660] RSP: 002b:00007ffde591df68 EFLAGS: 00000246 ORIG_RAX:
00000000000000e7
[ 680.015355] RAX: ffffffffffffffda RBX: 00007fc2561f59e0 RCX:
00007fc2560d985d
[ 680.017749] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI:
0000000000000080
[ 680.020292] RBP: 0000000000000080 R08: 0000000000000000 R09:
0000000000000020
[ 680.022729] R10: 00007ffde591de10 R11: 0000000000000246 R12:
00007fc2561f59e0
[ 680.025174] R13: 00007fc2561faf20 R14: 0000000000000001 R15:
00007fc2561faf08
[ 680.027593] </TASK>
[ 680.028661] ---[ end trace 0000000000000000 ]---
Thanks & Regards,
Harshvardhan
>
>> <<<test_start>>>
>> tag=acct02 stime=1753444172
>> cmdline="acct02"
>> contacts=""
>> analysis=exit
>> <<<test_output>>>
>> tst_kconfig.c:88: TINFO: Parsing kernel config
>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>> tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-w1ozKKlJ6n/LTP_acc4RRfLh as
>> tmpdir (nfs filesystem)
>> tst_test.c:2004: TINFO: LTP version: 20250530-105-gda73e1527
>> tst_test.c:2007: TINFO: Tested kernel:
>> 6.15.8-1.bug38227970.el9.rc2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 25
>> 02:03:04 PDT 2025 x86_64
>> tst_kconfig.c:88: TINFO: Parsing kernel config
>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
>> tst_kconfig.c:88: TINFO: Parsing kernel config
>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>> acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>> acct02.c:238: TINFO: Verifying using 'struct acct_v3'
>> acct02.c:191: TINFO: == entry 1 ==
>> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('acct02')
>> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
>> acct02.c:139: TINFO: ac_ppid != 88929 (88928)
>> acct02.c:181: TFAIL: end of file reached
>>
>> HINT: You _MAY_ be missing kernel fixes:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
>>
>> Summary:
>> passed 0
>> failed 1
>> broken 0
>> skipped 0
>> warnings 0
>> incrementing stop
>> <<<execution_status>>>
>> initiation_status="ok"
>> duration=1 termination_type=exited termination_id=1 corefile=no
>> cutime=0 cstime=20
>>
>> <<<test_end>>>
>>
>>
>> Thanks & Regards,
>>
>> Harshvardhan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-07-28 8:07 ` Harshvardhan Jha
@ 2025-07-28 9:34 ` NeilBrown
2025-08-04 7:45 ` Harshvardhan Jha
0 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2025-07-28 9:34 UTC (permalink / raw)
To: Harshvardhan Jha
Cc: Mark Brown, trondmy, linux-nfs, Aishwarya.TCV, ltp, Chuck Lever,
Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Anna Schumaker
On Mon, 28 Jul 2025, Harshvardhan Jha wrote:
> On 27/07/25 10:20 AM, NeilBrown wrote:
> > On Fri, 25 Jul 2025, Harshvardhan Jha wrote:
> >> On 23/07/25 1:37 PM, NeilBrown wrote:
> >>> On Wed, 23 Jul 2025, Harshvardhan Jha wrote:
> >>>> On 08/04/25 4:01 PM, Mark Brown wrote:
> >>>>> On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
> >>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >>>>>>
> >>>>>> Once a task calls exit_signals() it can no longer be signalled. So do
> >>>>>> not allow it to do killable waits.
> >>>>> We're seeing the LTP acct02 test failing in kernels with this patch
> >>>>> applied, testing on systems with NFS root filesystems:
> >>>>>
> >>>>> 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
> >>>>> 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
> >>>>> 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> >>>>> 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
> >>>>> 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
> >>>>> 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
> >>>>> 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
> >>>>> 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
> >>>>> 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
> >>>>> 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
> >>>>> 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
> >>>>> 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
> >>>>> 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
> >>> It seems that the acct02 process got logged..
> >>> Maybe the vfork attempt (trying to run acct02_helper) got half way an
> >>> aborted.
> >>> It got far enough that accounting got interested.
> >>> It didn't get far enough to update the ppid.
> >>> I'd be surprised if that were even possible....
> >>>
> >>> If you would like to help debug this, changing the
> >>>
> >>> + if (unlikely(current->flags & PF_EXITING))
> >>>
> >>> to
> >>>
> >>> + if (unlikely(WARN_ON(current->flags & PF_EXITING)))
> >>>
> >>> would provide stack traces so we can wee where -EINTR is actually being
> >>> returned. That should provide some hints.
> >>>
> >>> NeilBrown
> >> Hi Neil,
> >>
> >> Upon this addition I got this in the logs
> > Thanks for testing. Was there anything new in the kernel logs? I was
> > expecting a WARNING message followed by a "Call Trace".
> >
> > If there wasn't, then this patch cannot have caused the problem.
> > If there was, then I need to see it.
> >
> > Thanks,
> > NeilBrown
>
> This is what the dmesg contains:
>
> [ 678.814887] LTP: starting acct02
> [ 679.831232] ------------[ cut here ]------------
> [ 679.833500] WARNING: CPU: 6 PID: 88930 at net/sunrpc/sched.c:279
> rpc_wait_bit_killable+0x76/0x90 [sunrpc]
> [ 679.837308] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs
> netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd
> grace loop nft_redir ipt_REJECT xt_comment xt_owner nft_compat
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib rfkill nft_reject_inet
> nf_reject_
> ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 ip_set cuse vfat fat intel_rapl_msr
> intel_rapl_common kvm_amd ccp kvm drm_shmem_helper irqbypass i2c_piix4
> drm_kms_helper pcspkr pvpanic_mmio i2c_smbus pvpanic drm fuse xfs
> crc32c_generic
> nvme_tcp nvme_fabrics nvme_core nvme_keyring nvme_auth sd_mod
> virtio_net sg net_failover virtio_scsi failover ata_generic pata_acpi
> ata_piix ghash_clmulni_intel libata sha512_ssse3 virtio_pci sha256_ssse3
> virtio_pci_legacy_dev sha1_ssse3 virtio_pci_modern_dev serio_raw
> dm_multipath btrfs
> blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror
> dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls
> cxgb3i cxgb3 mdio libcxgbi libcxgb
> [ 679.837524] qla4xxx iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi iscsi_ibft iscsi_boot_sysfs qemu_fw_cfg aesni_intel
> crypto_simd cryptd [last unloaded: kheaders]
> [ 679.873316] CPU: 6 UID: 0 PID: 88930 Comm: acct02_helper Kdump:
> loaded Not tainted 6.15.8-1.el9.rc2.x86_64 #1 PREEMPT(voluntary)
> [ 679.877769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.6.4 02/27/2023
> [ 679.880782] RIP: 0010:rpc_wait_bit_killable+0x76/0x90 [sunrpc]
> [ 679.883189] Code: 01 b8 00 fe ff ff 75 d5 48 8b 85 48 0d 00 00 5b 5d
> 48 c1 e8 08 83 e0 01 f7 d8 19 c0 25 00 fe ff ff 31 d2 31 f6 e9 8a e6 c4
> d4 <0f> 0b b8 fc ff ff ff 5b 5d 31 d2 31 f6 e9 78 e6 c4 d4 0f 1f 84 00
> [ 679.889976] RSP: 0018:ffffaf47811a7770 EFLAGS: 00010202
> [ 679.892196] RAX: ffff97be48e00330 RBX: ffffaf47811a77c0 RCX:
> 0000000000000000
> [ 679.894978] RDX: 0000000000000001 RSI: 0000000000002102 RDI:
> ffffaf47811a77c0
> [ 679.897786] RBP: ffff97be61588000 R08: 0000000000000000 R09:
> 0000000000000000
> [ 679.900600] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000002102
> [ 679.903432] R13: ffffffff96408ea0 R14: ffffaf47811a77d8 R15:
> ffffffffc07568e0
> [ 679.906233] FS: 00007fc2563f8600(0000) GS:ffff97c5c890f000(0000)
> knlGS:0000000000000000
> [ 679.909289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 679.911736] CR2: 00007fc2561fba70 CR3: 00000003bce3a000 CR4:
> 00000000003506f0
> [ 679.914555] Call Trace:
> [ 679.915918] <TASK>
> [ 679.917215] __wait_on_bit+0x31/0xa0
> [ 679.918932] out_of_line_wait_on_bit+0x93/0xc0
> [ 679.920914] ? __pfx_wake_bit_function+0x10/0x10
> [ 679.922944] __rpc_execute+0x109/0x310 [sunrpc]
> [ 679.925024] rpc_execute+0x137/0x160 [sunrpc]
> [ 679.927020] rpc_run_task+0x107/0x170 [sunrpc]
> [ 679.929032] nfs4_call_sync_sequence+0x74/0xc0 [nfsv4]
> [ 679.931319] _nfs4_proc_statfs+0xc7/0x100 [nfsv4]
> [ 679.933520] ? srso_return_thunk+0x5/0x5f
> [ 679.935391] nfs4_proc_statfs+0x6b/0xb0 [nfsv4]
> [ 679.937367] nfs_statfs+0x7e/0x1e0 [nfs]
> [ 679.939138] statfs_by_dentry+0x67/0xa0
> [ 679.940887] vfs_statfs+0x1c/0x40
> [ 679.942596] check_free_space+0x71/0x110
Thanks. I'm not sure why this causes a problem as if vfs_statfs() fail,
check_free_space() assumes there is still free space.
However it does strongly suggest that we still need to NFS to work in
processes where signals have been shutdown.
Could you change rpc_wait_bit_killable() to be the following and retest?
I intention is that when the process is exiting, we wait up to 5 seconds
for each request and then fail. It's a bit ugly, but it is a rather
strange situation. It blocking forever that we really want to avoid
here, not blocking at all.
Thanks,
NeilBrown
static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
{
if (unlikely(current->flags & PF_EXITING)) {
if (schedule_timeout(5*HZ) > 0)
/* timed out */
return 0;
return -EINTR;
}
schedule();
if (signal_pending_state(mode, current))
return -ERESTARTSYS;
return 0;
}
> [ 679.944433] acct_write_process+0x45/0x180
> [ 679.946313] acct_process+0xff/0x180
> [ 679.948003] do_exit+0x216/0x480
> [ 679.949799] ? srso_return_thunk+0x5/0x5f
> [ 679.951621] do_group_exit+0x30/0x80
> [ 679.953329] __x64_sys_exit_group+0x18/0x20
> [ 679.955217] x64_sys_call+0xfdb/0x14f0
> [ 679.956971] do_syscall_64+0x82/0x7a0
> [ 679.958717] ? srso_return_thunk+0x5/0x5f
> [ 679.960550] ? ___pte_offset_map+0x1b/0x1a0
> [ 679.962434] ? srso_return_thunk+0x5/0x5f
> [ 679.964261] ? __alloc_frozen_pages_noprof+0x18d/0x340
> [ 679.966389] ? srso_return_thunk+0x5/0x5f
> [ 679.968183] ? srso_return_thunk+0x5/0x5f
> [ 679.969945] ? __mod_memcg_lruvec_state+0xb6/0x1b0
> [ 679.971977] ? srso_return_thunk+0x5/0x5f
> [ 679.973690] ? __lruvec_stat_mod_folio+0x83/0xd0
> [ 679.975671] ? srso_return_thunk+0x5/0x5f
> [ 679.977392] ? srso_return_thunk+0x5/0x5f
> [ 679.979079] ? set_ptes.isra.0+0x36/0x90
> [ 679.980771] ? srso_return_thunk+0x5/0x5f
> [ 679.982375] ? srso_return_thunk+0x5/0x5f
> [ 679.984052] ? wp_page_copy+0x333/0x730
> [ 679.985648] ? srso_return_thunk+0x5/0x5f
> [ 679.987220] ? __handle_mm_fault+0x397/0x6f0
> [ 679.988818] ? srso_return_thunk+0x5/0x5f
> [ 679.990411] ? __count_memcg_events+0xbb/0x150
> [ 679.992111] ? srso_return_thunk+0x5/0x5f
> [ 679.993689] ? count_memcg_events.constprop.0+0x26/0x50
> [ 679.995590] ? srso_return_thunk+0x5/0x5f
> [ 679.997177] ? handle_mm_fault+0x245/0x350
> [ 679.998807] ? srso_return_thunk+0x5/0x5f
> [ 680.000339] ? do_user_addr_fault+0x221/0x690
> [ 680.002042] ? srso_return_thunk+0x5/0x5f
> [ 680.003553] ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
> [ 680.005643] ? srso_return_thunk+0x5/0x5f
> [ 680.007202] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 680.009025] RIP: 0033:0x7fc2560d985d
> [ 680.010510] Code: Unable to access opcode bytes at 0x7fc2560d9833.
> [ 680.012660] RSP: 002b:00007ffde591df68 EFLAGS: 00000246 ORIG_RAX:
> 00000000000000e7
> [ 680.015355] RAX: ffffffffffffffda RBX: 00007fc2561f59e0 RCX:
> 00007fc2560d985d
> [ 680.017749] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI:
> 0000000000000080
> [ 680.020292] RBP: 0000000000000080 R08: 0000000000000000 R09:
> 0000000000000020
> [ 680.022729] R10: 00007ffde591de10 R11: 0000000000000246 R12:
> 00007fc2561f59e0
> [ 680.025174] R13: 00007fc2561faf20 R14: 0000000000000001 R15:
> 00007fc2561faf08
> [ 680.027593] </TASK>
> [ 680.028661] ---[ end trace 0000000000000000 ]---
>
>
> Thanks & Regards,
> Harshvardhan
>
> >
> >> <<<test_start>>>
> >> tag=acct02 stime=1753444172
> >> cmdline="acct02"
> >> contacts=""
> >> analysis=exit
> >> <<<test_output>>>
> >> tst_kconfig.c:88: TINFO: Parsing kernel config
> >> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
> >> tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-w1ozKKlJ6n/LTP_acc4RRfLh as
> >> tmpdir (nfs filesystem)
> >> tst_test.c:2004: TINFO: LTP version: 20250530-105-gda73e1527
> >> tst_test.c:2007: TINFO: Tested kernel:
> >> 6.15.8-1.bug38227970.el9.rc2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 25
> >> 02:03:04 PDT 2025 x86_64
> >> tst_kconfig.c:88: TINFO: Parsing kernel config
> >> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
> >> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
> >> tst_kconfig.c:88: TINFO: Parsing kernel config
> >> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
> >> acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
> >> acct02.c:238: TINFO: Verifying using 'struct acct_v3'
> >> acct02.c:191: TINFO: == entry 1 ==
> >> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('acct02')
> >> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
> >> acct02.c:139: TINFO: ac_ppid != 88929 (88928)
> >> acct02.c:181: TFAIL: end of file reached
> >>
> >> HINT: You _MAY_ be missing kernel fixes:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
> >>
> >> Summary:
> >> passed 0
> >> failed 1
> >> broken 0
> >> skipped 0
> >> warnings 0
> >> incrementing stop
> >> <<<execution_status>>>
> >> initiation_status="ok"
> >> duration=1 termination_type=exited termination_id=1 corefile=no
> >> cutime=0 cstime=20
> >>
> >> <<<test_end>>>
> >>
> >>
> >> Thanks & Regards,
> >>
> >> Harshvardhan
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-07-28 9:34 ` NeilBrown
@ 2025-08-04 7:45 ` Harshvardhan Jha
2025-08-06 5:47 ` Harshvardhan Jha
0 siblings, 1 reply; 17+ messages in thread
From: Harshvardhan Jha @ 2025-08-04 7:45 UTC (permalink / raw)
To: NeilBrown
Cc: Mark Brown, trondmy, linux-nfs, Aishwarya.TCV, ltp, Chuck Lever,
Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Anna Schumaker
On 28/07/25 3:04 PM, NeilBrown wrote:
> On Mon, 28 Jul 2025, Harshvardhan Jha wrote:
>> On 27/07/25 10:20 AM, NeilBrown wrote:
>>> On Fri, 25 Jul 2025, Harshvardhan Jha wrote:
>>>> On 23/07/25 1:37 PM, NeilBrown wrote:
>>>>> On Wed, 23 Jul 2025, Harshvardhan Jha wrote:
>>>>>> On 08/04/25 4:01 PM, Mark Brown wrote:
>>>>>>> On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
>>>>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>>>>>
>>>>>>>> Once a task calls exit_signals() it can no longer be signalled. So do
>>>>>>>> not allow it to do killable waits.
>>>>>>> We're seeing the LTP acct02 test failing in kernels with this patch
>>>>>>> applied, testing on systems with NFS root filesystems:
>>>>>>>
>>>>>>> 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
>>>>>>> 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
>>>>>>> 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>>>>>> 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
>>>>>>> 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>>>>>> 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>>>>>>> 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
>>>>>>> 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
>>>>>>> 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
>>>>>>> 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
>>>>>>> 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
>>>>>>> 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
>>>>>>> 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
>>>>> It seems that the acct02 process got logged..
>>>>> Maybe the vfork attempt (trying to run acct02_helper) got half way an
>>>>> aborted.
>>>>> It got far enough that accounting got interested.
>>>>> It didn't get far enough to update the ppid.
>>>>> I'd be surprised if that were even possible....
>>>>>
>>>>> If you would like to help debug this, changing the
>>>>>
>>>>> + if (unlikely(current->flags & PF_EXITING))
>>>>>
>>>>> to
>>>>>
>>>>> + if (unlikely(WARN_ON(current->flags & PF_EXITING)))
>>>>>
>>>>> would provide stack traces so we can wee where -EINTR is actually being
>>>>> returned. That should provide some hints.
>>>>>
>>>>> NeilBrown
>>>> Hi Neil,
>>>>
>>>> Upon this addition I got this in the logs
>>> Thanks for testing. Was there anything new in the kernel logs? I was
>>> expecting a WARNING message followed by a "Call Trace".
>>>
>>> If there wasn't, then this patch cannot have caused the problem.
>>> If there was, then I need to see it.
>>>
>>> Thanks,
>>> NeilBrown
>> This is what the dmesg contains:
>>
>> [ 678.814887] LTP: starting acct02
>> [ 679.831232] ------------[ cut here ]------------
>> [ 679.833500] WARNING: CPU: 6 PID: 88930 at net/sunrpc/sched.c:279
>> rpc_wait_bit_killable+0x76/0x90 [sunrpc]
>> [ 679.837308] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs
>> netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd
>> grace loop nft_redir ipt_REJECT xt_comment xt_owner nft_compat
>> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib rfkill nft_reject_inet
>> nf_reject_
>> ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
>> nf_defrag_ipv6 nf_defrag_ipv4 ip_set cuse vfat fat intel_rapl_msr
>> intel_rapl_common kvm_amd ccp kvm drm_shmem_helper irqbypass i2c_piix4
>> drm_kms_helper pcspkr pvpanic_mmio i2c_smbus pvpanic drm fuse xfs
>> crc32c_generic
>> nvme_tcp nvme_fabrics nvme_core nvme_keyring nvme_auth sd_mod
>> virtio_net sg net_failover virtio_scsi failover ata_generic pata_acpi
>> ata_piix ghash_clmulni_intel libata sha512_ssse3 virtio_pci sha256_ssse3
>> virtio_pci_legacy_dev sha1_ssse3 virtio_pci_modern_dev serio_raw
>> dm_multipath btrfs
>> blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror
>> dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls
>> cxgb3i cxgb3 mdio libcxgbi libcxgb
>> [ 679.837524] qla4xxx iscsi_tcp libiscsi_tcp libiscsi
>> scsi_transport_iscsi iscsi_ibft iscsi_boot_sysfs qemu_fw_cfg aesni_intel
>> crypto_simd cryptd [last unloaded: kheaders]
>> [ 679.873316] CPU: 6 UID: 0 PID: 88930 Comm: acct02_helper Kdump:
>> loaded Not tainted 6.15.8-1.el9.rc2.x86_64 #1 PREEMPT(voluntary)
>> [ 679.877769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.6.4 02/27/2023
>> [ 679.880782] RIP: 0010:rpc_wait_bit_killable+0x76/0x90 [sunrpc]
>> [ 679.883189] Code: 01 b8 00 fe ff ff 75 d5 48 8b 85 48 0d 00 00 5b 5d
>> 48 c1 e8 08 83 e0 01 f7 d8 19 c0 25 00 fe ff ff 31 d2 31 f6 e9 8a e6 c4
>> d4 <0f> 0b b8 fc ff ff ff 5b 5d 31 d2 31 f6 e9 78 e6 c4 d4 0f 1f 84 00
>> [ 679.889976] RSP: 0018:ffffaf47811a7770 EFLAGS: 00010202
>> [ 679.892196] RAX: ffff97be48e00330 RBX: ffffaf47811a77c0 RCX:
>> 0000000000000000
>> [ 679.894978] RDX: 0000000000000001 RSI: 0000000000002102 RDI:
>> ffffaf47811a77c0
>> [ 679.897786] RBP: ffff97be61588000 R08: 0000000000000000 R09:
>> 0000000000000000
>> [ 679.900600] R10: 0000000000000000 R11: 0000000000000000 R12:
>> 0000000000002102
>> [ 679.903432] R13: ffffffff96408ea0 R14: ffffaf47811a77d8 R15:
>> ffffffffc07568e0
>> [ 679.906233] FS: 00007fc2563f8600(0000) GS:ffff97c5c890f000(0000)
>> knlGS:0000000000000000
>> [ 679.909289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 679.911736] CR2: 00007fc2561fba70 CR3: 00000003bce3a000 CR4:
>> 00000000003506f0
>> [ 679.914555] Call Trace:
>> [ 679.915918] <TASK>
>> [ 679.917215] __wait_on_bit+0x31/0xa0
>> [ 679.918932] out_of_line_wait_on_bit+0x93/0xc0
>> [ 679.920914] ? __pfx_wake_bit_function+0x10/0x10
>> [ 679.922944] __rpc_execute+0x109/0x310 [sunrpc]
>> [ 679.925024] rpc_execute+0x137/0x160 [sunrpc]
>> [ 679.927020] rpc_run_task+0x107/0x170 [sunrpc]
>> [ 679.929032] nfs4_call_sync_sequence+0x74/0xc0 [nfsv4]
>> [ 679.931319] _nfs4_proc_statfs+0xc7/0x100 [nfsv4]
>> [ 679.933520] ? srso_return_thunk+0x5/0x5f
>> [ 679.935391] nfs4_proc_statfs+0x6b/0xb0 [nfsv4]
>> [ 679.937367] nfs_statfs+0x7e/0x1e0 [nfs]
>> [ 679.939138] statfs_by_dentry+0x67/0xa0
>> [ 679.940887] vfs_statfs+0x1c/0x40
>> [ 679.942596] check_free_space+0x71/0x110
> Thanks. I'm not sure why this causes a problem as if vfs_statfs() fail,
> check_free_space() assumes there is still free space.
> However it does strongly suggest that we still need to NFS to work in
> processes where signals have been shutdown.
>
> Could you change rpc_wait_bit_killable() to be the following and retest?
> I intention is that when the process is exiting, we wait up to 5 seconds
> for each request and then fail. It's a bit ugly, but it is a rather
> strange situation. It blocking forever that we really want to avoid
> here, not blocking at all.
>
> Thanks,
> NeilBrown
>
>
> static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
> {
> if (unlikely(current->flags & PF_EXITING)) {
> if (schedule_timeout(5*HZ) > 0)
> /* timed out */
> return 0;
> return -EINTR;
> }
> schedule();
> if (signal_pending_state(mode, current))
> return -ERESTARTSYS;
> return 0;
> }
Adding this change makes the test pass:
<<<test_start>>>
tag=acct02 stime=1754066481
cmdline="acct02"
contacts=""
analysis=exit
<<<test_output>>>
tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-lNzAk1qhuX/LTP_accZ75zl1 as tmpdir (nfs filesystem)
tst_test.c:2004: TINFO: LTP version: 20250530-128-g6505f9e29
tst_test.c:2007: TINFO: Tested kernel: 6.15.8-master.sunrpc.el9.rc3.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jul 29 05:06:28 PDT 2025 x86_64
tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
acct02.c:238: TINFO: Verifying using 'struct acct_v3'
acct02.c:191: TINFO: == entry 1 ==
acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('iscsiadm')
acct02.c:131: TINFO: ac_exitcode != 32768 (5376)
acct02.c:139: TINFO: ac_ppid != 52326 (2475)
acct02.c:191: TINFO: == entry 2 ==
acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('systemd')
acct02.c:125: TINFO: elap_time/clock_ticks >= 2 (1065/100: 10.00)
acct02.c:131: TINFO: ac_exitcode != 32768 (0)
acct02.c:139: TINFO: ac_ppid != 52326 (1)
acct02.c:191: TINFO: == entry 3 ==
acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('(sd-pam)')
acct02.c:125: TINFO: elap_time/clock_ticks >= 2 (1062/100: 10.00)
acct02.c:131: TINFO: ac_exitcode != 32768 (9)
acct02.c:139: TINFO: ac_ppid != 52326 (1)
acct02.c:191: TINFO: == entry 4 ==
acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('systemd-user-ru')
acct02.c:131: TINFO: ac_exitcode != 32768 (0)
acct02.c:139: TINFO: ac_ppid != 52326 (1)
acct02.c:191: TINFO: == entry 5 ==
acct02.c:202: TINFO: Number of accounting file entries tested: 5
acct02.c:208: TPASS: acct() wrote correct file contents!
Summary:
passed 1
failed 0
broken 0
skipped 0
warnings 0
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=1 termination_type=exited termination_id=0 corefile=no
cutime=0 cstime=0
<<<test_end>>>
Thanks & Regards,
Harshvardhan
>
>> [ 679.944433] acct_write_process+0x45/0x180
>> [ 679.946313] acct_process+0xff/0x180
>> [ 679.948003] do_exit+0x216/0x480
>> [ 679.949799] ? srso_return_thunk+0x5/0x5f
>> [ 679.951621] do_group_exit+0x30/0x80
>> [ 679.953329] __x64_sys_exit_group+0x18/0x20
>> [ 679.955217] x64_sys_call+0xfdb/0x14f0
>> [ 679.956971] do_syscall_64+0x82/0x7a0
>> [ 679.958717] ? srso_return_thunk+0x5/0x5f
>> [ 679.960550] ? ___pte_offset_map+0x1b/0x1a0
>> [ 679.962434] ? srso_return_thunk+0x5/0x5f
>> [ 679.964261] ? __alloc_frozen_pages_noprof+0x18d/0x340
>> [ 679.966389] ? srso_return_thunk+0x5/0x5f
>> [ 679.968183] ? srso_return_thunk+0x5/0x5f
>> [ 679.969945] ? __mod_memcg_lruvec_state+0xb6/0x1b0
>> [ 679.971977] ? srso_return_thunk+0x5/0x5f
>> [ 679.973690] ? __lruvec_stat_mod_folio+0x83/0xd0
>> [ 679.975671] ? srso_return_thunk+0x5/0x5f
>> [ 679.977392] ? srso_return_thunk+0x5/0x5f
>> [ 679.979079] ? set_ptes.isra.0+0x36/0x90
>> [ 679.980771] ? srso_return_thunk+0x5/0x5f
>> [ 679.982375] ? srso_return_thunk+0x5/0x5f
>> [ 679.984052] ? wp_page_copy+0x333/0x730
>> [ 679.985648] ? srso_return_thunk+0x5/0x5f
>> [ 679.987220] ? __handle_mm_fault+0x397/0x6f0
>> [ 679.988818] ? srso_return_thunk+0x5/0x5f
>> [ 679.990411] ? __count_memcg_events+0xbb/0x150
>> [ 679.992111] ? srso_return_thunk+0x5/0x5f
>> [ 679.993689] ? count_memcg_events.constprop.0+0x26/0x50
>> [ 679.995590] ? srso_return_thunk+0x5/0x5f
>> [ 679.997177] ? handle_mm_fault+0x245/0x350
>> [ 679.998807] ? srso_return_thunk+0x5/0x5f
>> [ 680.000339] ? do_user_addr_fault+0x221/0x690
>> [ 680.002042] ? srso_return_thunk+0x5/0x5f
>> [ 680.003553] ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
>> [ 680.005643] ? srso_return_thunk+0x5/0x5f
>> [ 680.007202] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [ 680.009025] RIP: 0033:0x7fc2560d985d
>> [ 680.010510] Code: Unable to access opcode bytes at 0x7fc2560d9833.
>> [ 680.012660] RSP: 002b:00007ffde591df68 EFLAGS: 00000246 ORIG_RAX:
>> 00000000000000e7
>> [ 680.015355] RAX: ffffffffffffffda RBX: 00007fc2561f59e0 RCX:
>> 00007fc2560d985d
>> [ 680.017749] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI:
>> 0000000000000080
>> [ 680.020292] RBP: 0000000000000080 R08: 0000000000000000 R09:
>> 0000000000000020
>> [ 680.022729] R10: 00007ffde591de10 R11: 0000000000000246 R12:
>> 00007fc2561f59e0
>> [ 680.025174] R13: 00007fc2561faf20 R14: 0000000000000001 R15:
>> 00007fc2561faf08
>> [ 680.027593] </TASK>
>> [ 680.028661] ---[ end trace 0000000000000000 ]---
>>
>>
>> Thanks & Regards,
>> Harshvardhan
>>
>>>> <<<test_start>>>
>>>> tag=acct02 stime=1753444172
>>>> cmdline="acct02"
>>>> contacts=""
>>>> analysis=exit
>>>> <<<test_output>>>
>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>> tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-w1ozKKlJ6n/LTP_acc4RRfLh as
>>>> tmpdir (nfs filesystem)
>>>> tst_test.c:2004: TINFO: LTP version: 20250530-105-gda73e1527
>>>> tst_test.c:2007: TINFO: Tested kernel:
>>>> 6.15.8-1.bug38227970.el9.rc2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 25
>>>> 02:03:04 PDT 2025 x86_64
>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>> acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>>>> acct02.c:238: TINFO: Verifying using 'struct acct_v3'
>>>> acct02.c:191: TINFO: == entry 1 ==
>>>> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('acct02')
>>>> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
>>>> acct02.c:139: TINFO: ac_ppid != 88929 (88928)
>>>> acct02.c:181: TFAIL: end of file reached
>>>>
>>>> HINT: You _MAY_ be missing kernel fixes:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
>>>>
>>>> Summary:
>>>> passed 0
>>>> failed 1
>>>> broken 0
>>>> skipped 0
>>>> warnings 0
>>>> incrementing stop
>>>> <<<execution_status>>>
>>>> initiation_status="ok"
>>>> duration=1 termination_type=exited termination_id=1 corefile=no
>>>> cutime=0 cstime=20
>>>>
>>>> <<<test_end>>>
>>>>
>>>>
>>>> Thanks & Regards,
>>>>
>>>> Harshvardhan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-08-04 7:45 ` Harshvardhan Jha
@ 2025-08-06 5:47 ` Harshvardhan Jha
2025-08-19 10:06 ` Harshvardhan Jha
0 siblings, 1 reply; 17+ messages in thread
From: Harshvardhan Jha @ 2025-08-06 5:47 UTC (permalink / raw)
To: NeilBrown
Cc: Mark Brown, trondmy, linux-nfs, Aishwarya.TCV, ltp, Chuck Lever,
Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Anna Schumaker
On 04/08/25 1:15 PM, Harshvardhan Jha wrote:
> On 28/07/25 3:04 PM, NeilBrown wrote:
>> On Mon, 28 Jul 2025, Harshvardhan Jha wrote:
>>> On 27/07/25 10:20 AM, NeilBrown wrote:
>>>> On Fri, 25 Jul 2025, Harshvardhan Jha wrote:
>>>>> On 23/07/25 1:37 PM, NeilBrown wrote:
>>>>>> On Wed, 23 Jul 2025, Harshvardhan Jha wrote:
>>>>>>> On 08/04/25 4:01 PM, Mark Brown wrote:
>>>>>>>> On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
>>>>>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>>>>>>
>>>>>>>>> Once a task calls exit_signals() it can no longer be signalled. So do
>>>>>>>>> not allow it to do killable waits.
>>>>>>>> We're seeing the LTP acct02 test failing in kernels with this patch
>>>>>>>> applied, testing on systems with NFS root filesystems:
>>>>>>>>
>>>>>>>> 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
>>>>>>>> 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
>>>>>>>> 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>>>>>>> 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
>>>>>>>> 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>>>>>>> 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>>>>>>>> 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
>>>>>>>> 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
>>>>>>>> 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
>>>>>>>> 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
>>>>>>>> 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
>>>>>>>> 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
>>>>>>>> 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
>>>>>> It seems that the acct02 process got logged..
>>>>>> Maybe the vfork attempt (trying to run acct02_helper) got half way an
>>>>>> aborted.
>>>>>> It got far enough that accounting got interested.
>>>>>> It didn't get far enough to update the ppid.
>>>>>> I'd be surprised if that were even possible....
>>>>>>
>>>>>> If you would like to help debug this, changing the
>>>>>>
>>>>>> + if (unlikely(current->flags & PF_EXITING))
>>>>>>
>>>>>> to
>>>>>>
>>>>>> + if (unlikely(WARN_ON(current->flags & PF_EXITING)))
>>>>>>
>>>>>> would provide stack traces so we can wee where -EINTR is actually being
>>>>>> returned. That should provide some hints.
>>>>>>
>>>>>> NeilBrown
>>>>> Hi Neil,
>>>>>
>>>>> Upon this addition I got this in the logs
>>>> Thanks for testing. Was there anything new in the kernel logs? I was
>>>> expecting a WARNING message followed by a "Call Trace".
>>>>
>>>> If there wasn't, then this patch cannot have caused the problem.
>>>> If there was, then I need to see it.
>>>>
>>>> Thanks,
>>>> NeilBrown
>>> This is what the dmesg contains:
>>>
>>> [ 678.814887] LTP: starting acct02
>>> [ 679.831232] ------------[ cut here ]------------
>>> [ 679.833500] WARNING: CPU: 6 PID: 88930 at net/sunrpc/sched.c:279
>>> rpc_wait_bit_killable+0x76/0x90 [sunrpc]
>>> [ 679.837308] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs
>>> netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd
>>> grace loop nft_redir ipt_REJECT xt_comment xt_owner nft_compat
>>> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib rfkill nft_reject_inet
>>> nf_reject_
>>> ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
>>> nf_defrag_ipv6 nf_defrag_ipv4 ip_set cuse vfat fat intel_rapl_msr
>>> intel_rapl_common kvm_amd ccp kvm drm_shmem_helper irqbypass i2c_piix4
>>> drm_kms_helper pcspkr pvpanic_mmio i2c_smbus pvpanic drm fuse xfs
>>> crc32c_generic
>>> nvme_tcp nvme_fabrics nvme_core nvme_keyring nvme_auth sd_mod
>>> virtio_net sg net_failover virtio_scsi failover ata_generic pata_acpi
>>> ata_piix ghash_clmulni_intel libata sha512_ssse3 virtio_pci sha256_ssse3
>>> virtio_pci_legacy_dev sha1_ssse3 virtio_pci_modern_dev serio_raw
>>> dm_multipath btrfs
>>> blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror
>>> dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls
>>> cxgb3i cxgb3 mdio libcxgbi libcxgb
>>> [ 679.837524] qla4xxx iscsi_tcp libiscsi_tcp libiscsi
>>> scsi_transport_iscsi iscsi_ibft iscsi_boot_sysfs qemu_fw_cfg aesni_intel
>>> crypto_simd cryptd [last unloaded: kheaders]
>>> [ 679.873316] CPU: 6 UID: 0 PID: 88930 Comm: acct02_helper Kdump:
>>> loaded Not tainted 6.15.8-1.el9.rc2.x86_64 #1 PREEMPT(voluntary)
>>> [ 679.877769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>> BIOS 1.6.4 02/27/2023
>>> [ 679.880782] RIP: 0010:rpc_wait_bit_killable+0x76/0x90 [sunrpc]
>>> [ 679.883189] Code: 01 b8 00 fe ff ff 75 d5 48 8b 85 48 0d 00 00 5b 5d
>>> 48 c1 e8 08 83 e0 01 f7 d8 19 c0 25 00 fe ff ff 31 d2 31 f6 e9 8a e6 c4
>>> d4 <0f> 0b b8 fc ff ff ff 5b 5d 31 d2 31 f6 e9 78 e6 c4 d4 0f 1f 84 00
>>> [ 679.889976] RSP: 0018:ffffaf47811a7770 EFLAGS: 00010202
>>> [ 679.892196] RAX: ffff97be48e00330 RBX: ffffaf47811a77c0 RCX:
>>> 0000000000000000
>>> [ 679.894978] RDX: 0000000000000001 RSI: 0000000000002102 RDI:
>>> ffffaf47811a77c0
>>> [ 679.897786] RBP: ffff97be61588000 R08: 0000000000000000 R09:
>>> 0000000000000000
>>> [ 679.900600] R10: 0000000000000000 R11: 0000000000000000 R12:
>>> 0000000000002102
>>> [ 679.903432] R13: ffffffff96408ea0 R14: ffffaf47811a77d8 R15:
>>> ffffffffc07568e0
>>> [ 679.906233] FS: 00007fc2563f8600(0000) GS:ffff97c5c890f000(0000)
>>> knlGS:0000000000000000
>>> [ 679.909289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 679.911736] CR2: 00007fc2561fba70 CR3: 00000003bce3a000 CR4:
>>> 00000000003506f0
>>> [ 679.914555] Call Trace:
>>> [ 679.915918] <TASK>
>>> [ 679.917215] __wait_on_bit+0x31/0xa0
>>> [ 679.918932] out_of_line_wait_on_bit+0x93/0xc0
>>> [ 679.920914] ? __pfx_wake_bit_function+0x10/0x10
>>> [ 679.922944] __rpc_execute+0x109/0x310 [sunrpc]
>>> [ 679.925024] rpc_execute+0x137/0x160 [sunrpc]
>>> [ 679.927020] rpc_run_task+0x107/0x170 [sunrpc]
>>> [ 679.929032] nfs4_call_sync_sequence+0x74/0xc0 [nfsv4]
>>> [ 679.931319] _nfs4_proc_statfs+0xc7/0x100 [nfsv4]
>>> [ 679.933520] ? srso_return_thunk+0x5/0x5f
>>> [ 679.935391] nfs4_proc_statfs+0x6b/0xb0 [nfsv4]
>>> [ 679.937367] nfs_statfs+0x7e/0x1e0 [nfs]
>>> [ 679.939138] statfs_by_dentry+0x67/0xa0
>>> [ 679.940887] vfs_statfs+0x1c/0x40
>>> [ 679.942596] check_free_space+0x71/0x110
>> Thanks. I'm not sure why this causes a problem as if vfs_statfs() fail,
>> check_free_space() assumes there is still free space.
>> However it does strongly suggest that we still need to NFS to work in
>> processes where signals have been shutdown.
>>
>> Could you change rpc_wait_bit_killable() to be the following and retest?
>> I intention is that when the process is exiting, we wait up to 5 seconds
>> for each request and then fail. It's a bit ugly, but it is a rather
>> strange situation. It blocking forever that we really want to avoid
>> here, not blocking at all.
>>
>> Thanks,
>> NeilBrown
>>
>>
>> static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>> {
>> if (unlikely(current->flags & PF_EXITING)) {
>> if (schedule_timeout(5*HZ) > 0)
>> /* timed out */
>> return 0;
>> return -EINTR;
>> }
>> schedule();
>> if (signal_pending_state(mode, current))
>> return -ERESTARTSYS;
>> return 0;
>> }
> Adding this change makes the test pass:
>
> <<<test_start>>>
> tag=acct02 stime=1754066481
> cmdline="acct02"
> contacts=""
> analysis=exit
> <<<test_output>>>
> tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
> tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-lNzAk1qhuX/LTP_accZ75zl1 as tmpdir (nfs filesystem)
> tst_test.c:2004: TINFO: LTP version: 20250530-128-g6505f9e29
> tst_test.c:2007: TINFO: Tested kernel: 6.15.8-master.sunrpc.el9.rc3.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jul 29 05:06:28 PDT 2025 x86_64
> tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
> tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
> acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
> acct02.c:238: TINFO: Verifying using 'struct acct_v3'
> acct02.c:191: TINFO: == entry 1 ==
> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('iscsiadm')
> acct02.c:131: TINFO: ac_exitcode != 32768 (5376)
> acct02.c:139: TINFO: ac_ppid != 52326 (2475)
> acct02.c:191: TINFO: == entry 2 ==
> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('systemd')
> acct02.c:125: TINFO: elap_time/clock_ticks >= 2 (1065/100: 10.00)
> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
> acct02.c:139: TINFO: ac_ppid != 52326 (1)
> acct02.c:191: TINFO: == entry 3 ==
> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('(sd-pam)')
> acct02.c:125: TINFO: elap_time/clock_ticks >= 2 (1062/100: 10.00)
> acct02.c:131: TINFO: ac_exitcode != 32768 (9)
> acct02.c:139: TINFO: ac_ppid != 52326 (1)
> acct02.c:191: TINFO: == entry 4 ==
> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('systemd-user-ru')
> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
> acct02.c:139: TINFO: ac_ppid != 52326 (1)
> acct02.c:191: TINFO: == entry 5 ==
> acct02.c:202: TINFO: Number of accounting file entries tested: 5
> acct02.c:208: TPASS: acct() wrote correct file contents!
>
> Summary:
> passed 1
> failed 0
> broken 0
> skipped 0
> warnings 0
> incrementing stop
> <<<execution_status>>>
> initiation_status="ok"
> duration=1 termination_type=exited termination_id=0 corefile=no
> cutime=0 cstime=0
> <<<test_end>>>
>
> Thanks & Regards,
> Harshvardhan
Hi there,
I tested this around 50 iterations and it passes all 50 times with this
timeout.
Thanks & Regards,
Harshvardhan
>
>
>>> [ 679.944433] acct_write_process+0x45/0x180
>>> [ 679.946313] acct_process+0xff/0x180
>>> [ 679.948003] do_exit+0x216/0x480
>>> [ 679.949799] ? srso_return_thunk+0x5/0x5f
>>> [ 679.951621] do_group_exit+0x30/0x80
>>> [ 679.953329] __x64_sys_exit_group+0x18/0x20
>>> [ 679.955217] x64_sys_call+0xfdb/0x14f0
>>> [ 679.956971] do_syscall_64+0x82/0x7a0
>>> [ 679.958717] ? srso_return_thunk+0x5/0x5f
>>> [ 679.960550] ? ___pte_offset_map+0x1b/0x1a0
>>> [ 679.962434] ? srso_return_thunk+0x5/0x5f
>>> [ 679.964261] ? __alloc_frozen_pages_noprof+0x18d/0x340
>>> [ 679.966389] ? srso_return_thunk+0x5/0x5f
>>> [ 679.968183] ? srso_return_thunk+0x5/0x5f
>>> [ 679.969945] ? __mod_memcg_lruvec_state+0xb6/0x1b0
>>> [ 679.971977] ? srso_return_thunk+0x5/0x5f
>>> [ 679.973690] ? __lruvec_stat_mod_folio+0x83/0xd0
>>> [ 679.975671] ? srso_return_thunk+0x5/0x5f
>>> [ 679.977392] ? srso_return_thunk+0x5/0x5f
>>> [ 679.979079] ? set_ptes.isra.0+0x36/0x90
>>> [ 679.980771] ? srso_return_thunk+0x5/0x5f
>>> [ 679.982375] ? srso_return_thunk+0x5/0x5f
>>> [ 679.984052] ? wp_page_copy+0x333/0x730
>>> [ 679.985648] ? srso_return_thunk+0x5/0x5f
>>> [ 679.987220] ? __handle_mm_fault+0x397/0x6f0
>>> [ 679.988818] ? srso_return_thunk+0x5/0x5f
>>> [ 679.990411] ? __count_memcg_events+0xbb/0x150
>>> [ 679.992111] ? srso_return_thunk+0x5/0x5f
>>> [ 679.993689] ? count_memcg_events.constprop.0+0x26/0x50
>>> [ 679.995590] ? srso_return_thunk+0x5/0x5f
>>> [ 679.997177] ? handle_mm_fault+0x245/0x350
>>> [ 679.998807] ? srso_return_thunk+0x5/0x5f
>>> [ 680.000339] ? do_user_addr_fault+0x221/0x690
>>> [ 680.002042] ? srso_return_thunk+0x5/0x5f
>>> [ 680.003553] ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
>>> [ 680.005643] ? srso_return_thunk+0x5/0x5f
>>> [ 680.007202] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>> [ 680.009025] RIP: 0033:0x7fc2560d985d
>>> [ 680.010510] Code: Unable to access opcode bytes at 0x7fc2560d9833.
>>> [ 680.012660] RSP: 002b:00007ffde591df68 EFLAGS: 00000246 ORIG_RAX:
>>> 00000000000000e7
>>> [ 680.015355] RAX: ffffffffffffffda RBX: 00007fc2561f59e0 RCX:
>>> 00007fc2560d985d
>>> [ 680.017749] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI:
>>> 0000000000000080
>>> [ 680.020292] RBP: 0000000000000080 R08: 0000000000000000 R09:
>>> 0000000000000020
>>> [ 680.022729] R10: 00007ffde591de10 R11: 0000000000000246 R12:
>>> 00007fc2561f59e0
>>> [ 680.025174] R13: 00007fc2561faf20 R14: 0000000000000001 R15:
>>> 00007fc2561faf08
>>> [ 680.027593] </TASK>
>>> [ 680.028661] ---[ end trace 0000000000000000 ]---
>>>
>>>
>>> Thanks & Regards,
>>> Harshvardhan
>>>
>>>>> <<<test_start>>>
>>>>> tag=acct02 stime=1753444172
>>>>> cmdline="acct02"
>>>>> contacts=""
>>>>> analysis=exit
>>>>> <<<test_output>>>
>>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>>> tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-w1ozKKlJ6n/LTP_acc4RRfLh as
>>>>> tmpdir (nfs filesystem)
>>>>> tst_test.c:2004: TINFO: LTP version: 20250530-105-gda73e1527
>>>>> tst_test.c:2007: TINFO: Tested kernel:
>>>>> 6.15.8-1.bug38227970.el9.rc2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 25
>>>>> 02:03:04 PDT 2025 x86_64
>>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>>> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
>>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>>> acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>>>>> acct02.c:238: TINFO: Verifying using 'struct acct_v3'
>>>>> acct02.c:191: TINFO: == entry 1 ==
>>>>> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('acct02')
>>>>> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
>>>>> acct02.c:139: TINFO: ac_ppid != 88929 (88928)
>>>>> acct02.c:181: TFAIL: end of file reached
>>>>>
>>>>> HINT: You _MAY_ be missing kernel fixes:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
>>>>>
>>>>> Summary:
>>>>> passed 0
>>>>> failed 1
>>>>> broken 0
>>>>> skipped 0
>>>>> warnings 0
>>>>> incrementing stop
>>>>> <<<execution_status>>>
>>>>> initiation_status="ok"
>>>>> duration=1 termination_type=exited termination_id=1 corefile=no
>>>>> cutime=0 cstime=20
>>>>>
>>>>> <<<test_end>>>
>>>>>
>>>>>
>>>>> Thanks & Regards,
>>>>>
>>>>> Harshvardhan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks
2025-08-06 5:47 ` Harshvardhan Jha
@ 2025-08-19 10:06 ` Harshvardhan Jha
0 siblings, 0 replies; 17+ messages in thread
From: Harshvardhan Jha @ 2025-08-19 10:06 UTC (permalink / raw)
To: NeilBrown
Cc: Mark Brown, trondmy, linux-nfs, Aishwarya.TCV, ltp, Chuck Lever,
Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Anna Schumaker
On 06/08/25 11:17 AM, Harshvardhan Jha wrote:
> On 04/08/25 1:15 PM, Harshvardhan Jha wrote:
>> On 28/07/25 3:04 PM, NeilBrown wrote:
>>> On Mon, 28 Jul 2025, Harshvardhan Jha wrote:
>>>> On 27/07/25 10:20 AM, NeilBrown wrote:
>>>>> On Fri, 25 Jul 2025, Harshvardhan Jha wrote:
>>>>>> On 23/07/25 1:37 PM, NeilBrown wrote:
>>>>>>> On Wed, 23 Jul 2025, Harshvardhan Jha wrote:
>>>>>>>> On 08/04/25 4:01 PM, Mark Brown wrote:
>>>>>>>>> On Fri, Mar 28, 2025 at 01:40:44PM -0400, trondmy@kernel.org wrote:
>>>>>>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>>>>>>>
>>>>>>>>>> Once a task calls exit_signals() it can no longer be signalled. So do
>>>>>>>>>> not allow it to do killable waits.
>>>>>>>>> We're seeing the LTP acct02 test failing in kernels with this patch
>>>>>>>>> applied, testing on systems with NFS root filesystems:
>>>>>>>>>
>>>>>>>>> 10271 05:03:09.064993 tst_test.c:1900: TINFO: LTP version: 20250130-1-g60fe84aaf
>>>>>>>>> 10272 05:03:09.076425 tst_test.c:1904: TINFO: Tested kernel: 6.15.0-rc1 #1 SMP PREEMPT Sun Apr 6 21:18:14 UTC 2025 aarch64
>>>>>>>>> 10273 05:03:09.076733 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>>>>>>>> 10274 05:03:09.087803 tst_test.c:1722: TINFO: Overall timeout per run is 0h 01m 30s
>>>>>>>>> 10275 05:03:09.088107 tst_kconfig.c:88: TINFO: Parsing kernel config '/proc/config.gz'
>>>>>>>>> 10276 05:03:09.093097 acct02.c:63: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>>>>>>>>> 10277 05:03:09.093400 acct02.c:240: TINFO: Verifying using 'struct acct_v3'
>>>>>>>>> 10278 05:03:10.053504 <6>[ 98.043143] Process accounting resumed
>>>>>>>>> 10279 05:03:10.053935 <6>[ 98.043143] Process accounting resumed
>>>>>>>>> 10280 05:03:10.064653 acct02.c:193: TINFO: == entry 1 ==
>>>>>>>>> 10281 05:03:10.064953 acct02.c:84: TINFO: ac_comm != 'acct02_helper' ('acct02')
>>>>>>>>> 10282 05:03:10.076029 acct02.c:133: TINFO: ac_exitcode != 32768 (0)
>>>>>>>>> 10283 05:03:10.076331 acct02.c:141: TINFO: ac_ppid != 2466 (2461)
>>>>>>> It seems that the acct02 process got logged..
>>>>>>> Maybe the vfork attempt (trying to run acct02_helper) got half way an
>>>>>>> aborted.
>>>>>>> It got far enough that accounting got interested.
>>>>>>> It didn't get far enough to update the ppid.
>>>>>>> I'd be surprised if that were even possible....
>>>>>>>
>>>>>>> If you would like to help debug this, changing the
>>>>>>>
>>>>>>> + if (unlikely(current->flags & PF_EXITING))
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>> + if (unlikely(WARN_ON(current->flags & PF_EXITING)))
>>>>>>>
>>>>>>> would provide stack traces so we can wee where -EINTR is actually being
>>>>>>> returned. That should provide some hints.
>>>>>>>
>>>>>>> NeilBrown
>>>>>> Hi Neil,
>>>>>>
>>>>>> Upon this addition I got this in the logs
>>>>> Thanks for testing. Was there anything new in the kernel logs? I was
>>>>> expecting a WARNING message followed by a "Call Trace".
>>>>>
>>>>> If there wasn't, then this patch cannot have caused the problem.
>>>>> If there was, then I need to see it.
>>>>>
>>>>> Thanks,
>>>>> NeilBrown
>>>> This is what the dmesg contains:
>>>>
>>>> [ 678.814887] LTP: starting acct02
>>>> [ 679.831232] ------------[ cut here ]------------
>>>> [ 679.833500] WARNING: CPU: 6 PID: 88930 at net/sunrpc/sched.c:279
>>>> rpc_wait_bit_killable+0x76/0x90 [sunrpc]
>>>> [ 679.837308] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs
>>>> netfs rpcrdma rdma_cm iw_cm ib_cm ib_core nfsd auth_rpcgss nfs_acl lockd
>>>> grace loop nft_redir ipt_REJECT xt_comment xt_owner nft_compat
>>>> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib rfkill nft_reject_inet
>>>> nf_reject_
>>>> ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack
>>>> nf_defrag_ipv6 nf_defrag_ipv4 ip_set cuse vfat fat intel_rapl_msr
>>>> intel_rapl_common kvm_amd ccp kvm drm_shmem_helper irqbypass i2c_piix4
>>>> drm_kms_helper pcspkr pvpanic_mmio i2c_smbus pvpanic drm fuse xfs
>>>> crc32c_generic
>>>> nvme_tcp nvme_fabrics nvme_core nvme_keyring nvme_auth sd_mod
>>>> virtio_net sg net_failover virtio_scsi failover ata_generic pata_acpi
>>>> ata_piix ghash_clmulni_intel libata sha512_ssse3 virtio_pci sha256_ssse3
>>>> virtio_pci_legacy_dev sha1_ssse3 virtio_pci_modern_dev serio_raw
>>>> dm_multipath btrfs
>>>> blake2b_generic xor zstd_compress raid6_pq sunrpc dm_mirror
>>>> dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls
>>>> cxgb3i cxgb3 mdio libcxgbi libcxgb
>>>> [ 679.837524] qla4xxx iscsi_tcp libiscsi_tcp libiscsi
>>>> scsi_transport_iscsi iscsi_ibft iscsi_boot_sysfs qemu_fw_cfg aesni_intel
>>>> crypto_simd cryptd [last unloaded: kheaders]
>>>> [ 679.873316] CPU: 6 UID: 0 PID: 88930 Comm: acct02_helper Kdump:
>>>> loaded Not tainted 6.15.8-1.el9.rc2.x86_64 #1 PREEMPT(voluntary)
>>>> [ 679.877769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>> BIOS 1.6.4 02/27/2023
>>>> [ 679.880782] RIP: 0010:rpc_wait_bit_killable+0x76/0x90 [sunrpc]
>>>> [ 679.883189] Code: 01 b8 00 fe ff ff 75 d5 48 8b 85 48 0d 00 00 5b 5d
>>>> 48 c1 e8 08 83 e0 01 f7 d8 19 c0 25 00 fe ff ff 31 d2 31 f6 e9 8a e6 c4
>>>> d4 <0f> 0b b8 fc ff ff ff 5b 5d 31 d2 31 f6 e9 78 e6 c4 d4 0f 1f 84 00
>>>> [ 679.889976] RSP: 0018:ffffaf47811a7770 EFLAGS: 00010202
>>>> [ 679.892196] RAX: ffff97be48e00330 RBX: ffffaf47811a77c0 RCX:
>>>> 0000000000000000
>>>> [ 679.894978] RDX: 0000000000000001 RSI: 0000000000002102 RDI:
>>>> ffffaf47811a77c0
>>>> [ 679.897786] RBP: ffff97be61588000 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [ 679.900600] R10: 0000000000000000 R11: 0000000000000000 R12:
>>>> 0000000000002102
>>>> [ 679.903432] R13: ffffffff96408ea0 R14: ffffaf47811a77d8 R15:
>>>> ffffffffc07568e0
>>>> [ 679.906233] FS: 00007fc2563f8600(0000) GS:ffff97c5c890f000(0000)
>>>> knlGS:0000000000000000
>>>> [ 679.909289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 679.911736] CR2: 00007fc2561fba70 CR3: 00000003bce3a000 CR4:
>>>> 00000000003506f0
>>>> [ 679.914555] Call Trace:
>>>> [ 679.915918] <TASK>
>>>> [ 679.917215] __wait_on_bit+0x31/0xa0
>>>> [ 679.918932] out_of_line_wait_on_bit+0x93/0xc0
>>>> [ 679.920914] ? __pfx_wake_bit_function+0x10/0x10
>>>> [ 679.922944] __rpc_execute+0x109/0x310 [sunrpc]
>>>> [ 679.925024] rpc_execute+0x137/0x160 [sunrpc]
>>>> [ 679.927020] rpc_run_task+0x107/0x170 [sunrpc]
>>>> [ 679.929032] nfs4_call_sync_sequence+0x74/0xc0 [nfsv4]
>>>> [ 679.931319] _nfs4_proc_statfs+0xc7/0x100 [nfsv4]
>>>> [ 679.933520] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.935391] nfs4_proc_statfs+0x6b/0xb0 [nfsv4]
>>>> [ 679.937367] nfs_statfs+0x7e/0x1e0 [nfs]
>>>> [ 679.939138] statfs_by_dentry+0x67/0xa0
>>>> [ 679.940887] vfs_statfs+0x1c/0x40
>>>> [ 679.942596] check_free_space+0x71/0x110
>>> Thanks. I'm not sure why this causes a problem as if vfs_statfs() fail,
>>> check_free_space() assumes there is still free space.
>>> However it does strongly suggest that we still need to NFS to work in
>>> processes where signals have been shutdown.
>>>
>>> Could you change rpc_wait_bit_killable() to be the following and retest?
>>> I intention is that when the process is exiting, we wait up to 5 seconds
>>> for each request and then fail. It's a bit ugly, but it is a rather
>>> strange situation. It blocking forever that we really want to avoid
>>> here, not blocking at all.
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>>
>>> static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>>> {
>>> if (unlikely(current->flags & PF_EXITING)) {
>>> if (schedule_timeout(5*HZ) > 0)
>>> /* timed out */
>>> return 0;
>>> return -EINTR;
>>> }
>>> schedule();
>>> if (signal_pending_state(mode, current))
>>> return -ERESTARTSYS;
>>> return 0;
>>> }
>> Adding this change makes the test pass:
>>
>> <<<test_start>>>
>> tag=acct02 stime=1754066481
>> cmdline="acct02"
>> contacts=""
>> analysis=exit
>> <<<test_output>>>
>> tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
>> tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-lNzAk1qhuX/LTP_accZ75zl1 as tmpdir (nfs filesystem)
>> tst_test.c:2004: TINFO: LTP version: 20250530-128-g6505f9e29
>> tst_test.c:2007: TINFO: Tested kernel: 6.15.8-master.sunrpc.el9.rc3.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jul 29 05:06:28 PDT 2025 x86_64
>> tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
>> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
>> tst_kconfig.c:88: TINFO: Parsing kernel config '/lib/modules/6.15.8-master.sunrpc.el9.rc3.x86_64/config'
>> acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>> acct02.c:238: TINFO: Verifying using 'struct acct_v3'
>> acct02.c:191: TINFO: == entry 1 ==
>> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('iscsiadm')
>> acct02.c:131: TINFO: ac_exitcode != 32768 (5376)
>> acct02.c:139: TINFO: ac_ppid != 52326 (2475)
>> acct02.c:191: TINFO: == entry 2 ==
>> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('systemd')
>> acct02.c:125: TINFO: elap_time/clock_ticks >= 2 (1065/100: 10.00)
>> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
>> acct02.c:139: TINFO: ac_ppid != 52326 (1)
>> acct02.c:191: TINFO: == entry 3 ==
>> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('(sd-pam)')
>> acct02.c:125: TINFO: elap_time/clock_ticks >= 2 (1062/100: 10.00)
>> acct02.c:131: TINFO: ac_exitcode != 32768 (9)
>> acct02.c:139: TINFO: ac_ppid != 52326 (1)
>> acct02.c:191: TINFO: == entry 4 ==
>> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('systemd-user-ru')
>> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
>> acct02.c:139: TINFO: ac_ppid != 52326 (1)
>> acct02.c:191: TINFO: == entry 5 ==
>> acct02.c:202: TINFO: Number of accounting file entries tested: 5
>> acct02.c:208: TPASS: acct() wrote correct file contents!
>>
>> Summary:
>> passed 1
>> failed 0
>> broken 0
>> skipped 0
>> warnings 0
>> incrementing stop
>> <<<execution_status>>>
>> initiation_status="ok"
>> duration=1 termination_type=exited termination_id=0 corefile=no
>> cutime=0 cstime=0
>> <<<test_end>>>
>>
>> Thanks & Regards,
>> Harshvardhan
> Hi there,
>
> I tested this around 50 iterations and it passes all 50 times with this
> timeout.
>
> Thanks & Regards,
> Harshvardhan
>
Hello there,
Can we go ahead and revert this patch for the meantime until a fix is
obtained?
Thanks & Regards,
Harshvardhan
>>
>>>> [ 679.944433] acct_write_process+0x45/0x180
>>>> [ 679.946313] acct_process+0xff/0x180
>>>> [ 679.948003] do_exit+0x216/0x480
>>>> [ 679.949799] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.951621] do_group_exit+0x30/0x80
>>>> [ 679.953329] __x64_sys_exit_group+0x18/0x20
>>>> [ 679.955217] x64_sys_call+0xfdb/0x14f0
>>>> [ 679.956971] do_syscall_64+0x82/0x7a0
>>>> [ 679.958717] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.960550] ? ___pte_offset_map+0x1b/0x1a0
>>>> [ 679.962434] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.964261] ? __alloc_frozen_pages_noprof+0x18d/0x340
>>>> [ 679.966389] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.968183] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.969945] ? __mod_memcg_lruvec_state+0xb6/0x1b0
>>>> [ 679.971977] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.973690] ? __lruvec_stat_mod_folio+0x83/0xd0
>>>> [ 679.975671] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.977392] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.979079] ? set_ptes.isra.0+0x36/0x90
>>>> [ 679.980771] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.982375] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.984052] ? wp_page_copy+0x333/0x730
>>>> [ 679.985648] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.987220] ? __handle_mm_fault+0x397/0x6f0
>>>> [ 679.988818] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.990411] ? __count_memcg_events+0xbb/0x150
>>>> [ 679.992111] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.993689] ? count_memcg_events.constprop.0+0x26/0x50
>>>> [ 679.995590] ? srso_return_thunk+0x5/0x5f
>>>> [ 679.997177] ? handle_mm_fault+0x245/0x350
>>>> [ 679.998807] ? srso_return_thunk+0x5/0x5f
>>>> [ 680.000339] ? do_user_addr_fault+0x221/0x690
>>>> [ 680.002042] ? srso_return_thunk+0x5/0x5f
>>>> [ 680.003553] ? arch_exit_to_user_mode_prepare.isra.0+0x1e/0xd0
>>>> [ 680.005643] ? srso_return_thunk+0x5/0x5f
>>>> [ 680.007202] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>> [ 680.009025] RIP: 0033:0x7fc2560d985d
>>>> [ 680.010510] Code: Unable to access opcode bytes at 0x7fc2560d9833.
>>>> [ 680.012660] RSP: 002b:00007ffde591df68 EFLAGS: 00000246 ORIG_RAX:
>>>> 00000000000000e7
>>>> [ 680.015355] RAX: ffffffffffffffda RBX: 00007fc2561f59e0 RCX:
>>>> 00007fc2560d985d
>>>> [ 680.017749] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI:
>>>> 0000000000000080
>>>> [ 680.020292] RBP: 0000000000000080 R08: 0000000000000000 R09:
>>>> 0000000000000020
>>>> [ 680.022729] R10: 00007ffde591de10 R11: 0000000000000246 R12:
>>>> 00007fc2561f59e0
>>>> [ 680.025174] R13: 00007fc2561faf20 R14: 0000000000000001 R15:
>>>> 00007fc2561faf08
>>>> [ 680.027593] </TASK>
>>>> [ 680.028661] ---[ end trace 0000000000000000 ]---
>>>>
>>>>
>>>> Thanks & Regards,
>>>> Harshvardhan
>>>>
>>>>>> <<<test_start>>>
>>>>>> tag=acct02 stime=1753444172
>>>>>> cmdline="acct02"
>>>>>> contacts=""
>>>>>> analysis=exit
>>>>>> <<<test_output>>>
>>>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>>>> tst_tmpdir.c:316: TINFO: Using /tmpdir/ltp-w1ozKKlJ6n/LTP_acc4RRfLh as
>>>>>> tmpdir (nfs filesystem)
>>>>>> tst_test.c:2004: TINFO: LTP version: 20250530-105-gda73e1527
>>>>>> tst_test.c:2007: TINFO: Tested kernel:
>>>>>> 6.15.8-1.bug38227970.el9.rc2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 25
>>>>>> 02:03:04 PDT 2025 x86_64
>>>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>>>> tst_test.c:1825: TINFO: Overall timeout per run is 0h 00m 30s
>>>>>> tst_kconfig.c:88: TINFO: Parsing kernel config
>>>>>> '/lib/modules/6.15.8-1.bug38227970.el9.rc2.x86_64/config'
>>>>>> acct02.c:61: TINFO: CONFIG_BSD_PROCESS_ACCT_V3=y
>>>>>> acct02.c:238: TINFO: Verifying using 'struct acct_v3'
>>>>>> acct02.c:191: TINFO: == entry 1 ==
>>>>>> acct02.c:82: TINFO: ac_comm != 'acct02_helper' ('acct02')
>>>>>> acct02.c:131: TINFO: ac_exitcode != 32768 (0)
>>>>>> acct02.c:139: TINFO: ac_ppid != 88929 (88928)
>>>>>> acct02.c:181: TFAIL: end of file reached
>>>>>>
>>>>>> HINT: You _MAY_ be missing kernel fixes:
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d9570158b626
>>>>>>
>>>>>> Summary:
>>>>>> passed 0
>>>>>> failed 1
>>>>>> broken 0
>>>>>> skipped 0
>>>>>> warnings 0
>>>>>> incrementing stop
>>>>>> <<<execution_status>>>
>>>>>> initiation_status="ok"
>>>>>> duration=1 termination_type=exited termination_id=1 corefile=no
>>>>>> cutime=0 cstime=20
>>>>>>
>>>>>> <<<test_end>>>
>>>>>>
>>>>>>
>>>>>> Thanks & Regards,
>>>>>>
>>>>>> Harshvardhan
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2025-08-19 10:07 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-28 17:40 [PATCH 1/2] SUNRPC: Don't allow waiting for exiting tasks trondmy
2025-03-28 17:40 ` [PATCH 2/2] NFS: " trondmy
2025-03-28 18:23 ` Jeff Layton
2025-03-28 17:53 ` [PATCH 1/2] SUNRPC: " Jeff Layton
2025-03-28 18:00 ` Trond Myklebust
2025-03-28 18:09 ` Jeff Layton
2025-03-28 19:36 ` Trond Myklebust
2025-04-08 10:31 ` Mark Brown
2025-07-23 7:02 ` Harshvardhan Jha
2025-07-23 8:07 ` NeilBrown
2025-07-25 11:59 ` Harshvardhan Jha
2025-07-27 4:50 ` NeilBrown
2025-07-28 8:07 ` Harshvardhan Jha
2025-07-28 9:34 ` NeilBrown
2025-08-04 7:45 ` Harshvardhan Jha
2025-08-06 5:47 ` Harshvardhan Jha
2025-08-19 10:06 ` Harshvardhan Jha
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).