* [PATCH net-next] selftests: net: exit cleanly on SIGTERM / timeout
@ 2025-04-25 15:17 Jakub Kicinski
2025-04-26 15:15 ` Willem de Bruijn
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-04-25 15:17 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
petrm, willemb, sdf, linux-kselftest
ksft runner sends 2 SIGTERMs in a row if a test runs out of time.
Handle this similarly to how we handle SIGINT: clean up and stop
running further tests.

Because we get 2 signals, we need a bit of logic to ignore
the second one; they arrive immediately one after the other
(due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM
to runner child")).

This change makes sure we run cleanup (scheduled defer()s)
and also print a stack trace on SIGTERM, which doesn't happen
by default. Tests occasionally hang in NIPA and it's impossible
to tell what they are waiting for or doing.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: petrm@nvidia.com
CC: willemb@google.com
CC: sdf@fomichev.me
CC: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/lib/py/ksft.py | 27 +++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
index 3cfad0fd4570..73710634d457 100644
--- a/tools/testing/selftests/net/lib/py/ksft.py
+++ b/tools/testing/selftests/net/lib/py/ksft.py
@@ -3,6 +3,7 @@
 import builtins
 import functools
 import inspect
+import signal
 import sys
 import time
 import traceback
@@ -26,6 +27,10 @@ KSFT_DISRUPTIVE = True
     pass
 
 
+class KsftTerminate(KeyboardInterrupt):
+    pass
+
+
 def ksft_pr(*objs, **kwargs):
     print("#", *objs, **kwargs)
 
@@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
     return env
 
 
+term_cnt = 0
+
+def _ksft_intr(signum, frame):
+    # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
+    # if we don't ignore the second one it will stop us from handling cleanup
+    global term_cnt
+    term_cnt += 1
+    if term_cnt == 1:
+        raise KsftTerminate()
+    else:
+        ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
+
+
 def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
     cases = cases or []
 
@@ -205,6 +223,10 @@ KSFT_DISRUPTIVE = True
                     cases.append(value)
                     break
 
+    global term_cnt
+    term_cnt = 0
+    prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
+
     totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
 
     print("TAP version 13")
@@ -229,11 +251,12 @@ KSFT_DISRUPTIVE = True
             cnt_key = 'xfail'
         except BaseException as e:
             stop |= isinstance(e, KeyboardInterrupt)
+            stop |= isinstance(e, KsftTerminate)
             tb = traceback.format_exc()
             for line in tb.strip().split('\n'):
                 ksft_pr("Exception|", line)
             if stop:
-                ksft_pr("Stopping tests due to KeyboardInterrupt.")
+                ksft_pr(f"Stopping tests due to {type(e).__name__}.")
             KSFT_RESULT = False
             cnt_key = 'fail'
 
@@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
         if stop:
             break
 
+    signal.signal(signal.SIGTERM, prev_sigterm)
+
     print(
         f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"
     )
--
2.49.0
* Re: [PATCH net-next] selftests: net: exit cleanly on SIGTERM / timeout
2025-04-25 15:17 [PATCH net-next] selftests: net: exit cleanly on SIGTERM / timeout Jakub Kicinski
@ 2025-04-26 15:15 ` Willem de Bruijn
2025-04-28 20:24 ` Jakub Kicinski
0 siblings, 1 reply; 6+ messages in thread
From: Willem de Bruijn @ 2025-04-26 15:15 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
petrm, willemb, sdf, linux-kselftest
Jakub Kicinski wrote:
> ksft runner sends 2 SIGTERMs in a row if a test runs out of time.
> Handle this similarly to how we handle SIGINT: clean up and stop
> running further tests.
>
> Because we get 2 signals, we need a bit of logic to ignore
> the second one; they arrive immediately one after the other
> (due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM
> to runner child")).
>
> This change makes sure we run cleanup (scheduled defer()s)
> and also print a stack trace on SIGTERM, which doesn't happen
> by default. Tests occasionally hang in NIPA and it's impossible
> to tell what they are waiting for or doing.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: petrm@nvidia.com
> CC: willemb@google.com
> CC: sdf@fomichev.me
> CC: linux-kselftest@vger.kernel.org
> ---
> tools/testing/selftests/net/lib/py/ksft.py | 27 +++++++++++++++++++++-
> 1 file changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
> index 3cfad0fd4570..73710634d457 100644
> --- a/tools/testing/selftests/net/lib/py/ksft.py
> +++ b/tools/testing/selftests/net/lib/py/ksft.py
> @@ -3,6 +3,7 @@
> import builtins
> import functools
> import inspect
> +import signal
> import sys
> import time
> import traceback
> @@ -26,6 +27,10 @@ KSFT_DISRUPTIVE = True
> pass
>
>
> +class KsftTerminate(KeyboardInterrupt):
> + pass
> +
> +
> def ksft_pr(*objs, **kwargs):
> print("#", *objs, **kwargs)
>
> @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
> return env
>
>
> +term_cnt = 0
> +
A bit ugly to initialize this here. Also, it already is initialized
below.
> +def _ksft_intr(signum, frame):
> + # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
> + # if we don't ignore the second one it will stop us from handling cleanup
> + global term_cnt
> + term_cnt += 1
> + if term_cnt == 1:
> + raise KsftTerminate()
> + else:
> + ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
> +
> +
> def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
> cases = cases or []
>
> @@ -205,6 +223,10 @@ KSFT_DISRUPTIVE = True
> cases.append(value)
> break
>
> + global term_cnt
> + term_cnt = 0
> + prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
> +
> totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
>
> print("TAP version 13")
> @@ -229,11 +251,12 @@ KSFT_DISRUPTIVE = True
> cnt_key = 'xfail'
> except BaseException as e:
> stop |= isinstance(e, KeyboardInterrupt)
> + stop |= isinstance(e, KsftTerminate)
> tb = traceback.format_exc()
> for line in tb.strip().split('\n'):
> ksft_pr("Exception|", line)
> if stop:
> - ksft_pr("Stopping tests due to KeyboardInterrupt.")
> + ksft_pr(f"Stopping tests due to {type(e).__name__}.")
> KSFT_RESULT = False
> cnt_key = 'fail'
>
> @@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
> if stop:
> break
>
> + signal.signal(signal.SIGTERM, prev_sigterm)
> +
Why is prev_sigterm saved and reassigned as handler here?
> print(
> f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"
> )
> --
> 2.49.0
>
* Re: [PATCH net-next] selftests: net: exit cleanly on SIGTERM / timeout
2025-04-26 15:15 ` Willem de Bruijn
@ 2025-04-28 20:24 ` Jakub Kicinski
2025-04-29 1:27 ` Willem de Bruijn
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2025-04-28 20:24 UTC (permalink / raw)
To: Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, petrm,
willemb, sdf, linux-kselftest
On Sat, 26 Apr 2025 11:15:34 -0400 Willem de Bruijn wrote:
> > @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
> > return env
> >
> >
> > +term_cnt = 0
> > +
>
> A bit ugly to initialize this here. Also, it already is initialized
> below.
We need a global so that the signal handler can access it.
Python doesn't have syntax to define a variable without a value.
Or do you suggest term_cnt = None ?
The whole term_cnt dance is super ugly, couldn't think of a cleaner way.
It's really annoying that ksft infra sends 2 terminating signals one
immediately after the other :|
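A standalone sketch of the double-SIGTERM pattern, for trying it outside the selftest tree. It mirrors the patch's `KsftTerminate`, `term_cnt` and `_ksft_intr`, but logs to a list instead of calling `ksft_pr`; `sigterm_self()` and `run()` are hypothetical stand-ins for runner.sh and `ksft_run()`. The process signals itself twice: the first SIGTERM aborts the "test", the second lands during cleanup and is only logged. It assumes CPython on a POSIX system, run from the main thread (Python only runs signal handlers there).

```python
import os
import signal
import time


class KsftTerminate(KeyboardInterrupt):
    """Subclass of KeyboardInterrupt, so existing Ctrl-C handling catches it."""


log = []


def _ksft_intr(signum, frame):
    # First SIGTERM aborts the running test; later ones are only logged so
    # they cannot interrupt cleanup.
    global term_cnt
    term_cnt += 1
    if term_cnt == 1:
        raise KsftTerminate()
    log.append(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")


def sigterm_self(expected_cnt):
    # Hypothetical stand-in for runner.sh: signal ourselves, then poll
    # briefly so CPython gets a chance to run the Python-level handler.
    os.kill(os.getpid(), signal.SIGTERM)
    deadline = time.monotonic() + 2
    while term_cnt < expected_cnt and time.monotonic() < deadline:
        time.sleep(0.01)


def run():
    # Hypothetical stand-in for ksft_run(): counter reset, handler install,
    # one "hung" test, cleanup, handler restore.
    global term_cnt
    term_cnt = 0
    prev = signal.signal(signal.SIGTERM, _ksft_intr)
    try:
        try:
            sigterm_self(1)               # first SIGTERM hits the "hung" test
            log.append("test finished")   # not reached: the handler raised
        except KeyboardInterrupt as e:
            log.append(f"Stopping tests due to {type(e).__name__}.")
        finally:
            sigterm_self(2)               # second SIGTERM arrives during cleanup
            log.append("cleanup done")
    finally:
        signal.signal(signal.SIGTERM, prev)
    return prev


prev_handler = run()
print("\n".join(log))
```

Because `KsftTerminate` subclasses `KeyboardInterrupt`, the `except KeyboardInterrupt` branch catches it, while `type(e).__name__` still reports the subclass name, which is what the updated "Stopping tests due to ..." message relies on.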
> > +def _ksft_intr(signum, frame):
> > + # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
> > + # if we don't ignore the second one it will stop us from handling cleanup
> > + global term_cnt
> > + term_cnt += 1
> > + if term_cnt == 1:
> > + raise KsftTerminate()
> > + else:
> > + ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
> > +
> > +
> > def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
> > cases = cases or []
> >
> > @@ -205,6 +223,10 @@ KSFT_DISRUPTIVE = True
> > cases.append(value)
> > break
> >
> > + global term_cnt
> > + term_cnt = 0
> > + prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
> > +
> > totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
> >
> > print("TAP version 13")
> > @@ -229,11 +251,12 @@ KSFT_DISRUPTIVE = True
> > cnt_key = 'xfail'
> > except BaseException as e:
> > stop |= isinstance(e, KeyboardInterrupt)
> > + stop |= isinstance(e, KsftTerminate)
> > tb = traceback.format_exc()
> > for line in tb.strip().split('\n'):
> > ksft_pr("Exception|", line)
> > if stop:
> > - ksft_pr("Stopping tests due to KeyboardInterrupt.")
> > + ksft_pr(f"Stopping tests due to {type(e).__name__}.")
> > KSFT_RESULT = False
> > cnt_key = 'fail'
> >
> > @@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
> > if stop:
> > break
> >
> > + signal.signal(signal.SIGTERM, prev_sigterm)
> > +
>
> Why is prev_sigterm saved and reassigned as handler here?
Because we ignore all SIGTERMs when cnt >= 2, I didn't want to keep our
handler installed, just in case something after ksft_run() hangs.
It should be equivalent to

signal.signal(signal.SIGTERM, signal.SIG_DFL)

if the prev is of concern. Then again keeping prev doesn't change #LOC
* Re: [PATCH net-next] selftests: net: exit cleanly on SIGTERM / timeout
2025-04-28 20:24 ` Jakub Kicinski
@ 2025-04-29 1:27 ` Willem de Bruijn
2025-04-29 14:49 ` Paolo Abeni
2025-04-29 17:07 ` Jakub Kicinski
0 siblings, 2 replies; 6+ messages in thread
From: Willem de Bruijn @ 2025-04-29 1:27 UTC (permalink / raw)
To: Jakub Kicinski, Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, petrm,
willemb, sdf, linux-kselftest
Reviewed-by: Willem de Bruijn <willemb@google.com>
Jakub Kicinski wrote:
> On Sat, 26 Apr 2025 11:15:34 -0400 Willem de Bruijn wrote:
> > > @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
> > > return env
> > >
> > >
> > > +term_cnt = 0
> > > +
> >
> > A bit ugly to initialize this here. Also, it already is initialized
> > below.
>
> We need a global so that the signal handler can access it.
> Python doesn't have syntax to define a variable without a value.
> Or do you suggest term_cnt = None ?
I meant that the "global term_cnt" in ksft_run below already creates
the global var, and is guaranteed to do so before _ksft_intr, so no
need to also define it outside a function.
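A small sketch of the scoping rule in question, with hypothetical function names: an assignment inside a function that declares the name `global` creates the module-level binding, so no module-scope `term_cnt = 0` is needed. Strictly, it is the assignment that creates the name; the `global` statement alone binds nothing, which is why it matters that ksft_run() assigns `term_cnt` before installing the handler.

```python
def init_counter():
    # `global` makes the assignment below bind a module-level name, even
    # though term_cnt was never defined at module scope.
    global term_cnt
    term_cnt = 0


def declare_only():
    # A bare `global` declaration with no assignment creates nothing.
    global never_assigned


declare_only()
print("never_assigned" in globals())  # False
init_counter()
print("term_cnt" in globals())        # True
```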
Obviously not very important, don't mean to ask for a respin. LGTM.
> The whole term_cnt dance is super ugly, couldn't think of a cleaner way.
> It's really annoying that ksft infra sends 2 terminating signals one
> immediately after the other :|
>
> > > +def _ksft_intr(signum, frame):
> > > + # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
> > > + # if we don't ignore the second one it will stop us from handling cleanup
> > > + global term_cnt
> > > + term_cnt += 1
> > > + if term_cnt == 1:
> > > + raise KsftTerminate()
> > > + else:
> > > + ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
> > > +
> > > +
> > > def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
> > > cases = cases or []
> > >
> > > @@ -205,6 +223,10 @@ KSFT_DISRUPTIVE = True
> > > cases.append(value)
> > > break
> > >
> > > + global term_cnt
> > > + term_cnt = 0
> > > + prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
> > > +
> > > totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
> > >
> > > print("TAP version 13")
> > > @@ -229,11 +251,12 @@ KSFT_DISRUPTIVE = True
> > > cnt_key = 'xfail'
> > > except BaseException as e:
> > > stop |= isinstance(e, KeyboardInterrupt)
> > > + stop |= isinstance(e, KsftTerminate)
> > > tb = traceback.format_exc()
> > > for line in tb.strip().split('\n'):
> > > ksft_pr("Exception|", line)
> > > if stop:
> > > - ksft_pr("Stopping tests due to KeyboardInterrupt.")
> > > + ksft_pr(f"Stopping tests due to {type(e).__name__}.")
> > > KSFT_RESULT = False
> > > cnt_key = 'fail'
> > >
> > > @@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
> > > if stop:
> > > break
> > >
> > > + signal.signal(signal.SIGTERM, prev_sigterm)
> > > +
> >
> > Why is prev_sigterm saved and reassigned as handler here?
>
> Because we ignore all SIGTERMs when cnt >= 2, I didn't want to keep our
> handler installed, just in case something after ksft_run() hangs.
> It should be equivalent to
>
> signal.signal(signal.SIGTERM, signal.SIG_DFL)
>
> if the prev is of concern. Then again keeping prev doesn't change #LOC
Oh I see. Ok.
* Re: [PATCH net-next] selftests: net: exit cleanly on SIGTERM / timeout
2025-04-29 1:27 ` Willem de Bruijn
@ 2025-04-29 14:49 ` Paolo Abeni
2025-04-29 17:07 ` Jakub Kicinski
1 sibling, 0 replies; 6+ messages in thread
From: Paolo Abeni @ 2025-04-29 14:49 UTC (permalink / raw)
To: Willem de Bruijn, Jakub Kicinski
Cc: davem, netdev, edumazet, andrew+netdev, horms, petrm, willemb,
sdf, linux-kselftest
On 4/29/25 3:27 AM, Willem de Bruijn wrote:
> Reviewed-by: Willem de Bruijn <willemb@google.com>
>
> Jakub Kicinski wrote:
>> On Sat, 26 Apr 2025 11:15:34 -0400 Willem de Bruijn wrote:
>>>> @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
>>>> return env
>>>>
>>>>
>>>> +term_cnt = 0
>>>> +
>>>
>>> A bit ugly to initialize this here. Also, it already is initialized
>>> below.
>>
>> We need a global so that the signal handler can access it.
>> Python doesn't have syntax to define a variable without a value.
>> Or do you suggest term_cnt = None ?
>
> I meant that the "global term_cnt" in ksft_run below already creates
> the global var, and is guaranteed to do so before _ksft_intr, so no
> need to also define it outside a function.
>
> Obviously not very important, don't mean to ask for a respin. LGTM.
FWIW I think it's better to avoid the unneeded assignment in global
scope, so I would suggest either a follow-up or a v2, whichever is simpler.
Thanks,
Paolo
* Re: [PATCH net-next] selftests: net: exit cleanly on SIGTERM / timeout
2025-04-29 1:27 ` Willem de Bruijn
2025-04-29 14:49 ` Paolo Abeni
@ 2025-04-29 17:07 ` Jakub Kicinski
1 sibling, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2025-04-29 17:07 UTC (permalink / raw)
To: Willem de Bruijn
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, petrm,
willemb, sdf, linux-kselftest
On Mon, 28 Apr 2025 21:27:32 -0400 Willem de Bruijn wrote:
> > > A bit ugly to initialize this here. Also, it already is initialized
> > > below.
> >
> > We need a global so that the signal handler can access it.
> > Python doesn't have syntax to define a variable without a value.
> > Or do you suggest term_cnt = None ?
>
> I meant that the "global term_cnt" in ksft_run below already creates
> the global var, and is guaranteed to do so before _ksft_intr, so no
> need to also define it outside a function.
>
> Obviously not very important, don't mean to ask for a respin. LGTM.
Oh wow, thanks! Totally didn't know that using the global is enough
to add something to the global scope.