[PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down
@ 2024-07-30 22:39 Stanislav Fomichev
  2024-07-30 22:39 ` [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive Stanislav Fomichev
  2024-07-31 11:34 ` [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down Petr Machata
  0 siblings, 2 replies; 13+ messages in thread
From: Stanislav Fomichev @ 2024-07-30 22:39 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, Shuah Khan, Joe Damato,
	Petr Machata, linux-kselftest

Verify that total device stats don't decrease after it has been turned down.
Also make sure the device doesn't crash when we access per-queue stats
when it's down (in case it tries to access some pointers that are NULL).

  KTAP version 1
  1..5
  ok 1 stats.check_pause
  ok 2 stats.check_fec
  ok 3 stats.pkt_byte_sum
  ok 4 stats.qstat_by_ifindex
  ok 5 stats.check_down
  # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0

v2:
- KTAP output formatting (Jakub)
- defer instead of try/finally (Jakub)
- disappearing stats is an error (Jakub)
- ksft_ge instead of open coding (Jakub)

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
--
Cc: Shuah Khan <shuah@kernel.org>
Cc: Joe Damato <jdamato@fastly.com>
Cc: Petr Machata <petrm@nvidia.com>
Cc: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/drivers/net/stats.py | 25 +++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/stats.py b/tools/testing/selftests/drivers/net/stats.py
index 820b8e0a22c6..93f9204f51c4 100755
--- a/tools/testing/selftests/drivers/net/stats.py
+++ b/tools/testing/selftests/drivers/net/stats.py
@@ -5,6 +5,7 @@ from lib.py import ksft_run, ksft_exit, ksft_pr
 from lib.py import ksft_ge, ksft_eq, ksft_in, ksft_true, ksft_raises, KsftSkipEx, KsftXfailEx
 from lib.py import EthtoolFamily, NetdevFamily, RtnlFamily, NlError
 from lib.py import NetDrvEnv
+from lib.py import ip, defer
 
 ethnl = EthtoolFamily()
 netfam = NetdevFamily()
@@ -133,9 +134,31 @@ rtnl = RtnlFamily()
     ksft_eq(cm.exception.nl_msg.extack['bad-attr'], '.ifindex')
 
 
+def check_down(cfg) -> None:
+    try:
+        qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
+    except NlError as e:
+        if e.error == 95:
+            raise KsftSkipEx("qstats not supported by the device")
+        raise
+
+    ip(f"link set dev {cfg.dev['ifname']} down")
+    defer(ip, f"link set dev {cfg.dev['ifname']} up")
+
+    qstat = qstat[0]
+    qstat2 = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)[0]
+    for k, v in qstat.items():
+        ksft_ge(qstat2[k], qstat[k], comment=f"{k} went backwards on device down")
+
+    # exercise per-queue API to make sure that "device down" state
+    # is handled correctly and doesn't crash
+    netfam.qstats_get({"ifindex": cfg.ifindex, "scope": "queue"}, dump=True)
+
+
 def main() -> None:
     with NetDrvEnv(__file__) as cfg:
-        ksft_run([check_pause, check_fec, pkt_byte_sum, qstat_by_ifindex],
+        ksft_run([check_pause, check_fec, pkt_byte_sum, qstat_by_ifindex,
+                  check_down],
                  args=(cfg, ))
     ksft_exit()
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive
  2024-07-30 22:39 [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down Stanislav Fomichev
@ 2024-07-30 22:39 ` Stanislav Fomichev
  2024-07-31 11:55   ` Petr Machata
  2024-07-31 11:34 ` [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down Petr Machata
  1 sibling, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2024-07-30 22:39 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, Shuah Khan, Joe Damato,
	Petr Machata, linux-kselftest

Add new @ksft_disruptive decorator to mark the tests that might
be disruptive to the system. Depending on how well the previous
test works in the CI we might want to disable disruptive tests
by default and only let the developers run them manually.

KSFT framework runs disruptive tests by default. DISRUPTIVE=False
environment (or config file) can be used to disable these tests.
ksft_setup should be called by the test cases that want to use
new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).

In the future we can add similar decorators to, for example, avoid
running slow tests all the time. And/or have some option to run
only 'fast' tests for some sort of smoke test scenario.

  $ DISRUPTIVE=False ./stats.py
  KTAP version 1
  1..5
  ok 1 stats.check_pause
  ok 2 stats.check_fec
  ok 3 stats.pkt_byte_sum
  ok 4 stats.qstat_by_ifindex
  ok 5 stats.check_down # SKIP marked as disruptive
  # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:1 error:0

v2:
- convert from cli argument to env variable (Jakub)

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
--
Cc: Shuah Khan <shuah@kernel.org>
Cc: Joe Damato <jdamato@fastly.com>
Cc: Petr Machata <petrm@nvidia.com>
Cc: linux-kselftest@vger.kernel.org
---
 .../selftests/drivers/net/lib/py/env.py       |  5 +--
 tools/testing/selftests/drivers/net/stats.py  |  2 ++
 tools/testing/selftests/net/lib/py/ksft.py    | 32 +++++++++++++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index a5e800b8f103..1ea9bb695e94 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -4,6 +4,7 @@ import os
 import time
 from pathlib import Path
 from lib.py import KsftSkipEx, KsftXfailEx
+from lib.py import ksft_setup
 from lib.py import cmd, ethtool, ip
 from lib.py import NetNS, NetdevSimDev
 from .remote import Remote
@@ -14,7 +15,7 @@ from .remote import Remote
 
     src_dir = Path(src_path).parent.resolve()
     if not (src_dir / "net.config").exists():
-        return env
+        return ksft_setup(env)
 
     with open((src_dir / "net.config").as_posix(), 'r') as fp:
         for line in fp.readlines():
@@ -30,7 +31,7 @@ from .remote import Remote
             if len(pair) != 2:
                 raise Exception("Can't parse configuration line:", full_file)
             env[pair[0]] = pair[1]
-    return env
+    return ksft_setup(env)
 
 
 class NetDrvEnv:
diff --git a/tools/testing/selftests/drivers/net/stats.py b/tools/testing/selftests/drivers/net/stats.py
index 93f9204f51c4..4c58080cf893 100755
--- a/tools/testing/selftests/drivers/net/stats.py
+++ b/tools/testing/selftests/drivers/net/stats.py
@@ -3,6 +3,7 @@
 
 from lib.py import ksft_run, ksft_exit, ksft_pr
 from lib.py import ksft_ge, ksft_eq, ksft_in, ksft_true, ksft_raises, KsftSkipEx, KsftXfailEx
+from lib.py import ksft_disruptive
 from lib.py import EthtoolFamily, NetdevFamily, RtnlFamily, NlError
 from lib.py import NetDrvEnv
 from lib.py import ip, defer
@@ -134,6 +135,7 @@ rtnl = RtnlFamily()
     ksft_eq(cm.exception.nl_msg.extack['bad-attr'], '.ifindex')
 
 
+@ksft_disruptive
 def check_down(cfg) -> None:
     try:
         qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
index f26c20df9db4..a9a24ea77226 100644
--- a/tools/testing/selftests/net/lib/py/ksft.py
+++ b/tools/testing/selftests/net/lib/py/ksft.py
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 import builtins
+import functools
 import inspect
 import sys
 import time
@@ -10,6 +11,7 @@ from .utils import global_defer_queue
 
 KSFT_RESULT = None
 KSFT_RESULT_ALL = True
+KSFT_DISRUPTIVE = True
 
 
 class KsftFailEx(Exception):
@@ -127,6 +129,36 @@ KSFT_RESULT_ALL = True
             KSFT_RESULT = False
 
 
+def ksft_disruptive(func):
+    """
+    Decorator that marks the test as disruptive (e.g. the test
+    that can down the interface). Disruptive tests can be skipped
+    by passing DISRUPTIVE=False environment variable.
+    """
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        if not KSFT_DISRUPTIVE:
+            raise KsftSkipEx(f"marked as disruptive")
+        return func(*args, **kwargs)
+    return wrapper
+
+
+def ksft_setup(env):
+    """
+    Setup test framework global state from the environment.
+    """
+
+    def get_bool(env, name):
+        return env.get(name, "").lower() in ["true", "1"]
+
+    if "DISRUPTIVE" in env:
+        global KSFT_DISRUPTIVE
+        KSFT_DISRUPTIVE = get_bool(env, "DISRUPTIVE")
+
+    return env
+
+
 def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
     cases = cases or []
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down
  2024-07-30 22:39 [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down Stanislav Fomichev
  2024-07-30 22:39 ` [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive Stanislav Fomichev
@ 2024-07-31 11:34 ` Petr Machata
  2024-08-01  0:32   ` Jakub Kicinski
  1 sibling, 1 reply; 13+ messages in thread
From: Petr Machata @ 2024-07-31 11:34 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, davem, edumazet, kuba, pabeni, Shuah Khan, Joe Damato,
	Petr Machata, linux-kselftest


Stanislav Fomichev <sdf@fomichev.me> writes:

> Verify that total device stats don't decrease after it has been turned down.
> Also make sure the device doesn't crash when we access per-queue stats
> when it's down (in case it tries to access some pointers that are NULL).
>
>   KTAP version 1
>   1..5
>   ok 1 stats.check_pause
>   ok 2 stats.check_fec
>   ok 3 stats.pkt_byte_sum
>   ok 4 stats.qstat_by_ifindex
>   ok 5 stats.check_down
>   # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> v2:
> - KTAP output formatting (Jakub)
> - defer instead of try/finally (Jakub)
> - disappearing stats is an error (Jakub)
> - ksft_ge instead of open coding (Jakub)
>
> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
> --
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Joe Damato <jdamato@fastly.com>
> Cc: Petr Machata <petrm@nvidia.com>
> Cc: linux-kselftest@vger.kernel.org
> ---
>  tools/testing/selftests/drivers/net/stats.py | 25 +++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/drivers/net/stats.py b/tools/testing/selftests/drivers/net/stats.py
> index 820b8e0a22c6..93f9204f51c4 100755
> --- a/tools/testing/selftests/drivers/net/stats.py
> +++ b/tools/testing/selftests/drivers/net/stats.py
> @@ -5,6 +5,7 @@ from lib.py import ksft_run, ksft_exit, ksft_pr
>  from lib.py import ksft_ge, ksft_eq, ksft_in, ksft_true, ksft_raises, KsftSkipEx, KsftXfailEx
>  from lib.py import EthtoolFamily, NetdevFamily, RtnlFamily, NlError
>  from lib.py import NetDrvEnv
> +from lib.py import ip, defer
>  
>  ethnl = EthtoolFamily()
>  netfam = NetdevFamily()
> @@ -133,9 +134,31 @@ rtnl = RtnlFamily()
>      ksft_eq(cm.exception.nl_msg.extack['bad-attr'], '.ifindex')
>  
>  
> +def check_down(cfg) -> None:
> +    try:
> +        qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
> +    except NlError as e:
> +        if e.error == 95:

Could you do this as if e.error == errno.ENOTSUP?

> +            raise KsftSkipEx("qstats not supported by the device")
> +        raise
> +
> +    ip(f"link set dev {cfg.dev['ifname']} down")
> +    defer(ip, f"link set dev {cfg.dev['ifname']} up")
> +
> +    qstat = qstat[0]

Consider moving the [0] inside the try to make the two qstats_get
statements obviously the same.

> +    qstat2 = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)[0]
> +    for k, v in qstat.items():
> +        ksft_ge(qstat2[k], qstat[k], comment=f"{k} went backwards on device down")
> +
> +    # exercise per-queue API to make sure that "device down" state
> +    # is handled correctly and doesn't crash
> +    netfam.qstats_get({"ifindex": cfg.ifindex, "scope": "queue"}, dump=True)
> +
> +
>  def main() -> None:
>      with NetDrvEnv(__file__) as cfg:
> -        ksft_run([check_pause, check_fec, pkt_byte_sum, qstat_by_ifindex],
> +        ksft_run([check_pause, check_fec, pkt_byte_sum, qstat_by_ifindex,
> +                  check_down],
>                   args=(cfg, ))
>      ksft_exit()


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive
  2024-07-30 22:39 ` [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive Stanislav Fomichev
@ 2024-07-31 11:55   ` Petr Machata
  2024-07-31 20:47     ` Stanislav Fomichev
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Machata @ 2024-07-31 11:55 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, davem, edumazet, kuba, pabeni, Shuah Khan, Joe Damato,
	Petr Machata, linux-kselftest

Stanislav Fomichev <sdf@fomichev.me> writes:

> Add new @ksft_disruptive decorator to mark the tests that might
> be disruptive to the system. Depending on how well the previous
> test works in the CI we might want to disable disruptive tests
> by default and only let the developers run them manually.
>
> KSFT framework runs disruptive tests by default. DISRUPTIVE=False
> environment (or config file) can be used to disable these tests.
> ksft_setup should be called by the test cases that want to use
> new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).

Is that something that tests would want to genuinely do, manage this
stuff by hand? I don't really mind having the helper globally
accessible, but default I'd keep it inside env.py and expect others to
inherit appropriately.

> @@ -127,6 +129,36 @@ KSFT_RESULT_ALL = True
>              KSFT_RESULT = False
>  
>  
> +def ksft_disruptive(func):
> +    """
> +    Decorator that marks the test as disruptive (e.g. the test
> +    that can down the interface). Disruptive tests can be skipped
> +    by passing DISRUPTIVE=False environment variable.
> +    """
> +
> +    @functools.wraps(func)
> +    def wrapper(*args, **kwargs):
> +        if not KSFT_DISRUPTIVE:
> +            raise KsftSkipEx(f"marked as disruptive")

Since this is a skip, it will fail the overall run. But that happened
because the user themselves set DISRUPTIVE=0 to avoid, um, disruption to
the system. I think it should either be xfail, or something else
dedicated that conveys the idea that we didn't run the test, but that's
fine.

Using xfail for this somehow doesn't seem correct, nothing failed. Maybe
we need KsftOmitEx, which would basically be an xfail with a more
appropriate name?

> +def ksft_setup(env):
> +    """
> +    Setup test framework global state from the environment.
> +    """
> +
> +    def get_bool(env, name):
> +        return env.get(name, "").lower() in ["true", "1"]

"yes" should alse be considered, for compatibility with the bash
selftests.

It's also odd that 0 is false, 1 is true, but 2 is false again. How
about something like this?

    def get_bool(env, name):
        value = env.get(name, "").lower()
        if value in ["yes", "true"]:
            return True
        if value in ["no", "false"]:
            return False

        try:
            return bool(int(value))
        except:
            raise something something invalid value

So that people at least know if they set it to nonsense that it's
nonsense?

Dunno. The bash selftests just take "yes" and don't care about being
very user friendly in that regard at all. _load_env_file() likewise
looks like it just takes strings and doesn't care about the semantics.
So I don't feel too strongly about this at all. Besides the "yes" bit,
that should be recognized.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive
  2024-07-31 11:55   ` Petr Machata
@ 2024-07-31 20:47     ` Stanislav Fomichev
  2024-08-01  8:36       ` Petr Machata
  0 siblings, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2024-07-31 20:47 UTC (permalink / raw)
  To: Petr Machata
  Cc: netdev, davem, edumazet, kuba, pabeni, Shuah Khan, Joe Damato,
	linux-kselftest

On 07/31, Petr Machata wrote:
> 
> Stanislav Fomichev <sdf@fomichev.me> writes:
> 
> > Add new @ksft_disruptive decorator to mark the tests that might
> > be disruptive to the system. Depending on how well the previous
> > test works in the CI we might want to disable disruptive tests
> > by default and only let the developers run them manually.
> >
> > KSFT framework runs disruptive tests by default. DISRUPTIVE=False
> > environment (or config file) can be used to disable these tests.
> > ksft_setup should be called by the test cases that want to use
> > new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).
> 
> Is that something that tests would want to genuinely do, manage this
> stuff by hand? I don't really mind having the helper globally
> accessible, but default I'd keep it inside env.py and expect others to
> inherit appropriately.

Hard to say how well it's gonna work tbh. But at least from
what I've seen, large code bases (outside of kernel) usually
have some way to attach metadata to the testcase to indicate
various things. For example, this is how the timeout
can be controlled:

https://bazel.build/reference/test-encyclopedia#role-test-runner

So I'd imagine we can eventually have @kstf_short/@ksft_long to
control that using similar techniques.

Regarding keeping it inside env.py: can you expand more on what
you mean by having the default in env.py?

> > @@ -127,6 +129,36 @@ KSFT_RESULT_ALL = True
> >              KSFT_RESULT = False
> >  
> >  
> > +def ksft_disruptive(func):
> > +    """
> > +    Decorator that marks the test as disruptive (e.g. the test
> > +    that can down the interface). Disruptive tests can be skipped
> > +    by passing DISRUPTIVE=False environment variable.
> > +    """
> > +
> > +    @functools.wraps(func)
> > +    def wrapper(*args, **kwargs):
> > +        if not KSFT_DISRUPTIVE:
> > +            raise KsftSkipEx(f"marked as disruptive")
> 
> Since this is a skip, it will fail the overall run. But that happened
> because the user themselves set DISRUPTIVE=0 to avoid, um, disruption to
> the system. I think it should either be xfail, or something else
> dedicated that conveys the idea that we didn't run the test, but that's
> fine.
> 
> Using xfail for this somehow doesn't seem correct, nothing failed. Maybe
> we need KsftOmitEx, which would basically be an xfail with a more
> appropriate name?

Are you sure skip will fail the overall run? At least looking at
tools/testing/selftests/net/lib/py/ksft.py, both skip and xfail are
considered KSFT_RESULT=True. Or am I looking at the wrong place?

> > +def ksft_setup(env):
> > +    """
> > +    Setup test framework global state from the environment.
> > +    """
> > +
> > +    def get_bool(env, name):
> > +        return env.get(name, "").lower() in ["true", "1"]
> 
> "yes" should alse be considered, for compatibility with the bash
> selftests.
> 
> It's also odd that 0 is false, 1 is true, but 2 is false again. How
> about something like this?
> 
>     def get_bool(env, name):
>         value = env.get(name, "").lower()
>         if value in ["yes", "true"]:
>             return True
>         if value in ["no", "false"]:
>             return False
> 
>         try:
>             return bool(int(value))
>         except:
>             raise something something invalid value
> 
> So that people at least know if they set it to nonsense that it's
> nonsense?
> 
> Dunno. The bash selftests just take "yes" and don't care about being
> very user friendly in that regard at all. _load_env_file() likewise
> looks like it just takes strings and doesn't care about the semantics.
> So I don't feel too strongly about this at all. Besides the "yes" bit,
> that should be recognized.

Sure, will do!

(will also apply your suggestions for 1/2 so want reply separately)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down
  2024-07-31 11:34 ` [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down Petr Machata
@ 2024-08-01  0:32   ` Jakub Kicinski
  2024-08-01  1:23     ` Stanislav Fomichev
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2024-08-01  0:32 UTC (permalink / raw)
  To: Petr Machata
  Cc: Stanislav Fomichev, netdev, davem, edumazet, pabeni, Shuah Khan,
	Joe Damato, linux-kselftest

On Wed, 31 Jul 2024 13:34:58 +0200 Petr Machata wrote:
> > +        qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
> > +    except NlError as e:
> > +        if e.error == 95:  
> 
> Could you do this as if e.error == errno.ENOTSUP?

just to be clear EOPNOTSUPP ..

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down
  2024-08-01  0:32   ` Jakub Kicinski
@ 2024-08-01  1:23     ` Stanislav Fomichev
  2024-08-01  8:50       ` Petr Machata
  0 siblings, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2024-08-01  1:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, netdev, davem, edumazet, pabeni, Shuah Khan,
	Joe Damato, linux-kselftest

On 07/31, Jakub Kicinski wrote:
> On Wed, 31 Jul 2024 13:34:58 +0200 Petr Machata wrote:
> > > +        qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
> > > +    except NlError as e:
> > > +        if e.error == 95:  
> > 
> > Could you do this as if e.error == errno.ENOTSUP?
> 
> just to be clear EOPNOTSUPP ..

That might be the reason it's coded explicitly as 95? :-D

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive
  2024-07-31 20:47     ` Stanislav Fomichev
@ 2024-08-01  8:36       ` Petr Machata
  2024-08-01 14:07         ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Machata @ 2024-08-01  8:36 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Petr Machata, netdev, davem, edumazet, kuba, pabeni, Shuah Khan,
	Joe Damato, linux-kselftest


Stanislav Fomichev <sdf@fomichev.me> writes:

> On 07/31, Petr Machata wrote:
>> 
>> Stanislav Fomichev <sdf@fomichev.me> writes:
>> 
>> > Add new @ksft_disruptive decorator to mark the tests that might
>> > be disruptive to the system. Depending on how well the previous
>> > test works in the CI we might want to disable disruptive tests
>> > by default and only let the developers run them manually.
>> >
>> > KSFT framework runs disruptive tests by default. DISRUPTIVE=False
>> > environment (or config file) can be used to disable these tests.
>> > ksft_setup should be called by the test cases that want to use
>> > new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).
>> 
>> Is that something that tests would want to genuinely do, manage this
>> stuff by hand? I don't really mind having the helper globally
>> accessible, but default I'd keep it inside env.py and expect others to
>> inherit appropriately.
>
> Hard to say how well it's gonna work tbh. But at least from
> what I've seen, large code bases (outside of kernel) usually
> have some way to attach metadata to the testcase to indicate
> various things. For example, this is how the timeout
> can be controlled:
>
> https://bazel.build/reference/test-encyclopedia#role-test-runner
>
> So I'd imagine we can eventually have @kstf_short/@ksft_long to
> control that using similar techniques.
>
> Regarding keeping it inside env.py: can you expand more on what
> you mean by having the default in env.py?

I'm looking into it now and I missed how this is layered. ksft.py is the
comparatively general piece of code, and env.py is something
specifically for driver testing. It makes sense for ksft_setup() to be
where it is, because not-driver tests might want to be marked disruptive
as well. It also makes sense that env.py invokes the general helper.

All is good.

>> > @@ -127,6 +129,36 @@ KSFT_RESULT_ALL = True
>> >              KSFT_RESULT = False
>> >  
>> >  
>> > +def ksft_disruptive(func):
>> > +    """
>> > +    Decorator that marks the test as disruptive (e.g. the test
>> > +    that can down the interface). Disruptive tests can be skipped
>> > +    by passing DISRUPTIVE=False environment variable.
>> > +    """
>> > +
>> > +    @functools.wraps(func)
>> > +    def wrapper(*args, **kwargs):
>> > +        if not KSFT_DISRUPTIVE:
>> > +            raise KsftSkipEx(f"marked as disruptive")
>> 
>> Since this is a skip, it will fail the overall run. But that happened
>> because the user themselves set DISRUPTIVE=0 to avoid, um, disruption to
>> the system. I think it should either be xfail, or something else
>> dedicated that conveys the idea that we didn't run the test, but that's
>> fine.
>> 
>> Using xfail for this somehow doesn't seem correct, nothing failed. Maybe
>> we need KsftOmitEx, which would basically be an xfail with a more
>> appropriate name?
>
> Are you sure skip will fail the overall run? At least looking at
> tools/testing/selftests/net/lib/py/ksft.py, both skip and xfail are
> considered KSFT_RESULT=True. Or am I looking at the wrong place?

You seem to be right about the exit code. This was discussed some time
ago, that SKIP is considered a sort of a failure. As the person running
the test you would want to go in and fix whatever configuration issue is
preventing the test from running. I'm not sure how it works in practice,
whether people look for skips in the test log explicitly or rely on exit
codes.

Maybe Jakub can chime in, since he's the one that cajoled me into
handling this whole SKIP / XFAIL business properly in bash selftests.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down
  2024-08-01  1:23     ` Stanislav Fomichev
@ 2024-08-01  8:50       ` Petr Machata
  2024-08-01 15:34         ` Stanislav Fomichev
  0 siblings, 1 reply; 13+ messages in thread
From: Petr Machata @ 2024-08-01  8:50 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Jakub Kicinski, Petr Machata, netdev, davem, edumazet, pabeni,
	Shuah Khan, Joe Damato, linux-kselftest


Stanislav Fomichev <sdf@fomichev.me> writes:

> On 07/31, Jakub Kicinski wrote:
>> On Wed, 31 Jul 2024 13:34:58 +0200 Petr Machata wrote:
>> > > +        qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
>> > > +    except NlError as e:
>> > > +        if e.error == 95:  
>> > 
>> > Could you do this as if e.error == errno.ENOTSUP?
>> 
>> just to be clear EOPNOTSUPP ..
>
> That might be the reason it's coded explicitly as 95? :-D

Both exist, I just didn't notice the latter.

>>> import errno
>>> errno.ENOTSUP
95
>>> errno.EOPNOTSUPP
95

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive
  2024-08-01  8:36       ` Petr Machata
@ 2024-08-01 14:07         ` Jakub Kicinski
  2024-08-01 21:31           ` Petr Machata
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2024-08-01 14:07 UTC (permalink / raw)
  To: Petr Machata
  Cc: Stanislav Fomichev, netdev, davem, edumazet, pabeni, Shuah Khan,
	Joe Damato, linux-kselftest

On Thu, 1 Aug 2024 10:36:18 +0200 Petr Machata wrote:
> You seem to be right about the exit code. This was discussed some time
> ago, that SKIP is considered a sort of a failure. As the person running
> the test you would want to go in and fix whatever configuration issue is
> preventing the test from running. I'm not sure how it works in practice,
> whether people look for skips in the test log explicitly or rely on exit
> codes.
> 
> Maybe Jakub can chime in, since he's the one that cajoled me into
> handling this whole SKIP / XFAIL business properly in bash selftests.

For HW testing there is a lot more variables than just "is there some
tool missing in the VM image". Not sure how well we can do in detecting
HW capabilities and XFAILing without making the tests super long.
And this case itself is not very clear cut. On one hand, you expect 
the test not to run if it's disruptive and executor can't deal with
disruptive - IOW it's an eXpected FAIL. But it is an executor
limitation, the device/driver could have been tested if it wasn't
for the executor, so not entirely dissimilar to a tool missing.

Either way - no strong opinion as of yet, we need someone to actually
continuously run these to get experience :(

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down
  2024-08-01  8:50       ` Petr Machata
@ 2024-08-01 15:34         ` Stanislav Fomichev
  2024-08-01 21:38           ` Petr Machata
  0 siblings, 1 reply; 13+ messages in thread
From: Stanislav Fomichev @ 2024-08-01 15:34 UTC (permalink / raw)
  To: Petr Machata
  Cc: Jakub Kicinski, netdev, davem, edumazet, pabeni, Shuah Khan,
	Joe Damato, linux-kselftest

On 08/01, Petr Machata wrote:
> 
> Stanislav Fomichev <sdf@fomichev.me> writes:
> 
> > On 07/31, Jakub Kicinski wrote:
> >> On Wed, 31 Jul 2024 13:34:58 +0200 Petr Machata wrote:
> >> > > +        qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
> >> > > +    except NlError as e:
> >> > > +        if e.error == 95:  
> >> > 
> >> > Could you do this as if e.error == errno.ENOTSUP?
> >> 
> >> just to be clear EOPNOTSUPP ..
> >
> > That might be the reason it's coded explicitly as 95? :-D
> 
> Both exist, I just didn't notice the latter.
> 
> >>> import errno
> >>> errno.ENOTSUP
> 95
> >>> errno.EOPNOTSUPP
> 95

I believe Jakub was talking about kernel's ENOTSUPP (524) vs EOPNOTSUPP (95):

$ grep ENOTSUPP include/linux/errno.h
#define ENOTSUPP        524     /* Operation is not supported */

$ grep EOPNOTSUPP include/uapi/asm-generic/errno.h
#define EOPNOTSUPP      95      /* Operation not supported on transport endpoint */

These two are frequently confused.

OTOH, ENOTSUP looks like a userspace/libc invention:

$ grep -w ENOTSUP /usr/include/bits/errno.h
# ifndef ENOTSUP
#  define ENOTSUP               EOPNOTSUPP

I'm gonna stick to kernel's EOPNOTSUPP to make it look similar to what
we have on the kernel side.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive
  2024-08-01 14:07         ` Jakub Kicinski
@ 2024-08-01 21:31           ` Petr Machata
  0 siblings, 0 replies; 13+ messages in thread
From: Petr Machata @ 2024-08-01 21:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, Stanislav Fomichev, netdev, davem, edumazet, pabeni,
	Shuah Khan, Joe Damato, linux-kselftest


Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 1 Aug 2024 10:36:18 +0200 Petr Machata wrote:
>> You seem to be right about the exit code. This was discussed some time
>> ago, that SKIP is considered a sort of a failure. As the person running
>> the test you would want to go in and fix whatever configuration issue is
>> preventing the test from running. I'm not sure how it works in practice,
>> whether people look for skips in the test log explicitly or rely on exit
>> codes.
>> 
>> Maybe Jakub can chime in, since he's the one that cajoled me into
>> handling this whole SKIP / XFAIL business properly in bash selftests.
>
> For HW testing there is a lot more variables than just "is there some
> tool missing in the VM image". Not sure how well we can do in detecting
> HW capabilities and XFAILing without making the tests super long.
> And this case itself is not very clear cut. On one hand, you expect 
> the test not to run if it's disruptive and executor can't deal with
> disruptive - IOW it's an eXpected FAIL. But it is an executor
> limitation, the device/driver could have been tested if it wasn't
> for the executor, so not entirely dissimilar to a tool missing.
>
> Either way - no strong opinion as of yet, we need someone to actually
> continuously run these to get experience :(

After sending my response I realized we talked about this once already.
Apparently I forgot.

I think it's odd that SKIP is a fail in one framework but a pass in
another. But XFAIL is not a good name for something that was not even
run. And if we add something like "omit", nobody will know what it
means.

Ho hum.

Let's keep SKIP as passing in Python tests then...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down
  2024-08-01 15:34         ` Stanislav Fomichev
@ 2024-08-01 21:38           ` Petr Machata
  0 siblings, 0 replies; 13+ messages in thread
From: Petr Machata @ 2024-08-01 21:38 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Petr Machata, Jakub Kicinski, netdev, davem, edumazet, pabeni,
	Shuah Khan, Joe Damato, linux-kselftest


Stanislav Fomichev <sdf@fomichev.me> writes:

> On 08/01, Petr Machata wrote:
>> 
>> Stanislav Fomichev <sdf@fomichev.me> writes:
>> 
>> > On 07/31, Jakub Kicinski wrote:
>> >> On Wed, 31 Jul 2024 13:34:58 +0200 Petr Machata wrote:
>> >> > > +        qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
>> >> > > +    except NlError as e:
>> >> > > +        if e.error == 95:  
>> >> > 
>> >> > Could you do this as if e.error == errno.ENOTSUP?
>> >> 
>> >> just to be clear EOPNOTSUPP ..
>> >
>> > That might be the reason it's coded explicitly as 95? :-D
>> 
>> Both exist, I just didn't notice the latter.
>> 
>> >>> import errno
>> >>> errno.ENOTSUP
>> 95
>> >>> errno.EOPNOTSUPP
>> 95
>
> I believe Jakub was talking about kernel's ENOTSUPP (524) vs EOPNOTSUPP (95):
>
> $ grep ENOTSUPP include/linux/errno.h
> #define ENOTSUPP        524     /* Operation is not supported */
>
> $ grep EOPNOTSUPP include/uapi/asm-generic/errno.h
> #define EOPNOTSUPP      95      /* Operation not supported on transport endpoint */
>
> These two are frequently confused.
>
> OTOH, ENOTSUP looks like a userspace/libc invention:
>
> $ grep -w ENOTSUP /usr/include/bits/errno.h
> # ifndef ENOTSUP
> #  define ENOTSUP               EOPNOTSUPP
>
> I'm gonna stick to kernel's EOPNOTSUPP to make it look similar to what
> we have on the kernel side.

Yep, sounds good.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-08-01 21:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-30 22:39 [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down Stanislav Fomichev
2024-07-30 22:39 ` [PATCH net-next v2 2/2] selftests: net: ksft: support marking tests as disruptive Stanislav Fomichev
2024-07-31 11:55   ` Petr Machata
2024-07-31 20:47     ` Stanislav Fomichev
2024-08-01  8:36       ` Petr Machata
2024-08-01 14:07         ` Jakub Kicinski
2024-08-01 21:31           ` Petr Machata
2024-07-31 11:34 ` [PATCH net-next v2 1/2] selftests: net-drv: exercise queue stats when the device is down Petr Machata
2024-08-01  0:32   ` Jakub Kicinski
2024-08-01  1:23     ` Stanislav Fomichev
2024-08-01  8:50       ` Petr Machata
2024-08-01 15:34         ` Stanislav Fomichev
2024-08-01 21:38           ` Petr Machata

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).