[RFC net-next 0/2] selftests: drv-net: add ability to schedule cleanup with defer()

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC net-next 0/2] selftests: drv-net: add ability to schedule cleanup with defer()
@ 2024-06-26  1:36 Jakub Kicinski
  2024-06-26  1:36 ` [RFC net-next 1/2] " Jakub Kicinski
  2024-06-26  1:36 ` [RFC net-next 2/2] selftests: drv-net: rss_ctx: convert to defer() Jakub Kicinski
  0 siblings, 2 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-26  1:36 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, willemdebruijn.kernel,
	przemyslaw.kitszel, leitao, petrm, Jakub Kicinski

This is on top of the rss_ctx patches.

I don't want to delay those from getting merged, and this may be a longer
discussion...

Jakub Kicinski (2):
  selftests: drv-net: add ability to schedule cleanup with defer()
  selftests: drv-net: rss_ctx: convert to defer()

 .../selftests/drivers/net/hw/rss_ctx.py       | 225 ++++++++----------
 tools/testing/selftests/net/lib/py/ksft.py    |  49 ++--
 tools/testing/selftests/net/lib/py/utils.py   |  41 ++++
 3 files changed, 173 insertions(+), 142 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26  1:36 [RFC net-next 0/2] selftests: drv-net: add ability to schedule cleanup with defer() Jakub Kicinski
@ 2024-06-26  1:36 ` Jakub Kicinski
  2024-06-26  7:43   ` Przemek Kitszel
  2024-06-26 10:18   ` Petr Machata
  2024-06-26  1:36 ` [RFC net-next 2/2] selftests: drv-net: rss_ctx: convert to defer() Jakub Kicinski
  1 sibling, 2 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-26  1:36 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, willemdebruijn.kernel,
	przemyslaw.kitszel, leitao, petrm, Jakub Kicinski

This implements what I was describing in [1]. When writing a test
author can schedule cleanup / undo actions right after the creation
completes, eg:

  cmd("touch /tmp/file")
  defer(cmd, "rm /tmp/file")

defer() takes the function name as first argument, and the rest are
arguments for that function. defer()red functions are called in
inverse order after test exits. It's also possible to capture them
and execute earlier (in which case they get automatically de-queued).

  undo = defer(cmd, "rm /tmp/file")
  # ... some unsafe code ...
  undo.exec()

As a nice safety all exceptions from defer()ed calls are captured,
printed, and ignored (they do make the test fail, however).
This addresses the common problem of exceptions in cleanup paths
often being unhandled, leading to potential leaks.

There is a global action queue, flushed by ksft_run(). We could support
function level defers too, I guess, but there's no immediate need..

Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 tools/testing/selftests/net/lib/py/ksft.py  | 49 +++++++++++++++------
 tools/testing/selftests/net/lib/py/utils.py | 41 +++++++++++++++++
 2 files changed, 76 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
index 45ffe277d94a..4a72b9cbb27d 100644
--- a/tools/testing/selftests/net/lib/py/ksft.py
+++ b/tools/testing/selftests/net/lib/py/ksft.py
@@ -6,6 +6,7 @@ import sys
 import time
 import traceback
 from .consts import KSFT_MAIN_NAME
+from .utils import global_defer_queue
 
 KSFT_RESULT = None
 KSFT_RESULT_ALL = True
@@ -108,6 +109,24 @@ KSFT_RESULT_ALL = True
     print(res)
 
 
+def ksft_flush_defer():
+    global KSFT_RESULT
+
+    while global_defer_queue:
+        entry = global_defer_queue[-1]
+        try:
+            entry.exec()
+        except Exception:
+            if global_defer_queue and global_defer_queue[-1] == entry:
+                global_defer_queue.pop()
+
+            ksft_pr("Exception while handling defer / cleanup!")
+            tb = traceback.format_exc()
+            for line in tb.strip().split('\n'):
+                ksft_pr("Defer Exception|", line)
+            KSFT_RESULT = False
+
+
 def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
     cases = cases or []
 
@@ -130,29 +149,31 @@ KSFT_RESULT_ALL = True
     for case in cases:
         KSFT_RESULT = True
         cnt += 1
+        comment = ""
+        cnt_key = ""
+
         try:
             case(*args)
         except KsftSkipEx as e:
-            ktap_result(True, cnt, case, comment="SKIP " + str(e))
-            totals['skip'] += 1
-            continue
+            comment = "SKIP " + str(e)
+            cnt_key = 'skip'
         except KsftXfailEx as e:
-            ktap_result(True, cnt, case, comment="XFAIL " + str(e))
-            totals['xfail'] += 1
-            continue
+            comment = "XFAIL " + str(e)
+            cnt_key = 'xfail'
         except Exception as e:
             tb = traceback.format_exc()
             for line in tb.strip().split('\n'):
                 ksft_pr("Exception|", line)
-            ktap_result(False, cnt, case)
-            totals['fail'] += 1
-            continue
+            KSFT_RESULT = False
+            cnt_key = 'fail'
 
-        ktap_result(KSFT_RESULT, cnt, case)
-        if KSFT_RESULT:
-            totals['pass'] += 1
-        else:
-            totals['fail'] += 1
+        ksft_flush_defer()
+
+        if not cnt_key:
+            cnt_key = 'pass' if KSFT_RESULT else 'fail'
+
+        ktap_result(KSFT_RESULT, cnt, case, comment=comment)
+        totals[cnt_key] += 1
 
     print(
         f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"
diff --git a/tools/testing/selftests/net/lib/py/utils.py b/tools/testing/selftests/net/lib/py/utils.py
index 405aa510aaf2..1ef6ebaa369e 100644
--- a/tools/testing/selftests/net/lib/py/utils.py
+++ b/tools/testing/selftests/net/lib/py/utils.py
@@ -66,6 +66,47 @@ import time
         return self.process(terminate=self.terminate, fail=self.check_fail)
 
 
+global_defer_queue = []
+
+
+class defer:
+    def __init__(self, func, *args, **kwargs):
+        global global_defer_queue
+        if global_defer_queue is None:
+            raise Exception("defer environment has not been set up")
+
+        if not callable(func):
+            raise Exception("defer created with un-callable object, did you call the function instead of passing its name?")
+
+        self.func = func
+        self.args = args
+        self.kwargs = kwargs
+
+        self.queued = True
+        self.executed = False
+
+        self._queue =  global_defer_queue
+        self._queue.append(self)
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, ex_type, ex_value, ex_tb):
+        return self.exec()
+
+    def _exec(self):
+        self.func(*self.args, **self.kwargs)
+
+    def cancel(self):
+        self._queue.remove(self)
+        self.queued = False
+
+    def exec(self):
+        self._exec()
+        self.cancel()
+        self.executed = True
+
+
 def tool(name, args, json=None, ns=None, host=None):
     cmd_str = name + ' '
     if json:
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC net-next 2/2] selftests: drv-net: rss_ctx: convert to defer()
  2024-06-26  1:36 [RFC net-next 0/2] selftests: drv-net: add ability to schedule cleanup with defer() Jakub Kicinski
  2024-06-26  1:36 ` [RFC net-next 1/2] " Jakub Kicinski
@ 2024-06-26  1:36 ` Jakub Kicinski
  1 sibling, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-26  1:36 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, willemdebruijn.kernel,
	przemyslaw.kitszel, leitao, petrm, Jakub Kicinski

Use just added defer().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 .../selftests/drivers/net/hw/rss_ctx.py       | 225 ++++++++----------
 1 file changed, 97 insertions(+), 128 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
index 68c7d40214eb..6f55a3b49698 100755
--- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py
+++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
@@ -8,7 +8,7 @@ from lib.py import NetDrvEpEnv
 from lib.py import NetdevFamily
 from lib.py import KsftSkipEx
 from lib.py import rand_port
-from lib.py import ethtool, ip, GenerateTraffic, CmdExitFailure
+from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure
 
 
 def _rss_key_str(key):
@@ -127,64 +127,56 @@ from lib.py import ethtool, ip, GenerateTraffic, CmdExitFailure
 
     # Try to allocate more queues when necessary
     qcnt = len(_get_rx_cnts(cfg))
-    if qcnt >= 2 + 2 * ctx_cnt:
-        qcnt = None
-    else:
+    if qcnt < 2 + 2 * ctx_cnt:
         try:
             ksft_pr(f"Increasing queue count {qcnt} -> {2 + 2 * ctx_cnt}")
             ethtool(f"-L {cfg.ifname} combined {2 + 2 * ctx_cnt}")
+            defer(ethtool, f"-L {cfg.ifname} combined {qcnt}")
         except:
             raise KsftSkipEx("Not enough queues for the test")
 
-    ntuple = []
-    ctx_id = []
     ports = []
-    try:
-        # Use queues 0 and 1 for normal traffic
-        ethtool(f"-X {cfg.ifname} equal 2")
 
-        for i in range(ctx_cnt):
-            want_cfg = f"start {2 + i * 2} equal 2"
-            create_cfg = want_cfg if create_with_cfg else ""
+    # Use queues 0 and 1 for normal traffic
+    ethtool(f"-X {cfg.ifname} equal 2")
+    defer(ethtool, f"-X {cfg.ifname} default")
 
-            try:
-                ctx_id.append(ethtool_create(cfg, "-X", f"context new {create_cfg}"))
-            except CmdExitFailure:
-                # try to carry on and skip at the end
-                if i == 0:
-                    raise
-                ksft_pr(f"Failed to create context {i + 1}, trying to test what we got")
-                ctx_cnt = i
-                break
+    for i in range(ctx_cnt):
+        want_cfg = f"start {2 + i * 2} equal 2"
+        create_cfg = want_cfg if create_with_cfg else ""
 
-            if not create_with_cfg:
-                ethtool(f"-X {cfg.ifname} context {ctx_id[i]} {want_cfg}")
+        try:
+            ctx_id = ethtool_create(cfg, "-X", f"context new {create_cfg}")
+            defer(ethtool, f"-X {cfg.ifname} context {ctx_id} delete")
+        except CmdExitFailure:
+            # try to carry on and skip at the end
+            if i == 0:
+                raise
+            ksft_pr(f"Failed to create context {i + 1}, trying to test what we got")
+            ctx_cnt = i
+            break
 
-            # Sanity check the context we just created
-            data = get_rss(cfg, ctx_id[i])
-            ksft_eq(min(data['rss-indirection-table']), 2 + i * 2, "Unexpected context cfg: " + str(data))
-            ksft_eq(max(data['rss-indirection-table']), 2 + i * 2 + 1, "Unexpected context cfg: " + str(data))
+        if not create_with_cfg:
+            ethtool(f"-X {cfg.ifname} context {ctx_id} {want_cfg}")
 
-            ports.append(rand_port())
-            flow = f"flow-type tcp{cfg.addr_ipver} dst-port {ports[i]} context {ctx_id[i]}"
-            ntuple.append(ethtool_create(cfg, "-N", flow))
+        # Sanity check the context we just created
+        data = get_rss(cfg, ctx_id)
+        ksft_eq(min(data['rss-indirection-table']), 2 + i * 2, "Unexpected context cfg: " + str(data))
+        ksft_eq(max(data['rss-indirection-table']), 2 + i * 2 + 1, "Unexpected context cfg: " + str(data))
 
-        for i in range(ctx_cnt):
-            cnts = _get_rx_cnts(cfg)
-            GenerateTraffic(cfg, port=ports[i]).wait_pkts_and_stop(20000)
-            cnts = _get_rx_cnts(cfg, prev=cnts)
+        ports.append(rand_port())
+        flow = f"flow-type tcp{cfg.addr_ipver} dst-port {ports[i]} context {ctx_id}"
+        ntuple = ethtool_create(cfg, "-N", flow)
+        defer(ethtool, f"-N {cfg.ifname} delete {ntuple}")
 
-            ksft_lt(sum(cnts[ :2]), 10000, "traffic on main context:" + str(cnts))
-            ksft_ge(sum(cnts[2+i*2:4+i*2]), 20000, f"traffic on context {i}: " + str(cnts))
-            ksft_eq(sum(cnts[2:2+i*2] + cnts[4+i*2:]), 0, "traffic on other contexts: " + str(cnts))
-    finally:
-        for nid in ntuple:
-            ethtool(f"-N {cfg.ifname} delete {nid}")
-        for cid in ctx_id:
-            ethtool(f"-X {cfg.ifname} context {cid} delete")
-        ethtool(f"-X {cfg.ifname} default")
-        if qcnt:
-            ethtool(f"-L {cfg.ifname} combined {qcnt}")
+    for i in range(ctx_cnt):
+        cnts = _get_rx_cnts(cfg)
+        GenerateTraffic(cfg, port=ports[i]).wait_pkts_and_stop(20000)
+        cnts = _get_rx_cnts(cfg, prev=cnts)
+
+        ksft_lt(sum(cnts[ :2]), 10000, "traffic on main context:" + str(cnts))
+        ksft_ge(sum(cnts[2+i*2:4+i*2]), 20000, f"traffic on context {i}: " + str(cnts))
+        ksft_eq(sum(cnts[2:2+i*2] + cnts[4+i*2:]), 0, "traffic on other contexts: " + str(cnts))
 
     if requested_ctx_cnt != ctx_cnt:
         raise KsftSkipEx(f"Tested only {ctx_cnt} contexts, wanted {requested_ctx_cnt}")
@@ -216,24 +208,21 @@ from lib.py import ethtool, ip, GenerateTraffic, CmdExitFailure
 
     # Try to allocate more queues when necessary
     qcnt = len(_get_rx_cnts(cfg))
-    if qcnt >= 2 + 2 * ctx_cnt:
-        qcnt = None
-    else:
+    if qcnt < 2 + 2 * ctx_cnt:
         try:
             ksft_pr(f"Increasing queue count {qcnt} -> {2 + 2 * ctx_cnt}")
             ethtool(f"-L {cfg.ifname} combined {2 + 2 * ctx_cnt}")
+            defer(ethtool, f"-L {cfg.ifname} combined {qcnt}")
         except:
             raise KsftSkipEx("Not enough queues for the test")
 
     ntuple = []
-    ctx_id = []
+    ctx = []
     ports = []
 
     def remove_ctx(idx):
-        ethtool(f"-N {cfg.ifname} delete {ntuple[idx]}")
-        ntuple[idx] = None
-        ethtool(f"-X {cfg.ifname} context {ctx_id[idx]} delete")
-        ctx_id[idx] = None
+        ntuple[idx].exec()
+        ctx[idx].exec()
 
     def check_traffic():
         for i in range(ctx_cnt):
@@ -241,7 +230,7 @@ from lib.py import ethtool, ip, GenerateTraffic, CmdExitFailure
             GenerateTraffic(cfg, port=ports[i]).wait_pkts_and_stop(20000)
             cnts = _get_rx_cnts(cfg, prev=cnts)
 
-            if ctx_id[i] is None:
+            if ctx[i].queued:
                 ksft_lt(sum(cnts[ :2]), 10000, "traffic on main context:" + str(cnts))
                 ksft_ge(sum(cnts[2+i*2:4+i*2]), 20000, f"traffic on context {i}: " + str(cnts))
                 ksft_eq(sum(cnts[2:2+i*2] + cnts[4+i*2:]), 0, "traffic on other contexts: " + str(cnts))
@@ -249,41 +238,32 @@ from lib.py import ethtool, ip, GenerateTraffic, CmdExitFailure
                 ksft_ge(sum(cnts[ :2]), 20000, "traffic on main context:" + str(cnts))
                 ksft_eq(sum(cnts[2: ]),     0, "traffic on other contexts: " + str(cnts))
 
-    try:
-        # Use queues 0 and 1 for normal traffic
-        ethtool(f"-X {cfg.ifname} equal 2")
+    # Use queues 0 and 1 for normal traffic
+    ethtool(f"-X {cfg.ifname} equal 2")
+    defer(ethtool, f"-X {cfg.ifname} default")
 
-        for i in range(ctx_cnt):
-            ctx_id.append(ethtool_create(cfg, "-X", f"context new start {2 + i * 2} equal 2"))
+    for i in range(ctx_cnt):
+        ctx_id = ethtool_create(cfg, "-X", f"context new start {2 + i * 2} equal 2")
+        ctx.append(defer(ethtool, f"-X {cfg.ifname} context {ctx_id} delete"))
 
-            ports.append(rand_port())
-            flow = f"flow-type tcp{cfg.addr_ipver} dst-port {ports[i]} context {ctx_id[i]}"
-            ntuple.append(ethtool_create(cfg, "-N", flow))
+        ports.append(rand_port())
+        flow = f"flow-type tcp{cfg.addr_ipver} dst-port {ports[i]} context {ctx_id}"
+        ntuple_id = ethtool_create(cfg, "-N", flow)
+        ntuple.append(defer(ethtool, f"-N {cfg.ifname} delete {ntuple_id}"))
 
-        check_traffic()
+    check_traffic()
 
-        # Remove middle context
-        remove_ctx(ctx_cnt // 2)
-        check_traffic()
+    # Remove middle context
+    remove_ctx(ctx_cnt // 2)
+    check_traffic()
 
-        # Remove first context
-        remove_ctx(0)
-        check_traffic()
+    # Remove first context
+    remove_ctx(0)
+    check_traffic()
 
-        # Remove last context
-        remove_ctx(-1)
-        check_traffic()
-
-    finally:
-        for nid in ntuple:
-            if nid is not None:
-                ethtool(f"-N {cfg.ifname} delete {nid}")
-        for cid in ctx_id:
-            if cid is not None:
-                ethtool(f"-X {cfg.ifname} context {cid} delete")
-        ethtool(f"-X {cfg.ifname} default")
-        if qcnt:
-            ethtool(f"-L {cfg.ifname} combined {qcnt}")
+    # Remove last context
+    remove_ctx(-1)
+    check_traffic()
 
     if requested_ctx_cnt != ctx_cnt:
         raise KsftSkipEx(f"Tested only {ctx_cnt} contexts, wanted {requested_ctx_cnt}")
@@ -298,69 +278,58 @@ from lib.py import ethtool, ip, GenerateTraffic, CmdExitFailure
     require_ntuple(cfg)
 
     queue_cnt = len(_get_rx_cnts(cfg))
-    if queue_cnt >= 4:
-        queue_cnt = None
-    else:
+    if queue_cnt < 4:
         try:
             ksft_pr(f"Increasing queue count {queue_cnt} -> 4")
             ethtool(f"-L {cfg.ifname} combined 4")
+            defer(ethtool, f"-L {cfg.ifname} combined {queue_cnt}")
         except:
             raise KsftSkipEx("Not enough queues for the test")
 
-    ctx_id = None
-    ntuple = None
     if other_ctx == 0:
         ethtool(f"-X {cfg.ifname} equal 4")
+        defer(ethtool, f"-X {cfg.ifname} default")
     else:
         other_ctx = ethtool_create(cfg, "-X", "context new")
         ethtool(f"-X {cfg.ifname} context {other_ctx} equal 4")
+        defer(ethtool, f"-X {cfg.ifname} context {other_ctx} delete")
 
-    try:
-        ctx_id = ethtool_create(cfg, "-X", "context new")
-        ethtool(f"-X {cfg.ifname} context {ctx_id} start 2 equal 2")
+    ctx_id = ethtool_create(cfg, "-X", "context new")
+    ethtool(f"-X {cfg.ifname} context {ctx_id} start 2 equal 2")
+    defer(ethtool, f"-X {cfg.ifname} context {ctx_id} delete")
 
-        port = rand_port()
-        if other_ctx:
-            flow = f"flow-type tcp{cfg.addr_ipver} dst-port {port} context {other_ctx}"
-            ntuple = ethtool_create(cfg, "-N", flow)
+    port = rand_port()
+    if other_ctx:
+        flow = f"flow-type tcp{cfg.addr_ipver} dst-port {port} context {other_ctx}"
+        ntuple_id = ethtool_create(cfg, "-N", flow)
+        ntuple = defer(ethtool, f"-N {cfg.ifname} delete {ntuple_id}")
 
-        # Test the main context
-        cnts = _get_rx_cnts(cfg)
-        GenerateTraffic(cfg, port=port).wait_pkts_and_stop(20000)
-        cnts = _get_rx_cnts(cfg, prev=cnts)
+    # Test the main context
+    cnts = _get_rx_cnts(cfg)
+    GenerateTraffic(cfg, port=port).wait_pkts_and_stop(20000)
+    cnts = _get_rx_cnts(cfg, prev=cnts)
 
-        ksft_ge(sum(cnts[ :4]), 20000, "traffic on main context: " + str(cnts))
-        ksft_ge(sum(cnts[ :2]),  7000, "traffic on main context (1/2): " + str(cnts))
-        ksft_ge(sum(cnts[2:4]),  7000, "traffic on main context (2/2): " + str(cnts))
-        if other_ctx == 0:
-            ksft_eq(sum(cnts[4: ]),     0, "traffic on other queues: " + str(cnts))
+    ksft_ge(sum(cnts[ :4]), 20000, "traffic on main context: " + str(cnts))
+    ksft_ge(sum(cnts[ :2]),  7000, "traffic on main context (1/2): " + str(cnts))
+    ksft_ge(sum(cnts[2:4]),  7000, "traffic on main context (2/2): " + str(cnts))
+    if other_ctx == 0:
+        ksft_eq(sum(cnts[4: ]),     0, "traffic on other queues: " + str(cnts))
 
-        # Now create a rule for context 1 and make sure traffic goes to a subset
-        if other_ctx:
-            ethtool(f"-N {cfg.ifname} delete {ntuple}")
-            ntuple = None
-        flow = f"flow-type tcp{cfg.addr_ipver} dst-port {port} context {ctx_id}"
-        ntuple = ethtool_create(cfg, "-N", flow)
+    # Now create a rule for context 1 and make sure traffic goes to a subset
+    if other_ctx:
+        ntuple.exec()
+    flow = f"flow-type tcp{cfg.addr_ipver} dst-port {port} context {ctx_id}"
+    ntuple_id = ethtool_create(cfg, "-N", flow)
+    defer(ethtool, f"-N {cfg.ifname} delete {ntuple_id}")
 
-        cnts = _get_rx_cnts(cfg)
-        GenerateTraffic(cfg, port=port).wait_pkts_and_stop(20000)
-        cnts = _get_rx_cnts(cfg, prev=cnts)
+    cnts = _get_rx_cnts(cfg)
+    GenerateTraffic(cfg, port=port).wait_pkts_and_stop(20000)
+    cnts = _get_rx_cnts(cfg, prev=cnts)
 
-        ksft_lt(sum(cnts[ :2]),  7000, "traffic on main context: " + str(cnts))
-        ksft_ge(sum(cnts[2:4]), 20000, "traffic on extra context: " + str(cnts))
-        if other_ctx == 0:
-            ksft_eq(sum(cnts[4: ]),     0, "traffic on other queues: " + str(cnts))
-    finally:
-        if ntuple is not None:
-            ethtool(f"-N {cfg.ifname} delete {ntuple}")
-        if ctx_id:
-            ethtool(f"-X {cfg.ifname} context {ctx_id} delete")
-        if other_ctx == 0:
-            ethtool(f"-X {cfg.ifname} default")
-        else:
-            ethtool(f"-X {cfg.ifname} context {other_ctx} delete")
-        if queue_cnt:
-            ethtool(f"-L {cfg.ifname} combined {queue_cnt}")
+    ksft_lt(sum(cnts[ :2]),  7000, "traffic on main context: " + str(cnts))
+    ksft_ge(sum(cnts[2:4]), 20000, "traffic on extra context: " + str(cnts))
+    if other_ctx == 0:
+        ksft_eq(sum(cnts[4: ]),     0, "traffic on other queues: " + str(cnts))
 
 
 def test_rss_context_overlap2(cfg):
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26  1:36 ` [RFC net-next 1/2] " Jakub Kicinski
@ 2024-06-26  7:43   ` Przemek Kitszel
  2024-06-26  9:19     ` Petr Machata
  2024-06-26 16:49     ` Jakub Kicinski
  2024-06-26 10:18   ` Petr Machata
  1 sibling, 2 replies; 14+ messages in thread
From: Przemek Kitszel @ 2024-06-26  7:43 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, edumazet, pabeni, willemdebruijn.kernel, leitao, petrm,
	davem

On 6/26/24 03:36, Jakub Kicinski wrote:
> This implements what I was describing in [1]. When writing a test
> author can schedule cleanup / undo actions right after the creation
> completes, eg:
> 
>    cmd("touch /tmp/file")
>    defer(cmd, "rm /tmp/file")
> 
> defer() takes the function name as first argument, and the rest are
> arguments for that function. defer()red functions are called in
> inverse order after test exits. 

I like the defer name more than undo, and that makes immediate
connection to golang's defer.

For reference golang's defer is called at function exit, the choice was
made that way (instead C++'s destructor at the end of the scope) to make
it more performant. It's a language construct, so could not be easily 
translated into python.

I see why you expect discussion to be potentially long here ;), I just
showed it to my python speaking friends and all the alternatives came
up: context manager, yield, pytest "built in" fixtures and so on.
I like your approach more through, it is much nicer at actual-test code
side.


> It's also possible to capture them
> and execute earlier (in which case they get automatically de-queued).

not sure if this is good, either test developer should be able to easily
guard call to defer, or could call it with a lambda capture so actual
execution will be no-op-ish.
OTOH, why not, both .exec() and .cancel() are nice to use

so, I'm fine both ways

> 
>    undo = defer(cmd, "rm /tmp/file")
>    # ... some unsafe code ...
>    undo.exec()
> 
> As a nice safety all exceptions from defer()ed calls are captured,
> printed, and ignored (they do make the test fail, however).
> This addresses the common problem of exceptions in cleanup paths
> often being unhandled, leading to potential leaks.

Nice! Please only make it so that cleanup-failure does not overwrite
happy-test-path-failure (IOW "ret = ret ? ret : cleanup_ret")

> 
> There is a global action queue, flushed by ksft_run(). We could support
> function level defers too, I guess, but there's no immediate need..

That would be a must have for general solution, would it require some
boilerplate code at function level?

> 
> Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1]
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>   tools/testing/selftests/net/lib/py/ksft.py  | 49 +++++++++++++++------
>   tools/testing/selftests/net/lib/py/utils.py | 41 +++++++++++++++++
>   2 files changed, 76 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
> index 45ffe277d94a..4a72b9cbb27d 100644
> --- a/tools/testing/selftests/net/lib/py/ksft.py
> +++ b/tools/testing/selftests/net/lib/py/ksft.py
> @@ -6,6 +6,7 @@ import sys
>   import time
>   import traceback
>   from .consts import KSFT_MAIN_NAME
> +from .utils import global_defer_queue
>   
>   KSFT_RESULT = None
>   KSFT_RESULT_ALL = True
> @@ -108,6 +109,24 @@ KSFT_RESULT_ALL = True
>       print(res)
>   
>   
> +def ksft_flush_defer():
> +    global KSFT_RESULT
> +
> +    while global_defer_queue:
> +        entry = global_defer_queue[-1]
> +        try:
> +            entry.exec()
> +        except Exception:
> +            if global_defer_queue and global_defer_queue[-1] == entry:
> +                global_defer_queue.pop()
> +
> +            ksft_pr("Exception while handling defer / cleanup!")

please print current queue size, if only for convenience of test
developer to be able tell if they are moving forward in 
fix-rerun-observe cycle

> +            tb = traceback.format_exc()
> +            for line in tb.strip().split('\n'):
> +                ksft_pr("Defer Exception|", line)
> +            KSFT_RESULT = False

I have no idea if this could be other error than just False, if so,
don't overwrite

[...]

>   
> +global_defer_queue = []
> +
> +
> +class defer:
> +    def __init__(self, func, *args, **kwargs):
> +        global global_defer_queue
> +        if global_defer_queue is None:
> +            raise Exception("defer environment has not been set up")
> +
> +        if not callable(func):
> +            raise Exception("defer created with un-callable object, did you call the function instead of passing its name?")
> +
> +        self.func = func
> +        self.args = args
> +        self.kwargs = kwargs
> +
> +        self.queued = True
> +        self.executed = False
> +
> +        self._queue =  global_defer_queue
> +        self._queue.append(self)
> +
> +    def __enter__(self):
> +        return self
> +
> +    def __exit__(self, ex_type, ex_value, ex_tb):
> +        return self.exec()

why do you need __enter__ and __exit__ if this is not a context
manager / to-be-used-via-with?

> +
> +    def _exec(self):
> +        self.func(*self.args, **self.kwargs)
> +
> +    def cancel(self):
> +        self._queue.remove(self)
> +        self.queued = False
> +
> +    def exec(self):
> +        self._exec()
> +        self.cancel()
> +        self.executed = True
> +
> +
>   def tool(name, args, json=None, ns=None, host=None):
>       cmd_str = name + ' '
>       if json:


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26  7:43   ` Przemek Kitszel
@ 2024-06-26  9:19     ` Petr Machata
  2024-06-26  9:38       ` Przemek Kitszel
  2024-06-26 16:44       ` Jakub Kicinski
  2024-06-26 16:49     ` Jakub Kicinski
  1 sibling, 2 replies; 14+ messages in thread
From: Petr Machata @ 2024-06-26  9:19 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: Jakub Kicinski, netdev, edumazet, pabeni, willemdebruijn.kernel,
	leitao, petrm, davem


Przemek Kitszel <przemyslaw.kitszel@intel.com> writes:

> On 6/26/24 03:36, Jakub Kicinski wrote:
>
>> There is a global action queue, flushed by ksft_run(). We could support
>> function level defers too, I guess, but there's no immediate need..
>
> That would be a must have for general solution, would it require some
> boilerplate code at function level?

Presumably you'd need a per-function defer pool.

Which would be naturally modeled as a context manager, but I promised
myself I'd shut up about that.

>> +    def __enter__(self):
>> +        return self
>> +
>> +    def __exit__(self, ex_type, ex_value, ex_tb):
>> +        return self.exec()
>
> why do you need __enter__ and __exit__ if this is not a context
> manager / to-be-used-via-with?

But you could use it as a context manager.

    with defer(blah blah):
        do stuff
    # blah blah runs here

IMHO makes sense.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26  9:19     ` Petr Machata
@ 2024-06-26  9:38       ` Przemek Kitszel
  2024-06-26 16:44       ` Jakub Kicinski
  1 sibling, 0 replies; 14+ messages in thread
From: Przemek Kitszel @ 2024-06-26  9:38 UTC (permalink / raw)
  To: Petr Machata
  Cc: Jakub Kicinski, netdev, edumazet, pabeni, willemdebruijn.kernel,
	leitao, davem

On 6/26/24 11:19, Petr Machata wrote:
> 
> Przemek Kitszel <przemyslaw.kitszel@intel.com> writes:
> 
>> On 6/26/24 03:36, Jakub Kicinski wrote:
>>
>>> There is a global action queue, flushed by ksft_run(). We could support
>>> function level defers too, I guess, but there's no immediate need..
>>
>> That would be a must have for general solution, would it require some
>> boilerplate code at function level?
> 
> Presumably you'd need a per-function defer pool.
> 
> Which would be naturally modeled as a context manager, but I promised
> myself I'd shut up about that.
> 
>>> +    def __enter__(self):
>>> +        return self
>>> +
>>> +    def __exit__(self, ex_type, ex_value, ex_tb):
>>> +        return self.exec()
>>
>> why do you need __enter__ and __exit__ if this is not a context
>> manager / to-be-used-via-with?
> 
> But you could use it as a context manager.
> 
>      with defer(blah blah):
>          do stuff
>      # blah blah runs here
> 
> IMHO makes sense.

oh, nice! agree!
Then this little part is "the general solution", with the rest
(global queue) added as a sugar layer for kselftests, to only
avoid inflating indentation levels, great

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26  1:36 ` [RFC net-next 1/2] " Jakub Kicinski
  2024-06-26  7:43   ` Przemek Kitszel
@ 2024-06-26 10:18   ` Petr Machata
  2024-06-26 16:09     ` Jakub Kicinski
  1 sibling, 1 reply; 14+ messages in thread
From: Petr Machata @ 2024-06-26 10:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, willemdebruijn.kernel,
	przemyslaw.kitszel, leitao, petrm


Jakub Kicinski <kuba@kernel.org> writes:

> This implements what I was describing in [1]. When writing a test
> author can schedule cleanup / undo actions right after the creation
> completes, eg:
>
>   cmd("touch /tmp/file")
>   defer(cmd, "rm /tmp/file")
>
> defer() takes the function name as first argument, and the rest are
> arguments for that function. defer()red functions are called in
> inverse order after test exits. It's also possible to capture them
> and execute earlier (in which case they get automatically de-queued).
>
>   undo = defer(cmd, "rm /tmp/file")
>   # ... some unsafe code ...
>   undo.exec()
>
> As a nice safety all exceptions from defer()ed calls are captured,
> printed, and ignored (they do make the test fail, however).
> This addresses the common problem of exceptions in cleanup paths
> often being unhandled, leading to potential leaks.
>
> There is a global action queue, flushed by ksft_run(). We could support
> function level defers too, I guess, but there's no immediate need..
>
> Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1]
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  tools/testing/selftests/net/lib/py/ksft.py  | 49 +++++++++++++++------
>  tools/testing/selftests/net/lib/py/utils.py | 41 +++++++++++++++++
>  2 files changed, 76 insertions(+), 14 deletions(-)
>
> diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
> index 45ffe277d94a..4a72b9cbb27d 100644
> --- a/tools/testing/selftests/net/lib/py/ksft.py
> +++ b/tools/testing/selftests/net/lib/py/ksft.py
> @@ -6,6 +6,7 @@ import sys
>  import time
>  import traceback
>  from .consts import KSFT_MAIN_NAME
> +from .utils import global_defer_queue
>  
>  KSFT_RESULT = None
>  KSFT_RESULT_ALL = True
> @@ -108,6 +109,24 @@ KSFT_RESULT_ALL = True
>      print(res)
>  
>  
> +def ksft_flush_defer():
> +    global KSFT_RESULT
> +
> +    while global_defer_queue:
> +        entry = global_defer_queue[-1]
> +        try:
> +            entry.exec()

I wonder if you added _exec() to invoke it here. Because then you could
just do entry = global_defer_queue.pop() and entry._exec(), and in the
except branch you would just have the test-related business, without the
queue management.

> +        except Exception:

I think this should be either an unqualified except: or except
BaseException:.

> +            if global_defer_queue and global_defer_queue[-1] == entry:
> +                global_defer_queue.pop()
> +
> +            ksft_pr("Exception while handling defer / cleanup!")

Hmm, I was thinking about adding defer.__str__ and using it here to
give more clue as to where it went wrong, but the traceback is IMHO
plenty good enough.

> +            tb = traceback.format_exc()
> +            for line in tb.strip().split('\n'):
> +                ksft_pr("Defer Exception|", line)
> +            KSFT_RESULT = False
> +
> +
>  def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
>      cases = cases or []
>  
> @@ -130,29 +149,31 @@ KSFT_RESULT_ALL = True
>      for case in cases:
>          KSFT_RESULT = True
>          cnt += 1
> +        comment = ""
> +        cnt_key = ""
> +
>          try:
>              case(*args)
>          except KsftSkipEx as e:
> -            ktap_result(True, cnt, case, comment="SKIP " + str(e))
> -            totals['skip'] += 1
> -            continue
> +            comment = "SKIP " + str(e)
> +            cnt_key = 'skip'
>          except KsftXfailEx as e:
> -            ktap_result(True, cnt, case, comment="XFAIL " + str(e))
> -            totals['xfail'] += 1
> -            continue
> +            comment = "XFAIL " + str(e)
> +            cnt_key = 'xfail'
>          except Exception as e:
>              tb = traceback.format_exc()
>              for line in tb.strip().split('\n'):
>                  ksft_pr("Exception|", line)
> -            ktap_result(False, cnt, case)
> -            totals['fail'] += 1
> -            continue
> +            KSFT_RESULT = False
> +            cnt_key = 'fail'
>  
> -        ktap_result(KSFT_RESULT, cnt, case)
> -        if KSFT_RESULT:
> -            totals['pass'] += 1
> -        else:
> -            totals['fail'] += 1
> +        ksft_flush_defer()
> +
> +        if not cnt_key:
> +            cnt_key = 'pass' if KSFT_RESULT else 'fail'
> +
> +        ktap_result(KSFT_RESULT, cnt, case, comment=comment)
> +        totals[cnt_key] += 1
>  
>      print(
>          f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"

Majority of this hunk is just preparatory and should be in a patch of
its own. Then in this patch it should just introduce the flush.

> diff --git a/tools/testing/selftests/net/lib/py/utils.py b/tools/testing/selftests/net/lib/py/utils.py
> index 405aa510aaf2..1ef6ebaa369e 100644
> --- a/tools/testing/selftests/net/lib/py/utils.py
> +++ b/tools/testing/selftests/net/lib/py/utils.py
> @@ -66,6 +66,47 @@ import time
>          return self.process(terminate=self.terminate, fail=self.check_fail)
>  
>  
> +global_defer_queue = []
> +
> +
> +class defer:
> +    def __init__(self, func, *args, **kwargs):
> +        global global_defer_queue
> +        if global_defer_queue is None:
> +            raise Exception("defer environment has not been set up")
> +
> +        if not callable(func):
> +            raise Exception("defer created with un-callable object, did you call the function instead of passing its name?")
> +
> +        self.func = func
> +        self.args = args
> +        self.kwargs = kwargs
> +
> +        self.queued = True
> +        self.executed = False
> +
> +        self._queue =  global_defer_queue
> +        self._queue.append(self)
> +
> +    def __enter__(self):
> +        return self
> +
> +    def __exit__(self, ex_type, ex_value, ex_tb):
> +        return self.exec()
> +
> +    def _exec(self):
> +        self.func(*self.args, **self.kwargs)
> +
> +    def cancel(self):

This shouldn't dequeue if not self.queued.

> +        self._queue.remove(self)
> +        self.queued = False
> +
> +    def exec(self):

This shouldn't exec if self.executed.

But I actually wonder if we need two flags at all. Whether the defer
entry is resolved through exec(), cancel() or __exit__(), it's "done".
It could be left in the queue, in which case the "done" flag is going to
disable future exec requests. Or it can just be dropped from the queue
when done, in which case we don't even need the "done" flag as such.

> +        self._exec()
> +        self.cancel()
> +        self.executed = True
> +
> +
>  def tool(name, args, json=None, ns=None, host=None):
>      cmd_str = name + ' '
>      if json:

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26 10:18   ` Petr Machata
@ 2024-06-26 16:09     ` Jakub Kicinski
  2024-06-27  7:37       ` Petr Machata
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-26 16:09 UTC (permalink / raw)
  To: Petr Machata
  Cc: davem, netdev, edumazet, pabeni, willemdebruijn.kernel,
	przemyslaw.kitszel, leitao

On Wed, 26 Jun 2024 12:18:58 +0200 Petr Machata wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> > +def ksft_flush_defer():
> > +    global KSFT_RESULT
> > +
> > +    while global_defer_queue:
> > +        entry = global_defer_queue[-1]
> > +        try:
> > +            entry.exec()  
> 
> I wonder if you added _exec() to invoke it here. Because then you could
> just do entry = global_defer_queue.pop() and entry._exec(), and in the
> except branch you would just have the test-related business, without the
> queue management.

Initially I had both _exec, and _dequeue as separate helpers, but then
_dequeue was identical to cancel, so I removed that one, but _exec
stayed.

As you point out _exec() would do nicely during "flush".. but linter was
angry at me for calling private functions. I couldn't quickly think of
a clean scheme of naming things. Or rather, I should say, I like that
the only non-private functions in class defer right now are
test-author-facing. At some point I considered renaming _exec() to
__call__() or run() but I was worried people will incorrectly
call it, instead of calling exec().

So I decided to stick to a bit of awkward handling in the internals for
the benefit of more obvious test-facing API. But no strong preference,
LMK if calling _exec() here is fine or I should rename it..

> > +        except Exception:  
> 
> I think this should be either an unqualified except: or except
> BaseException:.

SG

> >      print(
> >          f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"  
> 
> Majority of this hunk is just preparatory and should be in a patch of
> its own. Then in this patch it should just introduce the flush.

True, will split.

> > +    def cancel(self):  
> 
> This shouldn't dequeue if not self.queued.

I was wondering if we're better off throwing the exception from
remove() or silently ignoring (what is probably an error in the 
test code). I went with the former intentionally, but happy to
change.

> > +        self._queue.remove(self)
> > +        self.queued = False
> > +
> > +    def exec(self):  
> 
> This shouldn't exec if self.executed.
> 
> But I actually wonder if we need two flags at all. Whether the defer
> entry is resolved through exec(), cancel() or __exit__(), it's "done".
> It could be left in the queue, in which case the "done" flag is going to
> disable future exec requests. Or it can just be dropped from the queue
> when done, in which case we don't even need the "done" flag as such.

If you recall there's a rss_ctx test case which removes contexts out of
order. The flags are basically for that test. We run the .exec() to
remove a context, and then we can check 

	if thing.queued:
		.. code for context that's alive ..
	else:
		.. code for dead context ..

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26  9:19     ` Petr Machata
  2024-06-26  9:38       ` Przemek Kitszel
@ 2024-06-26 16:44       ` Jakub Kicinski
  1 sibling, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-26 16:44 UTC (permalink / raw)
  To: Petr Machata
  Cc: Przemek Kitszel, netdev, edumazet, pabeni, willemdebruijn.kernel,
	leitao, davem

On Wed, 26 Jun 2024 11:19:08 +0200 Petr Machata wrote:
> >> There is a global action queue, flushed by ksft_run(). We could support
> >> function level defers too, I guess, but there's no immediate need..  
> >
> > That would be a must have for general solution, would it require some
> > boilerplate code at function level?  
> 
> Presumably you'd need a per-function defer pool.
> 
> Which would be naturally modeled as a context manager, but I promised
> myself I'd shut up about that.

No preference on the internal mechanism :)

The part I was unsure about was whether in this case test writer will
want to mix the contexts:

	fq = FuncDeferQueue()
	fq.defer(...) # this one is called when function exits
	defer(...)    # this one is global

or not:

	defer_func_context()
	# any defer after this is assumed to be func-level
	defer(...)

so probably best to wait until we have some real examples.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26  7:43   ` Przemek Kitszel
  2024-06-26  9:19     ` Petr Machata
@ 2024-06-26 16:49     ` Jakub Kicinski
  2024-06-27  8:40       ` Przemek Kitszel
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-26 16:49 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: netdev, edumazet, pabeni, willemdebruijn.kernel, leitao, petrm,
	davem

On Wed, 26 Jun 2024 09:43:54 +0200 Przemek Kitszel wrote:
> > As a nice safety all exceptions from defer()ed calls are captured,
> > printed, and ignored (they do make the test fail, however).
> > This addresses the common problem of exceptions in cleanup paths
> > often being unhandled, leading to potential leaks.  
> 
> Nice! Please only make it so that cleanup-failure does not overwrite
> happy-test-path-failure (IOW "ret = ret ? ret : cleanup_ret")

That should be what we end up doing. The ret is a boolean (pass / fail)
so we have:

	pass &= cleanup_pass

effectively.

> > +            ksft_pr("Exception while handling defer / cleanup!")  
> 
> please print current queue size, if only for convenience of test
> developer to be able tell if they are moving forward in 
> fix-rerun-observe cycle

Hm... not a bad point, defer() cycles are possible.
But then again, we don't guard against infinite loops 
in  tests either, and kselftest runner (the general one, 
outside our Python) has a timeout, so it will kill the script.

> > +            tb = traceback.format_exc()
> > +            for line in tb.strip().split('\n'):
> > +                ksft_pr("Defer Exception|", line)
> > +            KSFT_RESULT = False  
> 
> I have no idea if this could be other error than just False, if so,
> don't overwrite

Yup, just True / False. The other types (skip, xfail) are a pass
(True) plus a comment, per KTAP spec.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26 16:09     ` Jakub Kicinski
@ 2024-06-27  7:37       ` Petr Machata
  2024-06-27 15:41         ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Petr Machata @ 2024-06-27  7:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, davem, netdev, edumazet, pabeni,
	willemdebruijn.kernel, przemyslaw.kitszel, leitao


Jakub Kicinski <kuba@kernel.org> writes:

> On Wed, 26 Jun 2024 12:18:58 +0200 Petr Machata wrote:
>> Jakub Kicinski <kuba@kernel.org> writes:
>> > +def ksft_flush_defer():
>> > +    global KSFT_RESULT
>> > +
>> > +    while global_defer_queue:
>> > +        entry = global_defer_queue[-1]
>> > +        try:
>> > +            entry.exec()  
>> 
>> I wonder if you added _exec() to invoke it here. Because then you could
>> just do entry = global_defer_queue.pop() and entry._exec(), and in the
>> except branch you would just have the test-related business, without the
>> queue management.
>
> Initially I had both _exec, and _dequeue as separate helpers, but then
> _dequeue was identical to cancel, so I removed that one, but _exec
> stayed.
>
> As you point out _exec() would do nicely during "flush".. but linter was
> angry at me for calling private functions. I couldn't quickly think of
> a clean scheme of naming things. Or rather, I should say, I like that
> the only non-private functions in class defer right now are
> test-author-facing. At some point I considered renaming _exec() to
> __call__() or run() but I was worried people will incorrectly
> call it, instead of calling exec().
>
> So I decided to stick to a bit of awkward handling in the internals for
> the benefit of more obvious test-facing API. But no strong preference,
> LMK if calling _exec() here is fine or I should rename it..

Maybe call it something like exec_only()? There's a list that you just
need to go through, it looks a shame not to just .pop() everything out
one by one and instead have this management business in the error path.

>> > +        except Exception:  
>> 
>> I think this should be either an unqualified except: or except
>> BaseException:.
>
> SG
>
>
>> >      print(
>> >          f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"  
>> 
>> Majority of this hunk is just preparatory and should be in a patch of
>> its own. Then in this patch it should just introduce the flush.
>
> True, will split.
>
>> > +    def cancel(self):  
>> 
>> This shouldn't dequeue if not self.queued.
>
> I was wondering if we're better off throwing the exception from
> remove() or silently ignoring (what is probably an error in the 
> test code). I went with the former intentionally, but happy to
> change.

Hmm, right, it would throw. Therefore second exec() would as well. Good.
But that means that exec() should first cancel, then exec, otherwise
second exec invocation would actually exec the cleanup a second time
before bailing out.

>
>> > +        self._queue.remove(self)
>> > +        self.queued = False
>> > +
>> > +    def exec(self):  
>> 
>> This shouldn't exec if self.executed.
>> 
>> But I actually wonder if we need two flags at all. Whether the defer
>> entry is resolved through exec(), cancel() or __exit__(), it's "done".
>> It could be left in the queue, in which case the "done" flag is going to
>> disable future exec requests. Or it can just be dropped from the queue
>> when done, in which case we don't even need the "done" flag as such.
>
> If you recall there's a rss_ctx test case which removes contexts out of
> order. The flags are basically for that test. We run the .exec() to
> remove a context, and then we can check 
>
> 	if thing.queued:
> 		.. code for context that's alive ..
> 	else:
> 		.. code for dead context ..

That test already has its own flags to track which was removed, can't it
use those? My preference is always to keep an API as minimal as possible
and the flags, if any, would ideally be private. I don't think defer
objects should keep track of whether the user has already invoked them
or not, that's their user's business to know.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-26 16:49     ` Jakub Kicinski
@ 2024-06-27  8:40       ` Przemek Kitszel
  2024-06-27 15:35         ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Przemek Kitszel @ 2024-06-27  8:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, edumazet, pabeni, willemdebruijn.kernel, leitao, petrm,
	davem

On 6/26/24 18:49, Jakub Kicinski wrote:
> On Wed, 26 Jun 2024 09:43:54 +0200 Przemek Kitszel wrote:
>>> As a nice safety all exceptions from defer()ed calls are captured,
>>> printed, and ignored (they do make the test fail, however).
>>> This addresses the common problem of exceptions in cleanup paths
>>> often being unhandled, leading to potential leaks.
>>
>> Nice! Please only make it so that cleanup-failure does not overwrite
>> happy-test-path-failure (IOW "ret = ret ? ret : cleanup_ret")
> 
> That should be what we end up doing. The ret is a boolean (pass / fail)
> so we have:
> 
> 	pass &= cleanup_pass
> 
> effectively.
> 
>>> +            ksft_pr("Exception while handling defer / cleanup!")
>>
>> please print current queue size, if only for convenience of test
>> developer to be able tell if they are moving forward in
>> fix-rerun-observe cycle
> 
> Hm... not a bad point, defer() cycles are possible.
> But then again, we don't guard against infinite loops
> in  tests either, and kselftest runner (the general one,
> outside our Python) has a timeout, so it will kill the script.

I mean the flow:
$EDITOR mytest.py
./mytest.py
# output: Exception while handling defer / cleanup (at 4 out of 13 cleanups)

then repeat with the hope that fix to cleanup procedure will move us
forward, say:
$EDITOR mytest.py; ./mytest.py
#output: ... (at 7 of 13 cleanups)

just name of failed cleanup method is not enough as those could be
added via loop

> 
>>> +            tb = traceback.format_exc()
>>> +            for line in tb.strip().split('\n'):
>>> +                ksft_pr("Defer Exception|", line)
>>> +            KSFT_RESULT = False
>>
>> I have no idea if this could be other error than just False, if so,
>> don't overwrite
> 
> Yup, just True / False. The other types (skip, xfail) are a pass
> (True) plus a comment, per KTAP spec.
> 


Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-27  8:40       ` Przemek Kitszel
@ 2024-06-27 15:35         ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-27 15:35 UTC (permalink / raw)
  To: Przemek Kitszel
  Cc: netdev, edumazet, pabeni, willemdebruijn.kernel, leitao, petrm,
	davem

On Thu, 27 Jun 2024 10:40:31 +0200 Przemek Kitszel wrote:
> > Hm... not a bad point, defer() cycles are possible.
> > But then again, we don't guard against infinite loops
> > in  tests either, and kselftest runner (the general one,
> > outside our Python) has a timeout, so it will kill the script.  
> 
> I mean the flow:
> $EDITOR mytest.py
> ./mytest.py
> # output: Exception while handling defer / cleanup (at 4 out of 13 cleanups)
> 
> then repeat with the hope that fix to cleanup procedure will move us
> forward, say:
> $EDITOR mytest.py; ./mytest.py
> #output: ... (at 7 of 13 cleanups)
> 
> just name of failed cleanup method is not enough as those could be
> added via loop

Oh, yes, nice one!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC net-next 1/2] selftests: drv-net: add ability to schedule cleanup with defer()
  2024-06-27  7:37       ` Petr Machata
@ 2024-06-27 15:41         ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-06-27 15:41 UTC (permalink / raw)
  To: Petr Machata
  Cc: davem, netdev, edumazet, pabeni, willemdebruijn.kernel,
	przemyslaw.kitszel, leitao

On Thu, 27 Jun 2024 09:37:50 +0200 Petr Machata wrote:
> > I was wondering if we're better off throwing the exception from
> > remove() or silently ignoring (what is probably an error in the 
> > test code). I went with the former intentionally, but happy to
> > change.  
> 
> Hmm, right, it would throw. Therefore second exec() would as well. Good.
> But that means that exec() should first cancel, then exec, otherwise
> second exec invocation would actually exec the cleanup a second time
> before bailing out.

Good point, that sounds safer.

> >> This shouldn't exec if self.executed.
> >> 
> >> But I actually wonder if we need two flags at all. Whether the defer
> >> entry is resolved through exec(), cancel() or __exit__(), it's "done".
> >> It could be left in the queue, in which case the "done" flag is going to
> >> disable future exec requests. Or it can just be dropped from the queue
> >> when done, in which case we don't even need the "done" flag as such.  
> >
> > If you recall there's a rss_ctx test case which removes contexts out of
> > order. The flags are basically for that test. We run the .exec() to
> > remove a context, and then we can check 
> >
> > 	if thing.queued:
> > 		.. code for context that's alive ..
> > 	else:
> > 		.. code for dead context ..  
> 
> That test already has its own flags to track which was removed, can't it
> use those? My preference is always to keep an API as minimal as possible
> and the flags, if any, would ideally be private. I don't think defer
> objects should keep track of whether the user has already invoked them
> or not, that's their user's business to know.

Ack, will delete it then.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-06-27 15:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-26  1:36 [RFC net-next 0/2] selftests: drv-net: add ability to schedule cleanup with defer() Jakub Kicinski
2024-06-26  1:36 ` [RFC net-next 1/2] " Jakub Kicinski
2024-06-26  7:43   ` Przemek Kitszel
2024-06-26  9:19     ` Petr Machata
2024-06-26  9:38       ` Przemek Kitszel
2024-06-26 16:44       ` Jakub Kicinski
2024-06-26 16:49     ` Jakub Kicinski
2024-06-27  8:40       ` Przemek Kitszel
2024-06-27 15:35         ` Jakub Kicinski
2024-06-26 10:18   ` Petr Machata
2024-06-26 16:09     ` Jakub Kicinski
2024-06-27  7:37       ` Petr Machata
2024-06-27 15:41         ` Jakub Kicinski
2024-06-26  1:36 ` [RFC net-next 2/2] selftests: drv-net: rss_ctx: convert to defer() Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).