Netdev List
 help / color / mirror / Atom feed
* [PATCH net 0/2] dpll: fix two teardown races on E825-C
@ 2026-05-16 19:13 Petr Oros
  2026-05-16 19:13 ` [PATCH net 1/2] dpll: fix NULL deref in dpll_device_ops() during teardown race Petr Oros
  2026-05-16 19:13 ` [PATCH net 2/2] dpll: filter pin->dpll_refs by cookie in dpll_pin_on_pin_unregister Petr Oros
  0 siblings, 2 replies; 6+ messages in thread
From: Petr Oros @ 2026-05-16 19:13 UTC (permalink / raw)
  To: netdev
  Cc: ivecera, Petr Oros, Vadim Fedorenko, Arkadiusz Kubalewski,
	Jiri Pirko, Milena Olech, David S. Miller, Michal Michalik,
	linux-kernel

Two independent dpll core fixes for races observed on E825-C with an
external zl3073x dpll chip, where ice contributes pins to a dpll
device owned by zl3073x. Both regressions date back to commit
9431063ad323 ("dpll: core: Add DPLL framework base functions"). The
patches are independent and apply in either order; per-patch details
are in each changelog.

Tested via repeated kexec stress on the affected hardware: patch 2
cuts the WARN incidence from 1-4 reboots to none observed across
several hundred reboots; patch 1 removes the NULL deref observed
under parallel rmmod of ice and zl3073x_i2c.

Petr Oros (2):
  dpll: fix NULL deref in dpll_device_ops() during teardown race
  dpll: filter pin->dpll_refs by cookie in dpll_pin_on_pin_unregister

 drivers/dpll/dpll_core.c    | 20 +++++++++++++-------
 drivers/dpll/dpll_netlink.c |  6 ++++++
 2 files changed, 19 insertions(+), 7 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net 1/2] dpll: fix NULL deref in dpll_device_ops() during teardown race
  2026-05-16 19:13 [PATCH net 0/2] dpll: fix two teardown races on E825-C Petr Oros
@ 2026-05-16 19:13 ` Petr Oros
  2026-05-21  2:19   ` Jakub Kicinski
  2026-05-21  2:19   ` Jakub Kicinski
  2026-05-16 19:13 ` [PATCH net 2/2] dpll: filter pin->dpll_refs by cookie in dpll_pin_on_pin_unregister Petr Oros
  1 sibling, 2 replies; 6+ messages in thread
From: Petr Oros @ 2026-05-16 19:13 UTC (permalink / raw)
  To: netdev
  Cc: ivecera, Petr Oros, Vadim Fedorenko, Arkadiusz Kubalewski,
	Jiri Pirko, Milena Olech, David S. Miller, Michal Michalik,
	linux-kernel

When a dpll device driver (zl3073x) is unloaded via rmmod while a
foreign driver (ice) still holds pins on it via
dpll_pin_on_pin_register(), dpll_device_unregister() empties
dpll->registration_list but the dpll object stays alive on ice's
refcount. A subsequent PIN_DELETED
notification on ice's pin reaches dpll_cmd_pin_get_one() which calls
dpll_device_ops() on the still-indexed but half-dead dpll;
dpll_device_registration_first() returns NULL and the accessor
dereferences ops at offset 0x10 (struct dpll_device_registration::ops):

  CPU A (zl3073x rmmod)            CPU B (ice workqueue)
  ---------------------            ---------------------
  dpll_device_unregister():
    drop reg from
    dpll->registration_list
                                   dpll_pin_on_pin_unregister
                                     dpll_pin_delete_ntf
                                       dpll_cmd_pin_get_one
                                         dpll_device_ops()
                                           registration_first() = NULL
                                           NULL deref @ 0x10

  WARNING: CPU: 27 PID: 972 at drivers/dpll/dpll_core.c:1072 dpll_device_ops+0x20/0x30
  Workqueue: ice_dpll_wq ice_dpll_pin_notify_work [ice]
  RIP: 0010:dpll_device_ops+0x20/0x30
  Call Trace:
   <TASK>
   ? __warn+0x84/0x140
   ? dpll_device_ops+0x20/0x30
   ? report_bug+0x16b/0x180
   ? handle_bug+0x3c/0x70
   ? exc_invalid_op+0x14/0x70
   ? asm_exc_invalid_op+0x16/0x20
   ? dpll_device_ops+0x20/0x30
   dpll_cmd_pin_get_one+0x303/0x490
   dpll_pin_event_send+0x93/0x140
   dpll_pin_on_pin_unregister+0x45/0xe0
   ice_dpll_pin_notify_work+0x7b/0x150 [ice]
   process_one_work+0x188/0x3b0
   worker_thread+0x2ef/0x410
   kthread+0x122/0x240
   ret_from_fork+0x28/0x50
   </TASK>
  ---[ end trace 0000000000000000 ]---
  BUG: kernel NULL pointer dereference, address: 0000000000000010
  RIP: 0010:dpll_device_ops+0x24/0x30
  Call Trace:
   dpll_cmd_pin_get_one+0x303/0x490
   dpll_pin_event_send+0x93/0x140
   dpll_pin_on_pin_unregister+0x45/0xe0
   ice_dpll_pin_notify_work+0x7b/0x150 [ice]
   process_one_work+0x188/0x3b0
   worker_thread+0x2ef/0x410
   kthread+0x122/0x240
   ret_from_fork+0x28/0x50

dpll_lock serializes the two paths but cannot reorder them: ice's work
was queued before zl3073x took the lock. "Empty registration_list on a
still-indexed dpll" is a legitimate transient state caused by
peer-driver refcounting; the accessors must tolerate it.

Drop the WARN_ON in dpll_device_registration_first(), make dpll_priv()
and dpll_device_ops() return NULL on the empty-list state, and bail
out cleanly in dpll_cmd_pin_get_one() when the first dpll the pin
references is already gone.

In-tree doit/dumpit callers of dpll_priv() / dpll_device_ops() obtain
their dpll via dpll_pre_doit() -> dpll_device_get_by_id(), which only
returns DPLL_REGISTERED devices; dpll_device_unregister() clears that
mark only after observing list_empty(&dpll->registration_list), so
those callers cannot see the new NULL return. The only path that can
is dpll_cmd_pin_get_one() reached from dpll_pin_event_send(), gated
here.

Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/dpll/dpll_core.c    | 12 ++++++------
 drivers/dpll/dpll_netlink.c |  6 ++++++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
index cbb635db43210f..4a058b46c69d4f 100644
--- a/drivers/dpll/dpll_core.c
+++ b/drivers/dpll/dpll_core.c
@@ -1060,12 +1060,8 @@ EXPORT_SYMBOL_GPL(dpll_pin_ref_sync_pair_add);
 static struct dpll_device_registration *
 dpll_device_registration_first(struct dpll_device *dpll)
 {
-	struct dpll_device_registration *reg;
-
-	reg = list_first_entry_or_null((struct list_head *)&dpll->registration_list,
-				       struct dpll_device_registration, list);
-	WARN_ON(!reg);
-	return reg;
+	return list_first_entry_or_null((struct list_head *)&dpll->registration_list,
+					struct dpll_device_registration, list);
 }
 
 void *dpll_priv(struct dpll_device *dpll)
@@ -1073,6 +1069,8 @@ void *dpll_priv(struct dpll_device *dpll)
 	struct dpll_device_registration *reg;
 
 	reg = dpll_device_registration_first(dpll);
+	if (!reg)
+		return NULL;
 	return reg->priv;
 }
 
@@ -1081,6 +1079,8 @@ const struct dpll_device_ops *dpll_device_ops(struct dpll_device *dpll)
 	struct dpll_device_registration *reg;
 
 	reg = dpll_device_registration_first(dpll);
+	if (!reg)
+		return NULL;
 	return reg->ops;
 }
 
diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
index 0ff1658c2dc1ba..8e7e61982b867c 100644
--- a/drivers/dpll/dpll_netlink.c
+++ b/drivers/dpll/dpll_netlink.c
@@ -682,6 +682,12 @@ dpll_cmd_pin_get_one(struct sk_buff *msg, struct dpll_pin *pin,
 	ref = dpll_xa_ref_dpll_first(&pin->dpll_refs);
 	ASSERT_NOT_NULL(ref);
 
+	/* The first dpll the pin references may be torn down while still
+	 * pinned by foreign-driver refs; drop the notification cleanly.
+	 */
+	if (!dpll_device_ops(ref->dpll))
+		return -ENODEV;
+
 	ret = dpll_msg_add_pin_handle(msg, pin);
 	if (ret)
 		return ret;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net 2/2] dpll: filter pin->dpll_refs by cookie in dpll_pin_on_pin_unregister
  2026-05-16 19:13 [PATCH net 0/2] dpll: fix two teardown races on E825-C Petr Oros
  2026-05-16 19:13 ` [PATCH net 1/2] dpll: fix NULL deref in dpll_device_ops() during teardown race Petr Oros
@ 2026-05-16 19:13 ` Petr Oros
  2026-05-21  2:19   ` Jakub Kicinski
  1 sibling, 1 reply; 6+ messages in thread
From: Petr Oros @ 2026-05-16 19:13 UTC (permalink / raw)
  To: netdev
  Cc: ivecera, Petr Oros, Vadim Fedorenko, Arkadiusz Kubalewski,
	Jiri Pirko, Milena Olech, David S. Miller, Michal Michalik,
	linux-kernel

dpll_pin_on_pin_register() iterates parent->dpll_refs and adds the
child pin to each dpll the parent contributes to, tagging the
registration with cookie = parent. The unregister side asymmetrically
iterates pin->dpll_refs (a union across all of pin's parents) and
calls __dpll_pin_unregister() on every entry without checking that
the entry carries a (ops, priv, parent) registration.

When the same pin is reachable through multiple parents whose
supported-dpll sets differ, unregistering one parent walks entries
owned by other parents and trips WARN_ON(!reg) in
dpll_xa_ref_pin_del():

  parent_A in {dpll_X}, parent_B in {dpll_X, dpll_Y}
  pin reachable through both
    pin->dpll_refs = {dpll_X, dpll_Y}
    dpll_X has cookies {parent_A, parent_B}
    dpll_Y has cookie  {parent_B}

  unregister(parent_A) iterates {dpll_X, dpll_Y}; the dpll_Y step
  looks up cookie = parent_A which was never there -> WARN.

  WARNING: CPU: 137 PID: 4498 at drivers/dpll/dpll_core.c:252 dpll_xa_ref_pin_del.isra.0+0x1b0/0x1c0
  Call Trace:
   <TASK>
   __dpll_pin_unregister+0xed/0x2e0
   dpll_pin_on_pin_unregister+0x91/0xe0
   ice_dpll_deinit_pins+0xb2/0x400 [ice]
   ice_dpll_deinit+0x32/0x130 [ice]
   ice_deinit_features.part.0+0xb6/0x100 [ice]
   ice_unload+0xdf/0x120 [ice]
   ice_remove+0x144/0x2f0 [ice]
   ice_shutdown+0x16/0x50 [ice]
   pci_device_shutdown+0x34/0x60
   device_shutdown+0x158/0x200
   kernel_kexec+0x48/0x160
   __do_sys_reboot+0xda/0x230
   do_syscall_64+0xec/0x680
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
   </TASK>

Filter the iteration by the (ops, priv, parent) cookie so only the
entries this parent actually contributed to get touched. Keep
pin->dpll_refs as the iteration source (rather than parent->dpll_refs
as in register) so cleanup is also correct when the parent has been
independently unregistered from some of its dpll devices in the
meantime; switching sides would silently leak those registrations.

The fix is placed at the call site rather than relaxing WARN_ON(!reg)
in dpll_xa_ref_pin_del(): other callers (dpll_pin_unregister() with
cookie = NULL, the dpll_pin_on_pin_register() error rollback with
cookie = parent) pass cookies that must be present, and the WARN
remains useful there as a double-unregister catch.

Reproduced on E825-C + zl3073x via kexec stress: WARN appears within
1-4 reboots without the fix, none across several hundred reboots
with it.

Fixes: 9431063ad323 ("dpll: core: Add DPLL framework base functions")
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/dpll/dpll_core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
index 4a058b46c69d4f..4779976682fdb9 100644
--- a/drivers/dpll/dpll_core.c
+++ b/drivers/dpll/dpll_core.c
@@ -1024,8 +1024,14 @@ void dpll_pin_on_pin_unregister(struct dpll_pin *parent, struct dpll_pin *pin,
 	mutex_lock(&dpll_lock);
 	dpll_pin_delete_ntf(pin);
 	dpll_xa_ref_pin_del(&pin->parent_refs, parent, ops, priv, pin);
-	xa_for_each(&pin->dpll_refs, i, ref)
+	/* pin->dpll_refs is the union over all of pin's parents; only
+	 * touch entries actually registered via @parent.
+	 */
+	xa_for_each(&pin->dpll_refs, i, ref) {
+		if (!dpll_pin_registration_find(ref, ops, priv, parent))
+			continue;
 		__dpll_pin_unregister(ref->dpll, pin, ops, priv, parent);
+	}
 	mutex_unlock(&dpll_lock);
 }
 EXPORT_SYMBOL_GPL(dpll_pin_on_pin_unregister);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/2] dpll: fix NULL deref in dpll_device_ops() during teardown race
  2026-05-16 19:13 ` [PATCH net 1/2] dpll: fix NULL deref in dpll_device_ops() during teardown race Petr Oros
@ 2026-05-21  2:19   ` Jakub Kicinski
  2026-05-21  2:19   ` Jakub Kicinski
  1 sibling, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2026-05-21  2:19 UTC (permalink / raw)
  To: poros
  Cc: Jakub Kicinski, netdev, ivecera, vadim.fedorenko,
	arkadiusz.kubalewski, jiri, milena.olech, davem, michal.michalik,
	linux-kernel

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
dpll: fix NULL deref in dpll_device_ops() during teardown race

This commit modifies dpll_device_ops() and dpll_priv() to return NULL during
a teardown race when the registration list is empty. It also adds a check in
dpll_cmd_pin_get_one() to drop the notification if the first referenced dpll
is unregistered.

> In-tree doit/dumpit callers of dpll_priv() / dpll_device_ops() obtain
> their dpll via dpll_pre_doit() -> dpll_device_get_by_id(), which only
> returns DPLL_REGISTERED devices; dpll_device_unregister() clears that
> mark only after observing list_empty(&dpll->registration_list), so
> those callers cannot see the new NULL return. The only path that can
> is dpll_cmd_pin_get_one() reached from dpll_pin_event_send(), gated
> here.

Is this description accurate for pin operations?

Looking at doit and dumpit paths for pins, they appear to use
dpll_pin_pre_doit() which relies on dpll_pin_available(). That function
allows the operation if at least one dpll referencing the pin is registered.

If a pin references multiple dpll devices, and one is registered while
another is half-dead (being unregistered), couldn't a user trigger a panic by
executing a pin dump or configuration command during a driver unload?

> diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
> index ea63602637866..b73d718e8a56d 100644
> --- a/drivers/dpll/dpll_netlink.c
> +++ b/drivers/dpll/dpll_netlink.c
> @@ -709,6 +709,12 @@ dpll_cmd_pin_get_one(struct sk_buff *msg, struct dpll_pin *pin,
>  	ref = dpll_xa_ref_dpll_first(&pin->dpll_refs);
>  	ASSERT_NOT_NULL(ref);
>  
> +	/* The first dpll the pin references may be torn down while still
> +	 * pinned by foreign-driver refs; drop the notification cleanly.
> +	 */
> +	if (!dpll_device_ops(ref->dpll))
> +		return -ENODEV;
> +

Will returning -ENODEV here break the netlink pin dump globally?

Since dpll_cmd_pin_get_one() is called inside the dumpit loop of
dpll_nl_pin_get_dumpit(), returning an error aborts the entire dump process.
A single pin in this transient state seems to break the pin dump operation
for all pins.

Also, for notifications sent via dpll_pin_event_send(), this return value
silently cancels and frees the netlink message. If the pin is shared with
other active dplls (which might be later in the xarray), won't their valid
state change notifications be permanently lost? Should the code skip
unregistered dplls instead of unconditionally aborting the operation?

Furthermore, does this check fully prevent the NULL pointer dereference?

This only validates the first dpll reference returned by
dpll_xa_ref_dpll_first(). If the first dpll is registered but a subsequent
dpll is half-dead (its registration list is empty), this check passes.

Execution would then proceed to functions like dpll_msg_add_pin_dplls() or
netlink set operations like dpll_pin_phase_adj_set() which iterate over
all dpll references in pin->dpll_refs. For the half-dead dpll, dpll_priv(dpll)
now evaluates to NULL, which is passed to driver callbacks
like ops->state_on_dpll_get that might unconditionally dereference it
(for example, ice_dpll_pin_state_get() uses struct ice_pf *pf = d->pf),
causing a kernel panic.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 2/2] dpll: filter pin->dpll_refs by cookie in dpll_pin_on_pin_unregister
  2026-05-16 19:13 ` [PATCH net 2/2] dpll: filter pin->dpll_refs by cookie in dpll_pin_on_pin_unregister Petr Oros
@ 2026-05-21  2:19   ` Jakub Kicinski
  0 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2026-05-21  2:19 UTC (permalink / raw)
  To: poros
  Cc: Jakub Kicinski, netdev, ivecera, vadim.fedorenko,
	arkadiusz.kubalewski, jiri, milena.olech, davem, michal.michalik,
	linux-kernel

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
dpll: filter pin->dpll_refs by cookie in dpll_pin_on_pin_unregister

This commit fixes a warning triggered when unregistering a pin that is
reachable through multiple parents with different supported DPLL sets. It
filters the DPLL reference iteration by the parent cookie to ensure only the
entries contributed by the specific parent are removed.

> diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
> index 4a058b46c69d4..4779976682fdb 100644
> --- a/drivers/dpll/dpll_core.c
> +++ b/drivers/dpll/dpll_core.c
> @@ -1024,8 +1024,14 @@ void dpll_pin_on_pin_unregister(struct dpll_pin *parent, struct dpll_pin *pin,
>  	mutex_lock(&dpll_lock);
>  	dpll_pin_delete_ntf(pin);

Does this send a premature deletion notification to userspace when a pin has
multiple parents?

While this patch correctly fixes the kernel-side handling of multiple-parent
pins, it appears the userspace notification lifecycle remains asymmetrical in
this scenario.

Because dpll_pin_delete_ntf(pin) is called unconditionally before removing
the parent or DPLL references, dpll_pin_available(pin) evaluates to true and
the DPLL_CMD_PIN_DELETE_NTF event is broadcasted to userspace.

If a pin is registered via both parent_A and parent_B, unregistering it from
parent_A alone will instruct userspace to tear down its state for the pin,
even though the pin remains fully registered and active via parent_B.

Could this cause desynchronization between kernel and userspace states when a
shared pin is partially unregistered?

>  	dpll_xa_ref_pin_del(&pin->parent_refs, parent, ops, priv, pin);
> -	xa_for_each(&pin->dpll_refs, i, ref)
> +	/* pin->dpll_refs is the union over all of pin's parents; only
> +	 * touch entries actually registered via @parent.
> +	 */
> +	xa_for_each(&pin->dpll_refs, i, ref) {
> +		if (!dpll_pin_registration_find(ref, ops, priv, parent))
> +			continue;
>  		__dpll_pin_unregister(ref->dpll, pin, ops, priv, parent);
> +	}
>  	mutex_unlock(&dpll_lock);
>  }
>  EXPORT_SYMBOL_GPL(dpll_pin_on_pin_unregister);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net 1/2] dpll: fix NULL deref in dpll_device_ops() during teardown race
  2026-05-16 19:13 ` [PATCH net 1/2] dpll: fix NULL deref in dpll_device_ops() during teardown race Petr Oros
  2026-05-21  2:19   ` Jakub Kicinski
@ 2026-05-21  2:19   ` Jakub Kicinski
  1 sibling, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2026-05-21  2:19 UTC (permalink / raw)
  To: poros
  Cc: Jakub Kicinski, netdev, ivecera, vadim.fedorenko,
	arkadiusz.kubalewski, jiri, milena.olech, davem, michal.michalik,
	linux-kernel

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
dpll: fix NULL deref in dpll_device_ops() during teardown race

This patch tolerates an empty registration_list on a still-indexed dpll
by dropping a WARN_ON in dpll_device_registration_first(), making
dpll_priv() and dpll_device_ops() return NULL in that state, and adding
an early return in dpll_cmd_pin_get_one() when the first dpll-ref of a
pin is already torn down.

> diff --git a/drivers/dpll/dpll_core.c b/drivers/dpll/dpll_core.c
> index cbb635db4321..4a058b46c69d 100644
> --- a/drivers/dpll/dpll_core.c
> +++ b/drivers/dpll/dpll_core.c
> @@ -1060,12 +1060,8 @@ EXPORT_SYMBOL_GPL(dpll_pin_ref_sync_pair_add);
>  static struct dpll_device_registration *
>  dpll_device_registration_first(struct dpll_device *dpll)
>  {
> -	struct dpll_device_registration *reg;
> -
> -	reg = list_first_entry_or_null((struct list_head *)&dpll->registration_list,
> -				       struct dpll_device_registration, list);
> -	WARN_ON(!reg);
> -	return reg;
> +	return list_first_entry_or_null((struct list_head *)&dpll->registration_list,
> +					struct dpll_device_registration, list);
>  }
>  
>  void *dpll_priv(struct dpll_device *dpll)
> @@ -1073,6 +1069,8 @@ void *dpll_priv(struct dpll_device *dpll)
>  	struct dpll_device_registration *reg;
>  
>  	reg = dpll_device_registration_first(dpll);
> +	if (!reg)
> +		return NULL;
>  	return reg->priv;
>  }

[High]
With dpll_priv() now silently returning NULL for any half-dead dpll,
the contract for the ~28 in-tree callers of dpll_priv() changes.  The
commit message asserts that the only path which can observe the new
NULL return is dpll_cmd_pin_get_one() reached from
dpll_pin_event_send(), and that this path is "gated here" by the new
check in dpll_cmd_pin_get_one().  Is that gate actually coextensive
with the set of dplls walked by this code path?

> diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
> index ea6360263786..b73d718e8a56 100644
> --- a/drivers/dpll/dpll_netlink.c
> +++ b/drivers/dpll/dpll_netlink.c
> @@ -709,6 +709,12 @@ dpll_cmd_pin_get_one(struct sk_buff *msg, struct dpll_pin *pin,
>  	ref = dpll_xa_ref_dpll_first(&pin->dpll_refs);
>  	ASSERT_NOT_NULL(ref);
>  
> +	/* The first dpll the pin references may be torn down while still
> +	 * pinned by foreign-driver refs; drop the notification cleanly.
> +	 */
> +	if (!dpll_device_ops(ref->dpll))
> +		return -ENODEV;
> +

[High]
The check only validates the dpll returned by
dpll_xa_ref_dpll_first(&pin->dpll_refs), i.e. the first ref.  Later in
the same function, when pin->parent_refs is empty, control falls
through to dpll_msg_add_pin_dplls(), which walks every entry of
pin->dpll_refs:

    xa_for_each(&pin->dpll_refs, index, ref) {
        ...
        ret = dpll_msg_add_pin_on_dpll_state(msg, pin, ref, extack);
        ...
        ret = dpll_msg_add_pin_operstate(msg, pin, ref, extack);
        ...
        ret = dpll_msg_add_pin_prio(msg, pin, ref, extack);
        ...
        ret = dpll_msg_add_pin_direction(msg, pin, ref, extack);
        ...
        ret = dpll_msg_add_phase_offset(msg, pin, ref, extack);
        ...
    }

Each of those helpers calls dpll_priv(ref->dpll) and passes the
result as the dpll_priv argument to a driver ops callback.  Can a
non-first ref->dpll in this loop be in the same half-dead state (entry
still indexed in dpll_device_xa, but registration_list empty), so that
dpll_priv() now hands a NULL back to a driver callback that expects a
non-NULL private pointer (e.g. ice_dpll_pin_state_get does
struct ice_dpll *d = dpll_priv; struct ice_pf *pf = d->pf;)?

The teardown path the commit message describes
(zl3073x_dev_dpll_fini iterating list_for_each_entry over zldev->dplls
and calling zl3073x_dpll_unregister on each) drops dpll_lock between
per-dpll unregisters.  If a pin from a foreign driver references two
dplls of the unloading provider, can a notify work observe one
already torn down and the other still healthy, with the new gate
inspecting only the first?

Should the same NULL-tolerant treatment also be applied to the
xa_for_each loop in dpll_msg_add_pin_dplls(), or should the gate skip
half-dead refs there as well, so the commit message claim "those
callers cannot see the new NULL return ... only path that can is ...
gated here" actually holds?

[Medium]
Separately, the new -ENODEV return interacts with dpll_nl_pin_get_dumpit():

    ret = dpll_cmd_pin_get_one(skb, pin, cb->extack);
    if (ret) {
        genlmsg_cancel(skb, hdr);
        break;
    }

dpll_pin_available(pin) returns true as long as any of the pin's
parent_refs or dpll_refs is still DPLL_REGISTERED, so a pin can be
"available" while its first dpll_ref is the torn-down one.  In that
case the new -ENODEV breaks out of the dump loop and propagates to
userspace, hiding every later pin from a single dump call.  Should
this dumpit caller treat -ENODEV from dpll_cmd_pin_get_one() as a
"continue" rather than a fatal error, so the dump skips the transient
pin instead of failing wholesale?

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-21  2:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-16 19:13 [PATCH net 0/2] dpll: fix two teardown races on E825-C Petr Oros
2026-05-16 19:13 ` [PATCH net 1/2] dpll: fix NULL deref in dpll_device_ops() during teardown race Petr Oros
2026-05-21  2:19   ` Jakub Kicinski
2026-05-21  2:19   ` Jakub Kicinski
2026-05-16 19:13 ` [PATCH net 2/2] dpll: filter pin->dpll_refs by cookie in dpll_pin_on_pin_unregister Petr Oros
2026-05-21  2:19   ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox