* iwcm: CONNECT_REQUEST events silently dropped in cm_work_handler when listener is being destroyed
From: Jared Holzman @ 2026-05-10 19:53 UTC (permalink / raw)
To: linux-rdma
Hi all,
A heads-up from debugging a rare module-unload BUG on an older siw: I'm
flagging this for the record in case it bites a future iWARP provider.
In drivers/infiniband/core/iwcm.c::cm_work_handler(), when
IWCM_F_DROP_EVENTS is set on a listener cm_id (set unconditionally by
destroy_cm_id()), any in-flight CONNECT_REQUEST work item is silently
dropped:

	if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) {
		ret = process_event(cm_id_priv, &levent);
		...
	} else
		pr_debug("dropping event %d\n", levent.event);

For non-CONNECT_REQUEST events that's fine — the consumer is also
tearing down. For CONNECT_REQUEST, however, the event carries
provider_data = <provider's per-connection state> (e.g. siw's CEP that
took the wait-for-CM ref via siw_cep_get() in siw_proc_mpareq()), and
there is no other release path: cm_conn_req_handler() is the only place
that turns a queued CONNECT_REQUEST into the call to iw_cm_reject() /
iw_destroy_cm_id() that gets back to the provider's iw_reject op.
The race:

  1. Peer opens TCP, sends an MPA REQ, then immediately closes.
  2. The CEP defers cleanup of the wait-for-CM ref because it is still
     expecting the consumer to accept or reject.
  3. Consumer module unload runs while that CONNECT_REQUEST work is
     still queued on iwcm_wq.
  4. destroy_cm_id() sets IWCM_F_DROP_EVENTS before iw_destroy_listen(),
     and the subsequent flush_workqueue() runs the work item through
     the silent drop branch.

Provider state leaks; from the provider's side there is no signal it
could have hooked into.
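
To make the ordering concrete, here is a minimal user-space model of the
race in C. It is a sketch, not the iwcm code: every model_* name is
hypothetical, the "provider" is a bare refcount, and the non-dropped path
is simplified to an immediate reject (a real consumer could also accept).
The only thing it is meant to show is that the drop branch leaves the
wait-for-CM ref outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a provider's per-connection state (e.g. a CEP). */
struct model_cep {
	int refs;	/* outstanding references */
};

/* Hypothetical stand-in for a queued iwcm event. */
struct model_event {
	bool is_connect_request;
	struct model_cep *provider_data;	/* carries one wait-for-CM ref */
};

/* Hypothetical stand-in for the listener's private cm_id state. */
struct model_listener {
	bool drop_events;	/* models IWCM_F_DROP_EVENTS */
};

/* Provider side: take the wait-for-CM ref and build the event. */
static struct model_event model_queue_connect_request(struct model_cep *cep)
{
	struct model_event ev = {
		.is_connect_request = true,
		.provider_data = cep,
	};

	cep->refs++;
	return ev;
}

/* Provider's reject op: the only place in this model that puts the ref. */
static void model_provider_reject(struct model_cep *cep)
{
	cep->refs--;
}

/* Models the quoted branch: a dropped event never reaches the provider. */
static void model_work_handler(struct model_listener *l,
			       struct model_event *ev)
{
	if (!l->drop_events) {
		/* Simplification: the consumer rejects straight away. */
		if (ev->is_connect_request)
			model_provider_reject(ev->provider_data);
	}
	/* else: silently dropped, provider_data is never released */
}

/* Replays the steps in order; returns the refs left on the CEP. */
static int model_replay_race(bool destroy_before_work_runs)
{
	struct model_cep cep = { .refs = 0 };
	struct model_listener listener = { .drop_events = false };
	struct model_event ev = model_queue_connect_request(&cep);

	if (destroy_before_work_runs)
		listener.drop_events = true;	/* teardown sets the flag */

	model_work_handler(&listener, &ev);	/* flush runs the queued work */
	return cep.refs;
}
```

In this model, model_replay_race(false) leaves no refs behind, while
model_replay_race(true) leaves one, which is the leak described above.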
A second instance of the same shape exists in cm_conn_req_handler() when
iw_create_cm_id() fails — goto out with an "ignore the request" comment,
again no provider notification.
I noticed upstream siw no longer carries the num_cep counter and
WARN_MEMBER_ATOMIC_NZ() that originally surfaced this for me, so the
leak is invisible there today. We hit it on an older siw that still
tracks num_cep and WARN_ONs in siw_device_deregister() if it isn't zero,
which is what made it observable. The underlying iwcm leak still exists
regardless; upstream siw just no longer notices.
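
For anyone who wants a similar tripwire in their own provider, the general
idea is just a per-device object counter that is checked at deregistration.
The following is a user-space sketch of that idea using C11 atomics, not
the code from the older siw: the struct layout and all model_* names are
made up for illustration, and the "warning" is reduced to a boolean return
value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device with a live-object counter (cf. num_cep). */
struct model_dev {
	atomic_int num_objs;
};

/* Hypothetical per-connection object that is accounted on the device. */
struct model_obj {
	struct model_dev *dev;
};

/* Allocate an object and account for it on the device. */
static struct model_obj *model_obj_alloc(struct model_dev *dev)
{
	struct model_obj *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->dev = dev;
	atomic_fetch_add(&dev->num_objs, 1);
	return obj;
}

/* Release an object and drop it from the device's accounting. */
static void model_obj_free(struct model_obj *obj)
{
	atomic_fetch_sub(&obj->dev->num_objs, 1);
	free(obj);
}

/*
 * Deregistration check: returns true if objects are still outstanding,
 * which is where a kernel driver might emit a warning.
 */
static bool model_dev_deregister(struct model_dev *dev)
{
	return atomic_load(&dev->num_objs) != 0;
}
```

With a counter like this, an object whose final put never happens shows up
as a nonzero count at deregistration instead of staying invisible.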
Probably the smallest correct fix is to let cm_conn_req_handler() run
even when IWCM_F_DROP_EVENTS is set — it already has a clean code path
for listen_id_priv->state != IW_CM_STATE_LISTEN that calls
iw_cm_reject() + iw_destroy_cm_id() and never invokes the consumer's
cm_handler, so the only behavior change is that the provider gets the
iw_reject callback it was waiting for.
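
Roughly what I have in mind, again as a user-space model rather than a
patch: all model_* names are hypothetical, the with_fix switch exists only
to compare the two behaviours, and the model assumes the listener has
already left the LISTEN state by the time the queued work runs. It is
meant to illustrate the control flow, not the exact change to
cm_work_handler().

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_LISTEN, MODEL_DESTROYING };

/* Hypothetical provider per-connection state holding the wait-for-CM ref. */
struct model_cep {
	int refs;
};

struct model_listener {
	enum model_state state;
	bool drop_events;	/* models IWCM_F_DROP_EVENTS */
	int consumer_calls;	/* times the consumer handler was invoked */
};

struct model_event {
	bool is_connect_request;
	struct model_cep *provider_data;
};

struct model_result {
	int refs_left;
	int consumer_calls;
};

/* Provider's reject op: puts the wait-for-CM ref. */
static void model_provider_reject(struct model_cep *cep)
{
	cep->refs--;
}

/* Models the connect-request handler with its non-LISTEN reject path. */
static void model_conn_req_handler(struct model_listener *l,
				   struct model_event *ev)
{
	if (l->state != MODEL_LISTEN) {
		/* Reject path: the provider is notified, the consumer is not. */
		model_provider_reject(ev->provider_data);
		return;
	}
	l->consumer_calls++;
	/* Simplification: the consumer rejects straight away. */
	model_provider_reject(ev->provider_data);
}

/* With the fix, CONNECT_REQUEST bypasses the drop check. */
static void model_work_handler(struct model_listener *l,
			       struct model_event *ev, bool with_fix)
{
	bool dropping = l->drop_events;

	if (with_fix && ev->is_connect_request)
		dropping = false;
	if (!dropping && ev->is_connect_request)
		model_conn_req_handler(l, ev);
}

/* Runs one queued CONNECT_REQUEST against a listener being destroyed. */
static struct model_result model_teardown(bool with_fix)
{
	struct model_cep cep = { .refs = 1 };
	struct model_listener l = {
		.state = MODEL_DESTROYING,
		.drop_events = true,
		.consumer_calls = 0,
	};
	struct model_event ev = {
		.is_connect_request = true,
		.provider_data = &cep,
	};
	struct model_result res;

	model_work_handler(&l, &ev, with_fix);
	res.refs_left = cep.refs;
	res.consumer_calls = l.consumer_calls;
	return res;
}
```

In the model, model_teardown(false) leaves one ref outstanding, while
model_teardown(true) leaves none and still never calls the consumer.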
Happy to send a patch if there's interest; otherwise we will just accept
the leak and mask the WARN ourselves.
Thanks,
Jared Holzman
jholzman@nvidia.com