Linux RDMA and InfiniBand development
* iwcm: CONNECT_REQUEST events silently dropped in cm_work_handler when listener is being destroyed
@ 2026-05-10 19:53 Jared Holzman
From: Jared Holzman @ 2026-05-10 19:53 UTC
  To: linux-rdma

Hi all,

I ran into this while debugging a rare module-unload BUG on an older siw.
I'm flagging it for the record in case it bites a future iWARP provider.

In drivers/infiniband/core/iwcm.c::cm_work_handler(), when 
IWCM_F_DROP_EVENTS is set on a listener cm_id (set unconditionally by 
destroy_cm_id()), any in-flight CONNECT_REQUEST work item is silently 
dropped:

         if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) {
                 ret = process_event(cm_id_priv, &levent);
                 ...
         } else
                 pr_debug("dropping event %d\n", levent.event);


For non-CONNECT_REQUEST events that's fine — the consumer is also 
tearing down. For CONNECT_REQUEST, however, the event carries 
provider_data = <provider's per-connection state> (e.g. siw's CEP that 
took the wait-for-CM ref via siw_cep_get() in siw_proc_mpareq()), and 
there is no other release path: cm_conn_req_handler() is the only place 
that turns a queued CONNECT_REQUEST into the call to iw_cm_reject() / 
iw_destroy_cm_id() that gets back to the provider's iw_reject op.

Race:

  1. The peer opens TCP, sends an MPA REQ, and immediately closes.
  2. The CEP defers cleanup of the wait-for-CM ref because it's still
     expecting the consumer to accept/reject.
  3. The consumer module unload runs while that CONNECT_REQUEST work is
     still queued on iwcm_wq.
  4. destroy_cm_id() sets IWCM_F_DROP_EVENTS before iw_destroy_listen(),
     and the subsequent flush_workqueue() runs the work item through the
     silent drop branch.

Provider state leaks; from the provider's side there is no signal it
could have hooked into.
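
To make the sequence above concrete, here is a small userspace model of 
it. This is only a sketch, not kernel code: every model_* name is 
hypothetical, the "queue" is a single work item, and the model's consumer 
is assumed to always reject when the event is delivered. The only thing 
it tries to show is that the reference taken by the provider has exactly 
one release path, and that the drop branch bypasses it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for provider per-connection state (e.g. a siw CEP). */
struct model_cep {
	int refs;		/* 1 == wait-for-CM reference still held */
};

enum model_event { MODEL_CONNECT_REQUEST, MODEL_OTHER };

/* Hypothetical stand-in for a work item queued on iwcm_wq. */
struct model_work {
	enum model_event event;
	struct model_cep *provider_data;
};

struct model_listener {
	bool drop_events;	/* models IWCM_F_DROP_EVENTS */
};

/* Models the provider's reject callback: the only release path for the ref. */
static void model_provider_reject(struct model_cep *cep)
{
	assert(cep->refs > 0);
	cep->refs--;
}

/* Models the current dispatch: once the flag is set, everything is dropped. */
static void model_work_handler(struct model_listener *l, struct model_work *w)
{
	if (!l->drop_events) {
		/* Assumption: the model's consumer rejects every request. */
		if (w->event == MODEL_CONNECT_REQUEST)
			model_provider_reject(w->provider_data);
	}
	/* else: event dropped, provider_data is never handed back */
}

/* Returns the number of provider references still held after teardown. */
int model_race_leaked_refs(void)
{
	struct model_cep cep = { .refs = 0 };
	struct model_listener listener = { .drop_events = false };
	struct model_work queued;

	/* Steps 1-2: MPA REQ arrives, provider takes the ref and queues work. */
	cep.refs++;
	queued.event = MODEL_CONNECT_REQUEST;
	queued.provider_data = &cep;

	/* Steps 3-4: teardown sets the drop flag before the queue is flushed. */
	listener.drop_events = true;

	/* The flush runs the queued work item through the drop branch. */
	model_work_handler(&listener, &queued);

	return cep.refs;
}
```

In this model, model_race_leaked_refs() returns 1: the reference taken in 
the first step is still held after the flush, which is the leak described 
above.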

A second instance of the same shape exists in cm_conn_req_handler() when 
iw_create_cm_id() fails — goto out with an "ignore the request" comment, 
again no provider notification.

I noticed upstream siw no longer carries the num_cep counter and 
WARN_MEMBER_ATOMIC_NZ() that originally surfaced this for me, so the 
leak is invisible there today. We hit it on an older siw that still 
tracks num_cep and WARN_ONs in siw_device_deregister() if it isn't zero, 
which is what made it observable. The underlying iwcm leak still exists 
regardless; upstream siw just no longer notices.

Probably the smallest correct fix is to let cm_conn_req_handler() run 
even when IWCM_F_DROP_EVENTS is set — it already has a clean code path 
for listen_id_priv->state != IW_CM_STATE_LISTEN that calls 
iw_cm_reject() + iw_destroy_cm_id() and never invokes the consumer's 
cm_handler, so the only behavior change is that the provider gets the 
iw_reject callback it was waiting for.
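
The same kind of userspace model can illustrate the proposed change. 
Again, this is a sketch under assumptions and not a patch: all fix_* 
names are hypothetical, the listener is assumed to have left the LISTEN 
state by the time the queue is flushed, and the model's consumer always 
rejects. The `proposed` parameter toggles between the current behavior 
(drop everything) and the suggested one (still route CONNECT_REQUEST to 
the connection-request handler).

```c
#include <assert.h>
#include <stdbool.h>

enum fix_state { FIX_STATE_LISTEN, FIX_STATE_DESTROYING };
enum fix_event { FIX_CONNECT_REQUEST, FIX_OTHER };

/* Hypothetical stand-in for provider per-connection state. */
struct fix_cep { int refs; };

struct fix_listener {
	enum fix_state state;
	bool drop_events;	/* models IWCM_F_DROP_EVENTS */
	int consumer_calls;	/* times the consumer's handler was invoked */
};

struct fix_work {
	enum fix_event event;
	struct fix_cep *provider_data;
};

struct fix_result {
	int leaked_refs;
	int consumer_calls;
};

/* Models the provider's reject callback releasing the wait-for-CM ref. */
static void fix_provider_reject(struct fix_cep *cep)
{
	assert(cep->refs > 0);
	cep->refs--;
}

/* Models the connection-request handler, including its non-LISTEN path. */
static void fix_conn_req_handler(struct fix_listener *l, struct fix_work *w)
{
	if (l->state != FIX_STATE_LISTEN) {
		/* Reject without ever calling the consumer. */
		fix_provider_reject(w->provider_data);
		return;
	}
	l->consumer_calls++;
	/* Assumption: the model's consumer rejects every request. */
	fix_provider_reject(w->provider_data);
}

static void fix_work_handler(struct fix_listener *l, struct fix_work *w,
			     bool proposed)
{
	bool is_conn_req = (w->event == FIX_CONNECT_REQUEST);

	/* Proposed: CONNECT_REQUEST is processed even while dropping. */
	if (!l->drop_events || (proposed && is_conn_req)) {
		if (is_conn_req)
			fix_conn_req_handler(l, w);
		else
			l->consumer_calls++;
	}
}

/* Runs one teardown with a CONNECT_REQUEST still queued. */
struct fix_result fix_simulate_teardown(bool proposed)
{
	struct fix_cep cep = { .refs = 1 };	/* ref taken on MPA REQ */
	struct fix_listener l = { FIX_STATE_LISTEN, false, 0 };
	struct fix_work w = { FIX_CONNECT_REQUEST, &cep };
	struct fix_result res;

	/* Teardown: set the drop flag and leave LISTEN before the flush. */
	l.drop_events = true;
	l.state = FIX_STATE_DESTROYING;

	fix_work_handler(&l, &w, proposed);

	res.leaked_refs = cep.refs;
	res.consumer_calls = l.consumer_calls;
	return res;
}
```

In this model, fix_simulate_teardown(false) leaves leaked_refs at 1, 
while fix_simulate_teardown(true) brings it to 0 with consumer_calls 
still 0. That matches the claim above: the provider gets its reject 
callback and the consumer's handler is never invoked.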

Happy to send a patch if there's interest, otherwise we will just accept 
the leak and mask the WARN ourselves.

Thanks,

Jared Holzman

jholzman@nvidia.com

