From: Jared Holzman <jholzman@nvidia.com>
To: linux-rdma@vger.kernel.org
Subject: iwcm: CONNECT_REQUEST events silently dropped in cm_work_handler when listener is being destroyed
Date: Sun, 10 May 2026 22:53:59 +0300
Message-ID: <94d9d929-7fed-4558-8c66-05232ff5eee3@nvidia.com>
Hi all,
A heads-up from debugging a rare module-unload BUG on an older siw. I am
flagging this for the record in case it bites a future iWARP provider.
In drivers/infiniband/core/iwcm.c::cm_work_handler(), when
IWCM_F_DROP_EVENTS is set on a listener cm_id (set unconditionally by
destroy_cm_id()), any in-flight CONNECT_REQUEST work item is silently
dropped:

	if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) {
		ret = process_event(cm_id_priv, &levent);
		...
	} else
		pr_debug("dropping event %d\n", levent.event);

For non-CONNECT_REQUEST events that's fine — the consumer is also
tearing down. For CONNECT_REQUEST, however, the event carries
provider_data = <provider's per-connection state> (e.g. siw's CEP that
took the wait-for-CM ref via siw_cep_get() in siw_proc_mpareq()), and
there is no other release path: cm_conn_req_handler() is the only place
that turns a queued CONNECT_REQUEST into the call to iw_cm_reject() /
iw_destroy_cm_id() that gets back to the provider's iw_reject op.
The race:
  1. The peer opens TCP, sends an MPA REQ, and immediately closes.
  2. The CEP defers cleanup of the wait-for-CM ref because it is still
     expecting the consumer to accept/reject.
  3. Consumer module unload runs while that CONNECT_REQUEST work is
     still queued on iwcm_wq.
  4. destroy_cm_id() sets IWCM_F_DROP_EVENTS before iw_destroy_listen(),
     and the subsequent flush_workqueue() runs the work item through the
     silent drop branch.
Provider state leaks; from the provider's side there is no signal it
could have hooked into.
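
To make the reference accounting concrete, here is a minimal userspace
model of the race, offered as a sketch only. Every model_* name is
hypothetical and none of this is kernel code: model_cep stands in for the
provider's per-connection state, drop_events for IWCM_F_DROP_EVENTS, and
the consumer's accept/reject decision is collapsed into a single reject
call that releases the reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. All model_* names are hypothetical stand-ins. */
enum model_event { MODEL_CONNECT_REQUEST, MODEL_OTHER };

struct model_cep {			/* provider per-connection state */
	int refs;
};

struct model_listener {
	bool drop_events;		/* stands in for IWCM_F_DROP_EVENTS */
};

struct model_work {
	enum model_event event;
	struct model_cep *provider_data;
};

static void model_cep_get(struct model_cep *cep)
{
	cep->refs++;
}

static void model_cep_put(struct model_cep *cep)
{
	assert(cep->refs > 0);
	cep->refs--;
}

/* Provider queues a CONNECT_REQUEST and takes the wait-for-CM ref. */
static struct model_work model_queue_connect_request(struct model_cep *cep)
{
	struct model_work w = { MODEL_CONNECT_REQUEST, cep };

	model_cep_get(cep);
	return w;
}

/* Stand-in for the provider's reject op: the only release path modeled. */
static void model_provider_reject(struct model_cep *cep)
{
	model_cep_put(cep);
}

/* Models the branch quoted from cm_work_handler(). */
static void model_work_handler(struct model_listener *l, struct model_work *w)
{
	if (!l->drop_events) {
		if (w->event == MODEL_CONNECT_REQUEST)
			model_provider_reject(w->provider_data);
	}
	/* else: event dropped, provider_data is never released */
}

/* Returns the references still held on the CEP after the work item ran. */
int model_run_race(bool listener_destroyed_first)
{
	struct model_cep cep = { 0 };
	struct model_listener l = { false };
	struct model_work w = model_queue_connect_request(&cep);

	if (listener_destroyed_first)
		l.drop_events = true;	/* destroy flow sets the flag first */
	model_work_handler(&l, &w);	/* the flush then runs the queued item */
	return cep.refs;
}
```

In this model, model_run_race(false) leaves no reference behind, while
model_run_race(true) leaves one held, which is the leak described above.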
A second instance of the same shape exists in cm_conn_req_handler() when
iw_create_cm_id() fails: the code jumps to out with an "ignore the
request" comment, again without notifying the provider.
I noticed upstream siw no longer carries the num_cep counter and
WARN_MEMBER_ATOMIC_NZ() that originally surfaced this for me, so the
leak is invisible there today. We hit it on an older siw that still
tracks num_cep and WARN_ONs in siw_device_deregister() if it isn't zero,
which is what made it observable. The underlying iwcm leak still exists
regardless; upstream siw just no longer notices.
Probably the smallest correct fix is to let cm_conn_req_handler() run
even when IWCM_F_DROP_EVENTS is set — it already has a clean code path
for listen_id_priv->state != IW_CM_STATE_LISTEN that calls
iw_cm_reject() + iw_destroy_cm_id() and never invokes the consumer's
cm_handler, so the only behavior change is that the provider gets the
iw_reject callback it was waiting for.
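
The intended behavior change can be sketched in a small userspace model.
This is not a patch and not kernel code: every sketch_* name is
hypothetical, the counters merely record which path was taken, and the
model assumes the listener has already left the LISTEN state by the time
the queued work runs.

```c
#include <stdbool.h>

/* Userspace model only. All sketch_* names are hypothetical stand-ins. */
enum sketch_state { SKETCH_LISTEN, SKETCH_DESTROYING };
enum sketch_event { SKETCH_CONNECT_REQUEST, SKETCH_OTHER };

struct sketch_listener {
	enum sketch_state state;
	bool drop_events;	/* stands in for IWCM_F_DROP_EVENTS */
	int consumer_upcalls;	/* times the consumer's handler would run */
	int provider_rejects;	/* times the provider's reject op would run */
};

/* Models the connection-request handler and its non-LISTEN reject path. */
static void sketch_conn_req_handler(struct sketch_listener *l)
{
	if (l->state != SKETCH_LISTEN) {
		/* reject and destroy; the consumer is never called */
		l->provider_rejects++;
		return;
	}
	l->consumer_upcalls++;
}

/* Proposed shape: only non-CONNECT_REQUEST events honor the drop flag. */
void sketch_work_handler(struct sketch_listener *l, enum sketch_event ev)
{
	if (l->drop_events && ev != SKETCH_CONNECT_REQUEST)
		return;		/* dropped as before */

	if (ev == SKETCH_CONNECT_REQUEST)
		sketch_conn_req_handler(l);
	else
		l->consumer_upcalls++;
}

/* Runs one CONNECT_REQUEST and one other event against a dying listener. */
struct sketch_listener sketch_teardown_case(void)
{
	struct sketch_listener l = { SKETCH_DESTROYING, true, 0, 0 };

	sketch_work_handler(&l, SKETCH_CONNECT_REQUEST);
	sketch_work_handler(&l, SKETCH_OTHER);
	return l;
}
```

In the teardown case the model records one provider reject and no
consumer upcalls, matching the claim that the consumer's cm_handler is
never invoked on this path.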
Happy to send a patch if there's interest; otherwise we will just accept
the leak and mask the WARN ourselves.
Thanks,
Jared Holzman
jholzman@nvidia.com