Linux RDMA and InfiniBand development
From: Jared Holzman <jholzman@nvidia.com>
To: linux-rdma@vger.kernel.org
Subject: iwcm: CONNECT_REQUEST events silently dropped in cm_work_handler when listener is being destroyed
Date: Sun, 10 May 2026 22:53:59 +0300
Message-ID: <94d9d929-7fed-4558-8c66-05232ff5eee3@nvidia.com>

Hi all,

Heads-up: I ran into this while debugging a rare module-unload BUG on an
older siw, and I'm flagging it for the record in case it bites a future
iWARP provider.

In drivers/infiniband/core/iwcm.c::cm_work_handler(), when 
IWCM_F_DROP_EVENTS is set on a listener cm_id (set unconditionally by 
destroy_cm_id()), any in-flight CONNECT_REQUEST work item is silently 
dropped:

         if (!test_bit(IWCM_F_DROP_EVENTS, &cm_id_priv->flags)) {
                 ret = process_event(cm_id_priv, &levent);
                 ...
         } else
                 pr_debug("dropping event %d\n", levent.event);


For non-CONNECT_REQUEST events that's fine, since the consumer is also
tearing down. For CONNECT_REQUEST, however, the event carries the
provider's per-connection state in provider_data (e.g. siw's CEP, which
took the wait-for-CM ref via siw_cep_get() in siw_proc_mpareq()), and
there is no other release path: cm_conn_req_handler() is the only place
that turns a queued CONNECT_REQUEST into the call to iw_cm_reject() /
iw_destroy_cm_id() that gets back to the provider's iw_reject op.

Race:

  1. Peer opens TCP, sends MPA REQ, and immediately closes.
  2. The CEP defers cleanup of the wait-for-CM ref because it is still
     expecting the consumer to accept/reject.
  3. Consumer module unload runs while that CONNECT_REQUEST work is
     still queued on iwcm_wq.
  4. destroy_cm_id() sets IWCM_F_DROP_EVENTS before iw_destroy_listen(),
     and the subsequent flush_workqueue() runs the work item through the
     silent drop branch.

Provider state leaks; from the provider's side there is no signal it
could have hooked into.
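
To make the sequence above concrete, here is a toy userspace model of
it. This is not kernel code: every toy_* name is hypothetical, the
"queue" is a plain array rather than iwcm_wq, and the normal
(non-dropped) path is collapsed into an immediate reject. It only shows
that a reference taken at queue time is never returned once the drop
flag is set before the flush.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are the kernel's real types. */
enum toy_event { TOY_CONNECT_REQUEST, TOY_CLOSE };

struct toy_cep {
	int refs;			/* models the wait-for-CM reference */
};

struct toy_work {
	enum toy_event event;
	struct toy_cep *provider_data;	/* models event->provider_data */
};

struct toy_listener {
	bool drop_events;		/* models IWCM_F_DROP_EVENTS */
	struct toy_work queue[4];
	size_t queued;
};

/* Models the provider taking a ref and queueing a CONNECT_REQUEST. */
static void toy_queue_connect_request(struct toy_listener *l,
				      struct toy_cep *cep)
{
	if (l->queued == sizeof(l->queue) / sizeof(l->queue[0]))
		return;			/* toy queue full: nothing queued */
	cep->refs++;
	l->queue[l->queued++] =
		(struct toy_work){ TOY_CONNECT_REQUEST, cep };
}

/* Models the reject path, the only place the ref is handed back. */
static void toy_reject(struct toy_cep *cep)
{
	cep->refs--;
}

/* Models the flush: each item is processed unless the drop flag is set. */
static void toy_flush(struct toy_listener *l)
{
	for (size_t i = 0; i < l->queued; i++) {
		struct toy_work *w = &l->queue[i];

		if (!l->drop_events) {
			/* Simplification: treat processing as a reject. */
			if (w->event == TOY_CONNECT_REQUEST)
				toy_reject(w->provider_data);
		}
		/* else: silently dropped, provider_data is never touched */
	}
	l->queued = 0;
}

/* Returns how many references are still held after the teardown. */
int toy_leaked_refs_after_destroy(void)
{
	struct toy_cep cep = { 0 };
	struct toy_listener l = { 0 };

	toy_queue_connect_request(&l, &cep);	/* steps 1-3 */
	l.drop_events = true;			/* step 4: flag set first */
	toy_flush(&l);				/* step 4: flush drops it */
	return cep.refs;
}
```

In this model toy_leaked_refs_after_destroy() returns 1: the reference
taken when the request was queued is still held after the flush.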

A second instance of the same shape exists in cm_conn_req_handler():
when iw_create_cm_id() fails, the code does a goto out with an "ignore
the request" comment, again with no provider notification.

I noticed upstream siw no longer carries the num_cep counter and 
WARN_MEMBER_ATOMIC_NZ() that originally surfaced this for me, so the 
leak is invisible there today. We hit it on an older siw that still 
tracks num_cep and WARN_ONs in siw_device_deregister() if it isn't zero, 
which is what made it observable. The underlying iwcm leak still exists 
regardless; upstream siw just no longer notices.

Probably the smallest correct fix is to let cm_conn_req_handler() run
even when IWCM_F_DROP_EVENTS is set. It already has a clean code path
for listen_id_priv->state != IW_CM_STATE_LISTEN that calls
iw_cm_reject() + iw_destroy_cm_id() and never invokes the consumer's
cm_handler, so the only behavior change is that the provider gets the
iw_reject callback it was waiting for.
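
As a rough illustration of that idea (a userspace toy, not a patch),
the sketch below compares the current behavior with the proposed one.
All toy_* names are hypothetical, and the model assumes the listener
has already left the LISTEN state by the time the queued work runs,
which is what makes the reject path the one taken.

```c
#include <stdbool.h>

/* Toy stand-ins only; not the kernel's types or state names. */
enum toy_state { TOY_LISTEN, TOY_DESTROYING };

struct toy_listener {
	bool drop_events;	/* models IWCM_F_DROP_EVENTS */
	enum toy_state state;
};

struct toy_result {
	int leaked_refs;	/* provider refs still held at the end */
	int consumer_calls;	/* times the consumer handler was invoked */
};

/* Models the connect-request handler's two paths. */
static void toy_conn_req_handler(const struct toy_listener *l,
				 int *refs, int *consumer_calls)
{
	if (l->state != TOY_LISTEN) {
		/* Reject path: provider gets its reject, ref released. */
		(*refs)--;
		return;
	}
	/* Normal path: consumer is called and, in this toy, the ref is
	 * released by its accept/reject decision. */
	(*consumer_calls)++;
	(*refs)--;
}

/*
 * Models one teardown with a single CONNECT_REQUEST already queued.
 * run_conn_req_when_dropping == false models today's behavior;
 * true models the proposed change.
 */
struct toy_result toy_teardown(bool run_conn_req_when_dropping)
{
	struct toy_listener l = { .drop_events = false, .state = TOY_LISTEN };
	struct toy_result r = { 0, 0 };
	int refs = 1;		/* ref taken when the request was queued */

	/* Destroy: drop flag set, listener leaves LISTEN (assumption). */
	l.drop_events = true;
	l.state = TOY_DESTROYING;

	/* Flush: the work handler decides whether to process the item. */
	if (!l.drop_events || run_conn_req_when_dropping)
		toy_conn_req_handler(&l, &refs, &r.consumer_calls);

	r.leaked_refs = refs;
	return r;
}
```

In this model toy_teardown(false) leaves one reference held, while
toy_teardown(true) releases it; the consumer handler is not invoked in
either case.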

Happy to send a patch if there's interest; otherwise we will just accept
the leak and mask the WARN ourselves.

Thanks,

Jared Holzman

jholzman@nvidia.com