Linux RDMA and InfiniBand development
From: Liibaan Egal <liibaegal@gmail.com>
To: linux-rdma@vger.kernel.org
Cc: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH rdma-next 0/2] RDMA/rxe: add local implicit ODP MR support
Date: Tue, 12 May 2026 15:14:51 -0500
Message-ID: <20260512201453.21156-1-liibaegal@gmail.com>

This RFC adds local-access implicit On-Demand Paging memory regions to
RXE (Soft-RoCE).

RXE already supports explicit ODP MRs. The implicit registration form
(addr == 0, length == U64_MAX, IB_ACCESS_ON_DEMAND) is recognized but
not implemented: the implicit branch in rxe_odp_mr_init_user() returns
-EINVAL through a placeholder block, and no path creates child umems
for SGE accesses on an implicit MR.

This series wires the implicit registration case through
ib_umem_odp_alloc_implicit() and routes the local SGE walker through
per-chunk child umems. The chunk size is fixed at 2 MiB
(RXE_ODP_CHILD_SHIFT = 21) and children are allocated lazily on first
access via ib_umem_odp_alloc_child(), stored in a per-MR xarray.
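
To illustrate the per-chunk routing, here is a minimal user-space sketch
of the index and offset arithmetic implied by RXE_ODP_CHILD_SHIFT = 21.
It is not the code from the patch: struct sketch_piece, sketch_split()
and SKETCH_CHILD_SIZE are hypothetical names, and the real walker would
resolve each piece to a child umem rather than record it in an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the cover letter: 2 MiB chunks. */
#define RXE_ODP_CHILD_SHIFT 21
/* Hypothetical name for the chunk size derived from the shift. */
#define SKETCH_CHILD_SIZE ((uint64_t)1 << RXE_ODP_CHILD_SHIFT)

/* One contiguous piece of an access that falls inside a single chunk. */
struct sketch_piece {
	uint64_t index;		/* chunk number: iova >> RXE_ODP_CHILD_SHIFT */
	uint64_t offset;	/* byte offset inside that chunk */
	uint64_t length;	/* bytes of the access served by that chunk */
};

/*
 * Split [iova, iova + len) at chunk boundaries. Writes at most max
 * pieces to out and returns how many were written.
 */
static size_t sketch_split(uint64_t iova, uint64_t len,
			   struct sketch_piece *out, size_t max)
{
	size_t n = 0;

	/* The range must not wrap around the 64-bit address space. */
	assert(len <= UINT64_MAX - iova);

	while (len && n < max) {
		uint64_t offset = iova & (SKETCH_CHILD_SIZE - 1);
		uint64_t chunk_len = SKETCH_CHILD_SIZE - offset;

		/* Clamp the piece to what is left of the access. */
		if (chunk_len > len)
			chunk_len = len;

		out[n].index = iova >> RXE_ODP_CHILD_SHIFT;
		out[n].offset = offset;
		out[n].length = chunk_len;
		n++;

		iova += chunk_len;
		len -= chunk_len;
	}

	return n;
}
```

As an example, a 128 KiB range that starts 64 KiB below a chunk boundary
splits into two 64 KiB pieces in adjacent chunks, so an access of that
shape would touch two children.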

Patches
-------

  1/2 RDMA/rxe: add local implicit ODP MR support

      Adds rxe_odp_mr_init_implicit() (rejects remote access bits with
      -EOPNOTSUPP, allocates the parent umem). Adds rxe_odp_get_child()
      and the per-chunk loop in rxe_odp_mr_copy() and the prefetch
      path. Atomic, flush and atomic-write paths reject implicit MRs
      at the top because those helpers walk mr->umem->pfn_list
      directly, which is empty for an implicit parent. rxe_mr_cleanup()
      walks the child xarray and releases each child before the
      parent.

      This patch leaves IB_ODP_SUPPORT_IMPLICIT unadvertised, so
      rxe_odp_mr_init_user() still returns -EINVAL on the implicit
      form. No user-visible behavior change yet.

  2/2 RDMA/rxe: advertise IB_ODP_SUPPORT_IMPLICIT for local access

      Flips the cap bit so userspace can probe support via
      ibv_query_device_ex(). Kept as its own patch so the policy
      question is separable from the implementation.
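
The registration-time checks described above can be sketched in user
space as follows. This is an illustration under stated assumptions, not
the patch code: the SKETCH_* constants are stand-ins modelled on the
uverbs access bits, the set of "remote access bits" is assumed to be
remote write, remote read and remote atomic, and the ordering of the
two checks (capability first, then remote bits) is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in access bits for this sketch (assumed values). */
#define SKETCH_ACCESS_LOCAL_WRITE	(1u << 0)
#define SKETCH_ACCESS_REMOTE_WRITE	(1u << 1)
#define SKETCH_ACCESS_REMOTE_READ	(1u << 2)
#define SKETCH_ACCESS_REMOTE_ATOMIC	(1u << 3)
#define SKETCH_ACCESS_ON_DEMAND		(1u << 6)

/* Assumed definition of "remote access bits". */
#define SKETCH_REMOTE_BITS (SKETCH_ACCESS_REMOTE_WRITE | \
			    SKETCH_ACCESS_REMOTE_READ | \
			    SKETCH_ACCESS_REMOTE_ATOMIC)

/* Implicit form per the cover letter: addr == 0, length == U64_MAX, ODP. */
static bool sketch_is_implicit_form(uint64_t start, uint64_t length,
				    unsigned int access)
{
	return (access & SKETCH_ACCESS_ON_DEMAND) &&
	       start == 0 && length == UINT64_MAX;
}

/*
 * Decide whether an implicit registration would be accepted.
 * cap_advertised models whether IB_ODP_SUPPORT_IMPLICIT is set,
 * that is, whether patch 2/2 has been applied.
 */
static int sketch_check_implicit(unsigned int access, bool cap_advertised)
{
	/* The caller is expected to have matched the implicit form. */
	assert(access & SKETCH_ACCESS_ON_DEMAND);

	if (!cap_advertised)
		return -EINVAL;		/* behavior with patch 1/2 alone */
	if (access & SKETCH_REMOTE_BITS)
		return -EOPNOTSUPP;	/* local access only */
	return 0;
}
```

With only patch 1/2 applied, every implicit registration takes the
-EINVAL branch in this sketch, which matches the "no user-visible
behavior change yet" statement.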

Question for reviewers
----------------------

Patch 2/2 advertises IB_ODP_SUPPORT_IMPLICIT for a local-access-only
operation matrix. Local SGE access on implicit MRs works; remote rkey
access, atomic, flush, and atomic-write on implicit MRs do not. Is
this an acceptable use of the capability bit, or should capability
exposure wait for a broader operation matrix? Splitting the cap flip
out is meant to keep that decision separable from the implementation.

Scope and limitations
---------------------

Out of scope in this series:

- Remote rkey access on implicit MRs. Rejected at registration time
  with -EOPNOTSUPP.
- Atomic, flush, atomic-write paths. These return -EOPNOTSUPP /
  RESPST_ERR_RKEY_VIOLATION on implicit MRs.
- Child reclaim. The xarray grows monotonically per MR; a child is
  not freed until MR destroy. Long-lived implicit MRs that touch a
  sparse address space accumulate children. A reclaim mechanism is
  the natural follow-up.
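
The child lifetime described in the last item can be modelled in user
space as below. This is a sketch only: a linked list stands in for the
per-MR xarray, malloc() stands in for ib_umem_odp_alloc_child(), and
all sketch_* names are hypothetical. It shows lazy creation on first
access, growth that is never undone, and release of every child at MR
teardown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RXE_ODP_CHILD_SHIFT 21	/* 2 MiB chunks, from the cover letter */

/* Stand-in for one child umem, keyed by chunk index. */
struct sketch_child {
	uint64_t index;
	struct sketch_child *next;
};

/* Stand-in for the implicit MR and its child table. */
struct sketch_mr {
	struct sketch_child *children;
	size_t nr_children;
};

/* Return the child covering iova, creating it on first access. */
static struct sketch_child *sketch_get_child(struct sketch_mr *mr,
					     uint64_t iova)
{
	uint64_t index = iova >> RXE_ODP_CHILD_SHIFT;
	struct sketch_child *c;

	assert(mr != NULL);

	for (c = mr->children; c; c = c->next)
		if (c->index == index)
			return c;

	c = malloc(sizeof(*c));
	if (!c)
		return NULL;

	c->index = index;
	c->next = mr->children;
	mr->children = c;
	mr->nr_children++;	/* only ever grows until cleanup */
	return c;
}

/* Release every child; the parent would be released after this. */
static size_t sketch_mr_cleanup(struct sketch_mr *mr)
{
	size_t freed = 0;

	assert(mr != NULL);

	while (mr->children) {
		struct sketch_child *c = mr->children;

		mr->children = c->next;
		free(c);
		freed++;
	}
	mr->nr_children = 0;
	return freed;
}
```

In this model, two accesses inside the same 2 MiB chunk share one
child, while accesses to far-apart addresses each add a child that
stays allocated until cleanup, which is the accumulation the reclaim
follow-up would address.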

Tested
------

Verified on rdma/for-next at commit 7fd2df204f34 (Linux 7.1-rc2),
arm64, Soft-RoCE over loopback:

- Registration accept/reject matrix (5 cases).
- Single-chunk 64 KiB RDMA WRITE through an implicit lkey.
- Two-chunk multi-range test: two 1 MiB WRITEs from buffers in
  different 2 MiB chunks of one implicit MR.
- Cross-chunk single-SGE test: one 128 KiB WRITE whose SGE spans a
  2 MiB chunk boundary.

Each patch builds cleanly standalone (M=drivers/infiniband/sw/rxe).

Registration latency was measured for 4 KiB to 1 GiB across explicit
and implicit forms. Explicit latency grows with size, and explicit
registration fails with ENOMEM at 1 GiB on a 6 GiB host. Implicit
median latency stays in the low microseconds across all sizes; peak
RSS during an implicit registration stays at the baseline, while
explicit RSS climbs with the registered size. The benchmark measures
registration-time work only; it does not characterize first-touch or
steady-state data path cost. Tests, benchmark code and raw numbers are
in the companion repository:
https://github.com/Liibon/rxe-implicit-odp

scripts/checkpatch.pl --strict on each patch: 0 errors, 0 warnings,
0 checks.

---

Liibaan Egal (2):
  RDMA/rxe: add local implicit ODP MR support
  RDMA/rxe: advertise IB_ODP_SUPPORT_IMPLICIT for local access

 drivers/infiniband/sw/rxe/rxe.c       |   7 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    |  19 +++
 drivers/infiniband/sw/rxe/rxe_odp.c   | 288 +++++++++++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  18 +++
 4 files changed, 275 insertions(+), 57 deletions(-)

-- 
2.43.0