From: Liibaan Egal
To: linux-rdma@vger.kernel.org
Cc: zyjzyj2000@gmail.com, jgg@ziepe.ca, leon@kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH rdma-next 0/2] RDMA/rxe: add local implicit ODP MR support
Date: Tue, 12 May 2026 15:14:51 -0500
Message-Id: <20260512201453.21156-1-liibaegal@gmail.com>

This RFC adds local-access implicit On-Demand Paging (ODP) memory regions
to RXE (Soft-RoCE).

RXE already supports explicit ODP MRs. The implicit registration form
(addr == 0, length == U64_MAX, IB_ACCESS_ON_DEMAND) is recognized but not
implemented: the implicit branch in rxe_odp_mr_init_user() is a
placeholder that returns -EINVAL, and no path creates child umems for SGE
accesses on an implicit MR.

This series wires the implicit registration case through
ib_umem_odp_alloc_implicit() and routes the local SGE walker through
per-chunk child umems. The chunk size is fixed at 2 MiB
(RXE_ODP_CHILD_SHIFT = 21). Children are allocated lazily on first access
via ib_umem_odp_alloc_child() and stored in a per-MR xarray.

Patches
-------

1/2 RDMA/rxe: add local implicit ODP MR support

Adds rxe_odp_mr_init_implicit(), which rejects remote access bits with
-EOPNOTSUPP and allocates the parent umem. Adds rxe_odp_get_child() and
the per-chunk loop in rxe_odp_mr_copy() and in the prefetch path.
Atomic, flush, and atomic-write paths reject implicit MRs up front,
because those helpers walk mr->umem->pfn_list directly, and that list is
empty for an implicit parent. rxe_mr_cleanup() walks the child xarray and
releases each child before the parent.

This patch leaves IB_ODP_SUPPORT_IMPLICIT unadvertised, so
rxe_odp_mr_init_user() still returns -EINVAL on the implicit form. There
is no user-visible behavior change yet.

2/2 RDMA/rxe: advertise IB_ODP_SUPPORT_IMPLICIT for local access

Flip the cap bit so userspace can probe support via ibv_query_device_ex()
(odp_caps.general_caps). Kept as its own patch so the policy question is
separable from the implementation.

Question for reviewers
----------------------

Patch 2/2 advertises IB_ODP_SUPPORT_IMPLICIT for a local-access-only
operation matrix. Local SGE access on implicit MRs works; remote rkey
access, atomic, flush, and atomic-write on implicit MRs do not. Is this an
acceptable use of the capability bit, or should capability exposure wait
for a broader operation matrix? Splitting out the cap flip lets that
decision be made independently of the implementation review.

Scope and limitations
---------------------

Out of scope in this series:

- Remote rkey access on implicit MRs. Rejected at registration time with
  -EOPNOTSUPP.
- Atomic, flush, and atomic-write paths. These return -EOPNOTSUPP or
  RESPST_ERR_RKEY_VIOLATION, depending on the path, on implicit MRs.
- Child reclaim. The xarray grows monotonically per MR; a child is not
  freed until the MR is destroyed. Long-lived implicit MRs that touch a
  sparse address space accumulate children. A reclaim mechanism is the
  natural follow-up.

Tested
------

Verified on rdma/for-next at commit 7fd2df204f34 (Linux 7.1-rc2), arm64,
Soft-RoCE over loopback:

- Registration accept/reject matrix (5 cases).
- Single-chunk 64 KiB RDMA WRITE through an implicit lkey.
- Two-chunk multi-range test: two 1 MiB WRITEs from buffers in different
  2 MiB chunks of one implicit MR.
- Cross-chunk single-SGE test: one 128 KiB WRITE whose SGE spans a 2 MiB
  chunk boundary.
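The per-chunk walk that the cross-chunk test exercises can be illustrated
with a small user-space sketch (not code from the patches). It assumes
only the fixed 2 MiB chunk size stated above (RXE_ODP_CHILD_SHIFT = 21)
and that a child is keyed by iova >> 21, which is an assumption. The names
split_by_chunk(), struct chunk_piece, CHUNK_SIZE and CHUNK_MASK are
hypothetical. The helper splits one SGE range into pieces that never
cross a chunk boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the cover letter: 2 MiB chunks. */
#define RXE_ODP_CHILD_SHIFT 21
/* Hypothetical helper macros derived from the shift. */
#define CHUNK_SIZE (UINT64_C(1) << RXE_ODP_CHILD_SHIFT)
#define CHUNK_MASK (CHUNK_SIZE - 1)

/* One slice of an SGE range that lies inside a single child chunk. */
struct chunk_piece {
	uint64_t index;  /* assumed child key: iova >> RXE_ODP_CHILD_SHIFT */
	uint64_t offset; /* byte offset within that chunk */
	uint64_t len;    /* bytes served by that chunk */
};

/*
 * Hypothetical helper: split [iova, iova + len) at every 2 MiB boundary.
 * Returns the number of pieces written, or 0 if len is 0 or if
 * max_pieces is too small to hold the whole range.
 */
static size_t split_by_chunk(uint64_t iova, uint64_t len,
			     struct chunk_piece *out, size_t max_pieces)
{
	size_t n = 0;

	/* The range must not wrap past the top of the 64-bit space. */
	assert(len <= UINT64_MAX - iova);

	while (len > 0) {
		uint64_t off = iova & CHUNK_MASK;
		uint64_t room = CHUNK_SIZE - off; /* bytes left in this chunk */
		uint64_t step = len < room ? len : room;

		if (n == max_pieces)
			return 0;
		out[n].index = iova >> RXE_ODP_CHILD_SHIFT;
		out[n].offset = off;
		out[n].len = step;
		n++;
		iova += step;
		len -= step;
	}
	return n;
}
```

For example, a 128 KiB range starting 64 KiB below the first 2 MiB
boundary (iova 0x1F0000, len 0x20000) yields two 64 KiB pieces, one in
child 0 at offset 0x1F0000 and one in child 1 at offset 0. In a driver,
each piece would then be looked up, or lazily created, in the per-MR
child store before its bytes are copied.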
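The registration accept/reject behavior described in this letter can
likewise be modeled in user space. This is a hypothetical sketch, not the
patch's code. The access-flag values are mirrored locally from enum
ib_access_flags in include/rdma/ib_verbs.h so the example builds without
kernel headers. Several details are assumptions: treating REMOTE_WRITE,
REMOTE_READ and REMOTE_ATOMIC as the "remote access bits", and checking
the capability before the access bits. The cap_advertised parameter
stands in for whether patch 2/2 is applied.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrored locally from enum ib_access_flags (include/rdma/ib_verbs.h). */
#define IB_ACCESS_LOCAL_WRITE   (1 << 0)
#define IB_ACCESS_REMOTE_WRITE  (1 << 1)
#define IB_ACCESS_REMOTE_READ   (1 << 2)
#define IB_ACCESS_REMOTE_ATOMIC (1 << 3)
#define IB_ACCESS_ON_DEMAND     (1 << 6)

/* Assumption: the set of bits treated as "remote access". */
#define REMOTE_ACCESS_BITS (IB_ACCESS_REMOTE_WRITE | \
			    IB_ACCESS_REMOTE_READ | \
			    IB_ACCESS_REMOTE_ATOMIC)

/* Implicit form: addr == 0, length == U64_MAX (UINT64_MAX here), ODP set. */
static bool is_implicit_form(uint64_t addr, uint64_t length, int access)
{
	return addr == 0 && length == UINT64_MAX &&
	       (access & IB_ACCESS_ON_DEMAND);
}

/*
 * Hypothetical model of the registration gate. Returns 0 to accept or a
 * negative errno to reject. Non-implicit requests fall through to the
 * explicit path, which is not modeled here.
 */
static int implicit_reg_check(uint64_t addr, uint64_t length, int access,
			      bool cap_advertised)
{
	if (!is_implicit_form(addr, length, access))
		return 0;
	if (!cap_advertised)
		return -EINVAL;      /* patch 1/2 alone: implicit still refused */
	if (access & REMOTE_ACCESS_BITS)
		return -EOPNOTSUPP;  /* local-access-only support */
	return 0;
}
```

With cap_advertised false, every implicit-form request returns -EINVAL,
consistent with patch 1/2 making no user-visible change. With it true, an
implicit request carrying only local bits is accepted, and one carrying
any remote bit is rejected with -EOPNOTSUPP.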
Each patch builds cleanly standalone (M=drivers/infiniband/sw/rxe).

Registration latency was measured for 4 KiB to 1 GiB across explicit and
implicit forms. Explicit registration latency grows with size, and
explicit registration fails with ENOMEM at 1 GiB on a 6 GiB host.
Implicit median latency stays in the low microseconds across all sizes.
Peak RSS during an implicit registration stays at the baseline, while
explicit RSS climbs with the registered size. The benchmark measures
registration-time work only; it does not characterize first-touch or
steady-state data-path cost.

Tests, benchmark, and raw numbers are in the companion repository:
https://github.com/Liibon/rxe-implicit-odp

scripts/checkpatch.pl --strict on each patch: 0 errors, 0 warnings,
0 checks.

---

Liibaan Egal (2):
  RDMA/rxe: add local implicit ODP MR support
  RDMA/rxe: advertise IB_ODP_SUPPORT_IMPLICIT for local access

 drivers/infiniband/sw/rxe/rxe.c       |   7 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    |  19 +++
 drivers/infiniband/sw/rxe/rxe_odp.c   | 288 +++++++++++++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  18 +++
 4 files changed, 275 insertions(+), 57 deletions(-)

-- 
2.43.0