From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 1 Apr 2026 19:44:56 +0000
Mime-Version: 1.0
X-Mailing-List: netdev@vger.kernel.org
X-Mailer: git-send-email 2.53.0.1185.g05d4b7b318-goog
Message-ID: <20260401194501.2269200-1-praan@google.com>
Subject: [RFC PATCH 0/4] nfs: Enable PCI Peer-to-Peer DMA (P2PDMA) support
From: Pranjal Shrivastava
To: trond.myklebust@hammerspace.com, anna@kernel.org
Cc: davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
	pabeni@redhat.com, chuck.lever@oracle.com, jlayton@kernel.org,
	tom@talpey.com, okorniev@redhat.com, neil@brown.name,
	dai.ngo@oracle.com, linux-nfs@vger.kernel.org,
	netdev@vger.kernel.org, Pranjal Shrivastava
Content-Type: text/plain; charset="UTF-8"

As high-performance storage environments increasingly rely on direct
data movement between PCIe endpoints (e.g., moving data directly between
an NVMe Controller Memory Buffer and a Network Interface), support for
Peer-to-Peer DMA (P2PDMA) in the network filesystem layer becomes
essential.

This series introduces P2PDMA support for NFS Direct I/O. Currently,
NFS O_DIRECT operations fail with -EREMOTEIO if the user buffer resides
in PCIe BAR memory. This is primarily due to the use of the legacy
`iov_iter_get_pages_alloc2()` API, which cannot pass the required
`FOLL_PCI_P2PDMA` flag, and a request lifecycle that is unaware of the
pinning requirements for P2P memory.

Design
=======

The proposed design centers around making the NFS request lifecycle
"pin-aware" and upgrading the infrastructure to support modern memory
extraction APIs.

1. 64-bit Capability Infrastructure
The existing nfs_server->caps bitmask is limited to 32 bits and is
currently exhausted. This series expands the bitmask to 64 bits to
accommodate NFS_CAP_P2PDMA. Crucially, it also refactors the NFS_CAP_*
constants to use ULL definitions. This prevents a subtle 32-bit
truncation bug where bitwise negations (e.g., caps &= ~NFS_CAP_ACLS)
would accidentally clear the high bits of the 64-bit capability field.

2. Transport-Level Detection
P2PDMA support is a property of the local transport hardware. A new
supports_p2pdma operation is added to the SunRPC transport ops. For
RDMA, this is implemented by querying the underlying device via
ib_dma_pci_p2p_dma_supported().
The NFS client queries this during mount and sets the NFS_CAP_P2PDMA bit
accordingly.

3. Pin-Aware Request Lifecycle
Standard NFS requests use get_page() and put_page() for memory
management. However, memory extracted via iov_iter_extract_pages()
requires explicit pinning and unpinning (unpin_user_page()). This series
introduces a PG_PINNED flag in struct nfs_page. When set, the request
lifecycle skips standard page referencing and ensures that
unpin_user_page() is called only when the I/O is complete. This ensures
that physical memory remains pinned for the duration of the DMA
transfer.

4. API Migration
The Direct I/O path is migrated to the modern iov_iter_extract_pages()
API. The ITER_ALLOW_P2PDMA flag is passed to the extraction call only
when the local mount has signaled P2P support via the capability bit.
This ensures that "normal" users on standard TCP/UDP transports see no
change in behavior or overhead.

Call for review
===============

Any insights on the proposed changes to the nfs_page lifecycle and the
64-bit capability expansion are appreciated. If this approach is deemed
incorrect or if there is a more idiomatic way to do this, please point
me in the right direction.

Thanks,
Praan

Pranjal Shrivastava (4):
  sunrpc: add supports_p2pdma to rpc_xprt_ops
  nfs: add NFS_CAP_P2PDMA and detect transport support
  nfs: make nfs_page pin-aware
  nfs: allow P2PDMA in direct I/O path

 fs/nfs/client.c                 |  8 ++++
 fs/nfs/direct.c                 | 51 ++++++++++++++++++-------
 fs/nfs/nfs4_fs.h                |  2 +-
 fs/nfs/pagelist.c               | 18 ++++++---
 fs/nfs/super.c                  |  2 +-
 include/linux/nfs_fs_sb.h       | 67 +++++++++++++++++----------------
 include/linux/nfs_page.h        |  2 +
 include/linux/sunrpc/xprt.h     |  1 +
 net/sunrpc/xprtrdma/transport.c |  9 +++++
 9 files changed, 106 insertions(+), 54 deletions(-)

-- 
2.53.0.1185.g05d4b7b318-goog
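
Illustration for item 1 of the Design section (not part of the series):
the truncation issue can be reproduced in userspace with a few lines of
C. This is only a sketch under stated assumptions. The DEMO_* names and
bit positions are hypothetical and do not correspond to the real
NFS_CAP_* values, and it assumes a 32-bit unsigned int, as on all
current Linux targets.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: the names and bit positions below are hypothetical
 * and are not the values used in include/linux/nfs_fs_sb.h.
 */
#define DEMO_CAP_ACLS_U32  (1U << 3)     /* 32-bit unsigned constant */
#define DEMO_CAP_ACLS_ULL  (1ULL << 3)   /* same bit, 64-bit constant */
#define DEMO_CAP_P2PDMA    (1ULL << 32)  /* first bit beyond the low 32 */

/*
 * ~(1U << 3) is evaluated as a 32-bit unsigned value (0xFFFFFFF7) and is
 * then zero-extended to 64 bits, so the AND also clears bits 32..63.
 */
static uint64_t demo_clear_acls_u32(uint64_t caps)
{
	caps &= ~DEMO_CAP_ACLS_U32;
	return caps;
}

/* ~(1ULL << 3) is evaluated in 64 bits, so only bit 3 is cleared. */
static uint64_t demo_clear_acls_ull(uint64_t caps)
{
	caps &= ~DEMO_CAP_ACLS_ULL;
	return caps;
}

/* Self-check of the two behaviours described above. */
static void demo_selfcheck(void)
{
	uint64_t caps = DEMO_CAP_P2PDMA | DEMO_CAP_ACLS_ULL;

	assert(demo_clear_acls_u32(caps) == 0);               /* high bit lost */
	assert(demo_clear_acls_ull(caps) == DEMO_CAP_P2PDMA); /* high bit kept */
}
```

Calling demo_selfcheck() from a test program exercises both variants.
The zero-extension only happens because the 32-bit constant is unsigned;
a signed constant would be sign-extended instead, which is why the
effect is easy to miss when widening a flags field.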