From: Leon Romanovsky <leon@kernel.org>
To: Hou Tao <houtao@huaweicloud.com>
Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-mm@kvack.org, linux-nvme@lists.infradead.org,
	Bjorn Helgaas <bhelgaas@google.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Alistair Popple <apopple@nvidia.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Danilo Krummrich <dakr@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	houtao1@huawei.com
Subject: Re: [PATCH 00/13] Enable compound page for p2pdma memory
Date: Wed, 24 Dec 2025 11:22:43 +0200
Message-ID: <20251224092243.GG11869@unreal>
In-Reply-To: <996c64ca-8e97-2143-9227-ce65b89ae35e@huaweicloud.com>

On Wed, Dec 24, 2025 at 09:37:39AM +0800, Hou Tao wrote:
> 
> 
> On 12/24/2025 9:18 AM, Hou Tao wrote:
> > Hi,
> >
> > On 12/21/2025 8:19 PM, Leon Romanovsky wrote:
> >> On Sat, Dec 20, 2025 at 12:04:33PM +0800, Hou Tao wrote:
> >>> From: Hou Tao <houtao1@huawei.com>
> >>>
> >>> Hi,
> >>>
> >>> device-dax has already supported compound page. It not only reduces the
> >>> cost of struct page significantly, it also improve the performance of
> >>> get_user_pages when 2MB or 1GB page size is used. We are experimenting
> >>> to use p2p dma to directly transfer the content of NVMe SSD into NPU.
> >> I’ll admit my understanding here is limited, and lately everything tends  
> >> to look like a DMABUF problem to me. Could you explain why DMABUF support 
> >> is not being used for this use case?
> > I have limited knowledge of dma-buf, so correct me if I am wrong. It
> > seems that as for now there is no available way to use the dma-buf to
> > read/write files. For the userspace vaddr backended by  the dma-buf, it
> > is a PFN mapping, get_user_pages() will reject such address.
> 
> Hit the send button too soon :) So In my understanding, the advantage of
> dma-buf is that it doesn't need struct page.

The primary advantage of dma-buf is that it provides a safe mechanism for
sharing a DMA region between devices or subsystems. This allows reliable
p2p communication between two devices. For example, a GPU and an RDMA NIC
can share a memory region for data transfer.

The ability to operate without a struct page is an important part of this
design.

> and it also means that it needs special handling to support IO
> from/to dma-buf (e.g.,  [RFC v2 00/11] Add dmabuf read/write via io_uring [1])

It looks like the read/write support in that series is needed for I/O data
transfer, but you were talking about CMB.

I would imagine that NVMe exports the CMB through dmabuf and your NPU
imports it, with no need to do any read/write at all.

Thanks

> 
> [1]
> https://lore.kernel.org/io-uring/cover.1763725387.git.asml.silence@gmail.com/
> >> Thanks
> >>
> >>> The size of NPU HBM is 32GB or larger and there are at most 8 NPUs in
> >>> the host. When using the base page, the memory overhead is about 4GB for
> >>> 128GB HBM, and the mapping of 32GB HBM into userspace takes about 0.8
> >>> second. Considering ZONE_DEVICE memory type has already supported the
> >>> compound page, enabling the compound page support for p2pdma memory as
> >>> well. After applying the patch set, when using the 1GB page, the memory
> >>> overhead is about 2MB and the mmap costs about 0.04 ms.
> >>>
> >>> The main difference between the compound page support of device-dax and
> >>> p2pdma is that p2pdma inserts the page into user vma during mmap instead
> >>> of page fault. The main reason is simplicity. The patch set is
> >>> structured as shown below:
> >>>
> >>> Patch #1~#2: tiny bug fixes for p2pdma
> >>> Patch #3~#5: add callbacks support in kernfs and sysfs, include
> >>> pagesize, may_split and get_unmapped_area. These callbacks are necessary
> >>> for the support of compound page when mmaping sysfs binary file.
> >>> Patch #6~#7: create compound page for p2pdma memory in the kernel. 
> >>> Patch #8~#10: support the mapping of compound page in userspace. 
> >>> Patch #11~#12: support the compound page for NVMe CMB.
> >>> Patch #13: enable the support for compound page for p2pdma memory.
> >>>
> >>> Please see individual patches for more details. Comments and
> >>> suggestions are always welcome.
> >>>
> >>> Hou Tao (13):
> >>>   PCI/P2PDMA: Release the per-cpu ref of pgmap when vm_insert_page()
> >>>     fails
> >>>   PCI/P2PDMA: Fix the warning condition in p2pmem_alloc_mmap()
> >>>   kernfs: add support for get_unmapped_area callback
> >>>   kernfs: add support for may_split and pagesize callbacks
> >>>   sysfs: support get_unmapped_area callback for binary file
> >>>   PCI/P2PDMA: add align parameter for pci_p2pdma_add_resource()
> >>>   PCI/P2PDMA: create compound page for aligned p2pdma memory
> >>>   mm/huge_memory: add helpers to insert huge page during mmap
> >>>   PCI/P2PDMA: support get_unmapped_area to return aligned vaddr
> >>>   PCI/P2PDMA: support compound page in p2pmem_alloc_mmap()
> >>>   PCI/P2PDMA: add helper pci_p2pdma_max_pagemap_align()
> >>>   nvme-pci: introduce cmb_devmap_align module parameter
> >>>   PCI/P2PDMA: enable compound page support for p2pdma memory
> >>>
> >>>  drivers/accel/habanalabs/common/hldio.c |   3 +-
> >>>  drivers/nvme/host/pci.c                 |  10 +-
> >>>  drivers/pci/p2pdma.c                    | 140 ++++++++++++++++++++++--
> >>>  fs/kernfs/file.c                        |  79 +++++++++++++
> >>>  fs/sysfs/file.c                         |  15 +++
> >>>  include/linux/huge_mm.h                 |   4 +
> >>>  include/linux/kernfs.h                  |   3 +
> >>>  include/linux/pci-p2pdma.h              |  30 ++++-
> >>>  include/linux/sysfs.h                   |   4 +
> >>>  mm/huge_memory.c                        |  66 +++++++++++
> >>>  10 files changed, 339 insertions(+), 15 deletions(-)
> >>>
> >>> -- 
> >>> 2.29.2
> >>>
> >>>
> 
> 

