Linux Trace Kernel
From: "xiaobing.li" <xiaobing.li@samsung.com>
To: bhelgaas@google.com, logang@deltatee.com,
	m.szyprowski@samsung.com, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: kun.dou@samsung.com, peiwei.li@samsung.com
Subject: [RFC PATCH 0/0]  PCI P2PDMA: Add observability support via tracepoints, debugfs, and sysfs.
Date: Thu, 25 Jun 2026 09:59:27 +0800
Message-ID: <20260625015927.5704-1-xiaobing.li@samsung.com>
In-Reply-To: CGME20260625015930epcas5p33fa9d4833d45b53597e2994fb9ec2577@epcas5p3.samsung.com

Hi all,

The Linux kernel's P2PDMA infrastructure is mature, but it currently offers
little observability. Without manually adding log statements, there is no
direct way to see how many P2P transfers occur, which paths they take, or how
well they perform. P2PDMA activity cannot be clearly observed from user space,
which makes the following tasks difficult:

- Diagnose why P2PDMA does not work (or performs poorly)

- Verify whether the P2PDMA mapping uses the expected type (BUS_ADDR or THRU_HOST_BRIDGE)

- Monitor the use of P2PDMA in production environments

- Detect potential memory leaks (unmapped allocations)

P2PDMA is a subtle feature. When a P2PDMA mapping cannot use BUS_ADDR (the
direct PCIe switch path), it silently falls back to THRU_HOST_BRIDGE and routes
traffic through the host bridge. This significantly reduces performance
(usually by 10 times or more), but the fallback cannot be detected from user
space.

Therefore, I plan to export some metrics to user space so that P2PDMA activity
can be observed more easily. The proposed series adds three layers of
observability:

1. Tracepoints (5 events, optional, no overhead when disabled)

- p2p_dma_alloc: P2P memory allocation

- p2p_dma_free: P2P memory release

- p2p_dma_map: P2P DMA mapping (including client/provider, mapping type,
  PCIe distance and process information)

- p2p_dma_unmap: P2P DMA unmapping

- p2p_map_type_change: a new mapping type is calculated (xarray cache miss)

All tracepoints record the calling process (comm and pid), so P2PDMA activity
can be tracked per process.

Example:

$ cat /sys/kernel/debug/tracing/trace | grep p2p_dma_map

nvme[1234] map nvme0 -> p2p_mem type=BUS_ADDR dist=4

python[5678] map nvme1 -> p2p_mem type=THRU_HOST_BRIDGE dist=8
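
As a rough illustration of how this output could be consumed from user space,
the sketch below parses lines in the simplified format of the example above
and picks out mappings that fell back to THRU_HOST_BRIDGE. The regular
expression, the function names and the "client"/"provider" field labels are
assumptions made for illustration only. The final tracepoint format may
differ, and real ftrace output carries extra columns (task, CPU, timestamp)
before the event payload.

```python
import re
from typing import Optional

# Pattern for the simplified line format shown in the example above.
# Assumption: which device is the "client" and which the "provider" is a
# guess based on the arrow direction; the real format may differ.
MAP_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]\s+map\s+(?P<client>\S+)\s+->\s+"
    r"(?P<provider>\S+)\s+type=(?P<type>\w+)\s+dist=(?P<dist>\d+)"
)


def parse_p2p_map(line: str) -> Optional[dict]:
    """Return the fields of one p2p_dma_map line, or None if it does not match."""
    # search() rather than match(), so leading ftrace columns are tolerated.
    m = MAP_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["dist"] = int(rec["dist"])
    return rec


def host_bridge_fallbacks(lines):
    """Collect parsed records whose mapping type is THRU_HOST_BRIDGE."""
    parsed = (parse_p2p_map(line) for line in lines)
    return [r for r in parsed if r is not None and r["type"] == "THRU_HOST_BRIDGE"]
```

Feeding the two example lines to host_bridge_fallbacks() would return only the
record for python[5678], which is the kind of silent fallback this series is
meant to expose.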

2. Debugfs (global cumulative counters, always available)

- /sys/kernel/debug/pci-p2pdma/

- 11 counters: total_mappings, bus_addr_mappings, host_bridge_mappings,
  total_allocations, error_count, etc.

- Allows a "BUS_ADDR ratio" to be calculated to quantify the effectiveness of
  P2PDMA.
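
A minimal user-space sketch of how the "BUS_ADDR ratio" might be derived from
these counters follows. It assumes that each counter is exposed as a separate
file holding one decimal integer, and that the ratio is defined as
bus_addr_mappings / total_mappings. Neither the file layout nor the definition
is fixed by this proposal, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Directory named in the proposal. Assumption: one file per counter.
DEBUGFS_DIR = Path("/sys/kernel/debug/pci-p2pdma")


def read_counters(directory: Path) -> dict:
    """Read every regular file in `directory` as one integer counter."""
    counters = {}
    for entry in directory.iterdir():
        if not entry.is_file():
            continue
        try:
            counters[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            continue  # skip unreadable or non-numeric files
    return counters


def bus_addr_ratio(counters: dict) -> Optional[float]:
    """Fraction of mappings that used BUS_ADDR, or None if nothing was mapped."""
    total = counters.get("total_mappings", 0)
    if total == 0:
        return None
    return counters.get("bus_addr_mappings", 0) / total
```

On a system with the series applied and debugfs mounted, something like
bus_addr_ratio(read_counters(DEBUGFS_DIR)) run as root would give the ratio. A
value well below 1.0 would suggest that many mappings are taking the host
bridge path.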

3. Sysfs (per-device statistics, safe for production environments)

- /sys/bus/pci/devices/*/p2pmem/stats/

- 4 attributes: alloc_count, free_count, mapped_bytes, peak_mapped_bytes
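
To show how these attributes could support the leak-detection use case
mentioned earlier, here is a hedged sketch that compares alloc_count with
free_count for each device. It assumes both counters are cumulative and that
each attribute file contains a single decimal integer. The helper names are
hypothetical, and a nonzero difference only hints at a leak if the workload
has already finished.

```python
from pathlib import Path

# Attribute names taken from the proposal.
STATS_FIELDS = ("alloc_count", "free_count", "mapped_bytes", "peak_mapped_bytes")


def read_stats(stats_dir: Path) -> dict:
    """Read the four proposed attributes from one p2pmem/stats directory."""
    return {name: int((stats_dir / name).read_text().strip()) for name in STATS_FIELDS}


def outstanding_allocations(stats: dict) -> int:
    """Allocations not yet freed, assuming both counters are cumulative."""
    return stats["alloc_count"] - stats["free_count"]


def scan_devices(root: Path = Path("/sys/bus/pci/devices")) -> dict:
    """Map each PCI device name under `root` to its outstanding allocations."""
    results = {}
    for stats_dir in root.glob("*/p2pmem/stats"):
        try:
            stats = read_stats(stats_dir)
        except (OSError, ValueError):
            continue  # attribute missing or unreadable on this device
        # stats_dir is <device>/p2pmem/stats, so the device is two levels up.
        results[stats_dir.parent.parent.name] = outstanding_allocations(stats)
    return results
```

Calling scan_devices() after a workload has exited and checking for nonzero
values would be one simple way to spot allocations that were never released.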

Performance impact

- Tracepoints: static branch, zero overhead when disabled (the default)

- Debugfs/sysfs: atomic64_t counters, no locking, negligible overhead

- With all observability disabled, the P2PDMA hot path remains unchanged


I would appreciate feedback on:

1. Is the overall solution worth implementing?
2. Is the set of tracepoints appropriate? Any events I'm missing?
3. Are the tracepoint fields sufficient for debugging?
4. Is the debugfs/sysfs interface design acceptable?
5. Any concerns about the implementation approach?
