* [PATCH 0/3] cgroup/rdma: add rdma.peak and rdma.events[.local]
From: Tao Cui @ 2026-05-12  3:17 UTC (permalink / raw)
  To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui

Hi,

This series adds three new cgroup interface files to the RDMA controller
to improve observability of resource usage and limit enforcement:

  - rdma.peak:         per-device high watermark of resource usage
  - rdma.events:       hierarchical max event counters
  - rdma.events.local: per-cgroup local max and failcnt counters

Why these interfaces?

Currently rdma.current only shows the instantaneous resource usage per
device.  Administrators who need to set appropriate rdma.max limits have
no way to observe usage spikes or detect when limits are being hit.

rdma.peak addresses the observability gap: it tracks the historical high
watermark so administrators can determine a sensible rdma.max based on
actual peak demand rather than guesswork.  This is directly analogous to
memory.peak.
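
As a rough illustration of the high-watermark idea, here is a minimal
user-space sketch.  It is not the code from the patch: struct
res_counter and both helpers are hypothetical names, and the sketch
assumes one unit is charged at a time and that peak is only raised on
a successful charge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accounting slot for one resource on one device. */
struct res_counter {
	int64_t usage;	/* what rdma.current would report */
	int64_t peak;	/* what rdma.peak would report */
	int64_t max;	/* the rdma.max limit */
};

/* Charge one unit; raise the watermark only if the charge succeeds. */
static bool res_try_charge(struct res_counter *c)
{
	if (c->usage + 1 > c->max)
		return false;
	c->usage++;
	if (c->usage > c->peak)
		c->peak = c->usage;
	return true;
}

/* Uncharge one unit; the watermark is deliberately left untouched. */
static void res_uncharge(struct res_counter *c)
{
	assert(c->usage > 0);
	c->usage--;
}
```

After two charges and one uncharge, usage is back to 1 while peak stays
at 2, which is the value an administrator would use to size rdma.max.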

rdma.events and rdma.events.local address the notification gap: they
provide per-device counters that track how often resource limits block
allocations, and can be monitored via poll/epoll for real-time alerting
when a cgroup hits its rdma.max.  This follows the pids.events /
pids.events.local design, where events are attributed to the cgroup
whose limit was exceeded rather than the cgroup where the allocation was
attempted.
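
The attribution rule can be modelled in user space roughly as follows.
This is a sketch under assumptions, not the kernel implementation:
struct cg, cg_record_max() and cg_try_charge() are hypothetical names,
a single resource is tracked, and the charge is done in two passes for
brevity instead of charging while walking and rolling back on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cgroup node tracking a single resource. */
struct cg {
	struct cg *parent;	/* NULL for the root */
	int64_t usage;
	int64_t max;
	uint64_t events_max;	/* hierarchical, as in rdma.events */
	uint64_t local_max;	/* local, as in rdma.events.local */
};

/*
 * Record a limit hit on the cgroup whose limit blocked the charge:
 * the local counter is bumped there only, the hierarchical counter
 * is bumped there and on every ancestor.
 */
static void cg_record_max(struct cg *over)
{
	over->local_max++;
	for (struct cg *c = over; c; c = c->parent)
		c->events_max++;
}

/* Try to charge one unit to @from and all of its ancestors. */
static bool cg_try_charge(struct cg *from)
{
	assert(from != NULL);
	for (struct cg *c = from; c; c = c->parent) {
		if (c->usage + 1 > c->max) {
			cg_record_max(c);	/* not @from */
			return false;
		}
	}
	for (struct cg *c = from; c; c = c->parent)
		c->usage++;
	return true;
}
```

With a root -> mid -> leaf chain where only mid has a tight limit, a
rejected charge from leaf bumps the counters on mid and root, while
leaf's own counters stay at zero.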

Patch overview:

  Patch 1: rdma.peak
    Adds peak tracking in the charge path and the rdma.peak interface
    file.  rpools are kept alive while peak is non-zero so the values
    persist as historical records.

  Patch 2: rdma.events
    Adds hierarchical max event counters that propagate upward from the
    cgroup whose limit was hit.  Introduces rdmacg_event_locked() and
    the rdma.events interface file with poll notification support.

  Patch 3: rdma.events.local
    Extends the event infrastructure with per-cgroup local counters:
    local max counts how often this cgroup's limit blocked an allocation,
    failcnt counts how often allocations from this subtree were rejected.
    Adds the rdma.events.local interface file.
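
The rpool lifetime rule described for Patch 1 can be pictured with the
following user-space sketch.  The names (struct rpool, NRES, LIMIT_MAX,
rpool_can_free()) are hypothetical, and the sketch assumes a pool is
otherwise reclaimable once it has no usage and all limits are at their
default of "max"; the only point illustrated is the extra peak check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NRES		2		/* models hca_handle, hca_object */
#define LIMIT_MAX	INT64_MAX	/* stands in for "max" (no limit) */

/* Hypothetical per-cgroup, per-device resource pool. */
struct rpool {
	int64_t usage[NRES];
	int64_t peak[NRES];
	int64_t max[NRES];
};

/*
 * A pool may be freed only if it holds no state worth keeping:
 * no current usage, no configured limit, and no recorded peak.
 */
static bool rpool_can_free(const struct rpool *p)
{
	assert(p != NULL);
	for (int i = 0; i < NRES; i++) {
		if (p->usage[i] != 0)
			return false;
		if (p->max[i] != LIMIT_MAX)
			return false;
		if (p->peak[i] != 0)	/* keep the historical record */
			return false;
	}
	return true;
}
```

Under this rule a pool that has ever been charged stays allocated even
after its usage drops back to zero, which is the memory cost of keeping
the peak values readable.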

These patches have been tested locally.

Tao Cui (3):
  cgroup/rdma: add rdma.peak for per-device peak usage tracking
  cgroup/rdma: add rdma.events to track resource limit exhaustion
  cgroup/rdma: add rdma.events.local for per-cgroup allocation failure
    attribution

 include/linux/cgroup_rdma.h |   4 +
 kernel/cgroup/rdma.c        | 165 +++++++++++++++++++++++++++++++++++-
 2 files changed, 165 insertions(+), 4 deletions(-)

-- 
2.43.0

